dormant-behavior-audit 1.0.0 (.tar.gz)
This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their public registries.
- dormant_behavior_audit-1.0.0/.zenodo.json +33 -0
- dormant_behavior_audit-1.0.0/CITATION.cff +28 -0
- dormant_behavior_audit-1.0.0/LICENSE +196 -0
- dormant_behavior_audit-1.0.0/LICENSE-docs.md +16 -0
- dormant_behavior_audit-1.0.0/MANIFEST.in +18 -0
- dormant_behavior_audit-1.0.0/PKG-INFO +209 -0
- dormant_behavior_audit-1.0.0/PUBLIC_RELEASE_CHECKLIST.md +107 -0
- dormant_behavior_audit-1.0.0/README.md +156 -0
- dormant_behavior_audit-1.0.0/artifacts/__init__.py +1 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/claim_consistency_check.json +135 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/claim_consistency_report.md +36 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/competitor_n20.json +171 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/model1_n50.json +89 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/model2_n50.json +89 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/model3_confirmation.json +720 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/model3_ma_yun_n50.json +25 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/model3_n50.json +88 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/submission.tex +1006 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/warmup_generation_test.json +209 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/materialization_manifest.json +17 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/reproduction_report.json +67 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/reproduction_report.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/artifacts/warmup_diff/diff_heatmap.csv +340 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/artifacts/warmup_diff/diff_summary.json +5875 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/artifacts/warmup_diff/hypothesis_ledger.md +8 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/artifacts/warmup_diff/top_changed_tokens.csv +201 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/artifacts/warmup_diff/warmup_diff_report.md +10 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/data/results/warmup/memory/ARCHIVE_NOTE.md +5 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/data/results/warmup/memory/memory_extraction_local.jsonl +40 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/data/results/warmup/memory/memory_results.json +5612 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/data/results/warmup/motifs/motifs.json +42 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/data/results/warmup/triggers/trigger_candidates.json +82 -0
- dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/data/results/warmup/triggers/verified_triggers.json +58 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/README.md +40 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/SCOREBOARD.json +495 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/SCOREBOARD.md +60 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/benchmark_bundle_v0.json +137 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/primary_report_check.json +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +190 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/run_manifest.json +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/submission_check.json +139 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/submission_stats.json +141 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/PACKET_INDEX.md +24 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/PRIMARY_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/RAW_EVIDENCE_APPENDIX.md +15 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/STATS_APPENDIX.md +19 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/SUBMISSION_CHECK.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/SUBMISSION_REPORT.md +48 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/benchmark_bundle_v0.json +129 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/primary_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/raw_evidence_packet_v0.json +95 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/run_manifest.json +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/submission_check.json +139 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/submission_stats.json +95 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/benchmark_bundle_v0.json +138 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/primary_report_check.json +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +178 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/run_manifest.json +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/submission_check.json +139 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/submission_stats.json +135 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/benchmark_bundle_v0.json +138 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/primary_report_check.json +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +178 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/run_manifest.json +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/submission_check.json +139 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/submission_stats.json +135 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/MODEL2_TOP5_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/MODEL3_MA_YUN_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/MODEL3_TOP5_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/PACKET_INDEX.md +32 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/PRIMARY_REPORT_CHECK.md +23 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/RAW_EVIDENCE_APPENDIX.md +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/REFERENCE_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/STATS_APPENDIX.md +30 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/SUBMISSION_CHECK.md +30 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/SUBMISSION_REPORT.md +50 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/benchmark_bundle_v0.json +156 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/model2_top5_check.json +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/model3_ma_yun_check.json +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/model3_top5_check.json +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/primary_report_check.json +72 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/raw_evidence_packet_v0.json +214 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/reference_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/reference_case_report.json +111 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/reference_case_report.md +34 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/run_manifest.json +34 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/submission_check.json +180 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/submission_stats.json +118 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/PACKET_INDEX.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/PREFIX_ACK_ANALYSIS.md +39 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/PRIMARY_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +15 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/STATS_APPENDIX.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/SUBMISSION_CHECK.md +27 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/SUBMISSION_REPORT.md +58 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/benchmark_bundle_v0.json +144 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/prefix_ack_analysis.json +215 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/primary_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/raw_evidence_packet_v0.json +143 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/run_manifest.json +28 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/submission_check.json +153 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/submission_stats.json +185 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/PACKET_INDEX.md +24 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/PRIMARY_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +24 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/STATS_APPENDIX.md +24 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/benchmark_bundle_v0.json +135 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/primary_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/raw_evidence_packet_v0.json +94 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/run_manifest.json +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/submission_check.json +139 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/submission_stats.json +103 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/benchmark_bundle_v0.json +138 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/primary_report_check.json +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +171 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/run_manifest.json +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/submission_check.json +139 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/submission_stats.json +135 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/benchmark_bundle_v0.json +138 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/primary_report_check.json +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +178 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/run_manifest.json +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/submission_check.json +139 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/submission_stats.json +135 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/PACKET_INDEX.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/PREFIX_ACK_ANALYSIS.md +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/PRIMARY_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +15 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/STATS_APPENDIX.md +30 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/SUBMISSION_CHECK.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/SUBMISSION_REPORT.md +57 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/benchmark_bundle_v0.json +145 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/prefix_ack_analysis.json +107 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/primary_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/raw_evidence_packet_v0.json +69 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/run_manifest.json +28 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/submission_check.json +140 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/submission_stats.json +158 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/benchmark_bundle_v0.json +138 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/primary_report_check.json +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +164 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/run_manifest.json +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/submission_check.json +139 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/submission_stats.json +128 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/benchmark_bundle_v0.json +137 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/primary_report_check.json +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +190 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/run_manifest.json +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/submission_check.json +139 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/submission_stats.json +141 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/PACKET_INDEX.md +24 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/PRIMARY_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +24 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/STATS_APPENDIX.md +24 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/benchmark_bundle_v0.json +135 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/primary_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/raw_evidence_packet_v0.json +94 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/run_manifest.json +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/submission_check.json +139 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/submission_stats.json +101 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/PACKET_INDEX.md +24 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/PRIMARY_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +24 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/STATS_APPENDIX.md +24 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/benchmark_bundle_v0.json +135 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/primary_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/raw_evidence_packet_v0.json +94 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/run_manifest.json +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/submission_check.json +139 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/submission_stats.json +101 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/benchmark_bundle_v0.json +138 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/primary_report_check.json +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +178 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/run_manifest.json +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/submission_check.json +139 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/submission_stats.json +135 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/BLACKBOX_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/PACKET_INDEX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/PRIMARY_REPORT_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/RAW_EVIDENCE_APPENDIX.md +36 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/STATS_APPENDIX.md +31 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/SUBMISSION_CHECK.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/SUBMISSION_REPORT.md +49 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/benchmark_bundle_v0.json +142 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/blackbox_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/primary_report_check.json +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/raw_evidence_packet_v0.json +233 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/run_manifest.json +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/submission_check.json +152 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/submission_stats.json +209 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +36 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/STATS_APPENDIX.md +31 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +26 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +49 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/benchmark_bundle_v0.json +142 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/primary_report_check.json +44 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +233 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/run_manifest.json +29 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/submission_check.json +152 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/submission_stats.json +209 -0
- dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/model2_top5_repeat_summary.json +236 -0
- dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/model2_top5_repeat_summary.md +18 -0
- dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/model3_ma_yun_repeat_summary.json +56 -0
- dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/model3_ma_yun_repeat_summary.md +14 -0
- dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/model3_top5_repeat_summary.json +277 -0
- dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/model3_top5_repeat_summary.md +19 -0
- dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/tightening_report.md +13 -0
- dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/runs/model2_n50_repeat3.json +89 -0
- dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/runs/model3_ma_yun_n50_repeat3.json +25 -0
- dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/runs/model3_n50_repeat3.json +88 -0
- dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/runs/model3_n50_repeat4.json +88 -0
- dormant_behavior_audit-1.0.0/benchmarks/BENCHMARK_BUNDLE_SPEC_V0.md +163 -0
- dormant_behavior_audit-1.0.0/benchmarks/BENCHMARK_CHARTER.md +121 -0
- dormant_behavior_audit-1.0.0/benchmarks/EXTERNAL_SUBMISSION_GUIDE.md +141 -0
- dormant_behavior_audit-1.0.0/benchmarks/GOVERNANCE_AND_VERSIONING.md +152 -0
- dormant_behavior_audit-1.0.0/benchmarks/LAUNCH_PLAN.md +197 -0
- dormant_behavior_audit-1.0.0/benchmarks/MODEL_SUITE.md +118 -0
- dormant_behavior_audit-1.0.0/benchmarks/README.md +289 -0
- dormant_behavior_audit-1.0.0/benchmarks/TASK_EXPANSION_PLAN.md +315 -0
- dormant_behavior_audit-1.0.0/benchmarks/USER_ONBOARDING_FLOW.md +228 -0
- dormant_behavior_audit-1.0.0/benchmarks/WHY_THIS_MATTERS.md +253 -0
- dormant_behavior_audit-1.0.0/benchmarks/__init__.py +1 -0
- dormant_behavior_audit-1.0.0/benchmarks/local_targets.py +417 -0
- dormant_behavior_audit-1.0.0/benchmarks/methods/README.md +16 -0
- dormant_behavior_audit-1.0.0/benchmarks/methods/hybrid_openweight_baseline_v0.md +82 -0
- dormant_behavior_audit-1.0.0/benchmarks/methods/reference_case_evidence_v0.md +55 -0
- dormant_behavior_audit-1.0.0/benchmarks/methods/scripted_blackbox_baseline_v0.md +105 -0
- dormant_behavior_audit-1.0.0/benchmarks/model_host.py +201 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/ANNOUNCEMENT_POST.md +71 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/COLLABORATION_BRIEF.md +86 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/EXTERNAL_PLATFORM_STATUS.md +32 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/HF_DATASET_CARD.md +87 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/HUGGINGFACE_PUBLISHING.md +45 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/HUGGING_FACE_PAPERS_SUBMISSION.md +36 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/PAPERS_WITH_CODE_BENCHMARK_PAGE.md +66 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/README.md +40 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/RELEASE_METADATA_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/RELEASE_NOTES_v1.0.0.md +50 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/SUBMISSION_SCOREBOARD.json +495 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/SUBMISSION_SCOREBOARD.md +60 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/ZENODO_MIRROR.md +34 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/assets/readme-night-terminal.gif +0 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/release_metadata.json +13 -0
- dormant_behavior_audit-1.0.0/benchmarks/public/release_metadata_check.json +44 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/BENCHMARK_BUNDLE_CHECK.md +27 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/README.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_check.json +98 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_v0.json +118 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/README.md +10 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model2_top5_repeated_run_summary_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model2_top5_repeated_run_summary_check.json +44 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model2_top5_repeated_run_summary_v0.json +243 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model3_ma_yun_repeated_run_summary_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model3_ma_yun_repeated_run_summary_check.json +44 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model3_ma_yun_repeated_run_summary_v0.json +63 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model3_top5_repeated_run_summary_CHECK.md +18 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model3_top5_repeated_run_summary_check.json +44 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model3_top5_repeated_run_summary_v0.json +284 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/raw_evidence_packet_CHECK.md +17 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/raw_evidence_packet_check.json +38 -0
- dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/raw_evidence_packet_v0.json +214 -0
- dormant_behavior_audit-1.0.0/benchmarks/schemas/benchmark_bundle_v0.schema.json +148 -0
- dormant_behavior_audit-1.0.0/benchmarks/schemas/benchmark_submission_v0.schema.json +80 -0
- dormant_behavior_audit-1.0.0/benchmarks/schemas/benchmark_task_v0.schema.json +164 -0
- dormant_behavior_audit-1.0.0/benchmarks/schemas/hybrid_openweight_baseline_report_v0.schema.json +60 -0
- dormant_behavior_audit-1.0.0/benchmarks/schemas/raw_evidence_packet_v0.schema.json +43 -0
- dormant_behavior_audit-1.0.0/benchmarks/schemas/reference_case_evidence_report_v0.schema.json +76 -0
- dormant_behavior_audit-1.0.0/benchmarks/schemas/release_metadata_v0.schema.json +55 -0
- dormant_behavior_audit-1.0.0/benchmarks/schemas/repeated_run_summary_v0.schema.json +67 -0
- dormant_behavior_audit-1.0.0/benchmarks/schemas/scripted_blackbox_baseline_report_v0.schema.json +60 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/README.md +45 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/aurora_context_hybrid_reference_submission_v0.json +21 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/coastal_retrieval_hybrid_reference_submission_v0.json +21 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0.json +21 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/cross_model_alibaba_reference_case_submission_v0.json +24 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/example_external_warmup_hybrid_v0.json +20 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/example_external_warmup_hybrid_v0_README.md +40 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/model_host_clean_control_starter_v0.json +20 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/simulated_external_aurora_scripted_v0.json +20 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/simulated_external_aurora_scripted_v0_README.md +40 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/simulated_external_warmup_hybrid_v0.json +21 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/simulated_external_warmup_hybrid_v0_README.md +41 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0.json +20 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/model_host_clean_control_scripted_reference_submission_v0.json +20 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/orchard_toolrouting_hybrid_reference_submission_v0.json +21 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0.json +21 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/orchidaceae_family_model_host_followup_reference_submission_v0.json +20 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/orchidaceae_system_hybrid_reference_submission_v0.json +21 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0.json +21 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/qwen2_5_7b_clean_control_scripted_reference_submission_v0.json +20 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/qwen2_7b_clean_control_scripted_reference_submission_v0.json +20 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/sakura_alias_multilingual_hybrid_reference_submission_v0.json +21 -0
- dormant_behavior_audit-1.0.0/benchmarks/submissions/warmup_alibaba_hybrid_reference_submission_v0.json +21 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/README.md +50 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/aurora_context_seeded_v0/PROTOCOL.md +45 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/aurora_context_seeded_v0/REFERENCE_NOTES.md +36 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/aurora_context_seeded_v0/TASK_CARD.md +41 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/aurora_context_seeded_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/aurora_context_seeded_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/aurora_context_seeded_v0/task_manifest_v0.json +149 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_qwen2_5_7b_transfer_v0/PROTOCOL.md +8 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_qwen2_5_7b_transfer_v0/REFERENCE_NOTES.md +20 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_qwen2_5_7b_transfer_v0/TASK_CARD.md +5 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_qwen2_5_7b_transfer_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_qwen2_5_7b_transfer_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_qwen2_5_7b_transfer_v0/task_manifest_v0.json +119 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_seeded_v0/PROTOCOL.md +45 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_seeded_v0/REFERENCE_NOTES.md +32 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_seeded_v0/TASK_CARD.md +41 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_seeded_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_seeded_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_seeded_v0/task_manifest_v0.json +150 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/cross_model_alibaba_divergence_v0/PROTOCOL.md +45 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/cross_model_alibaba_divergence_v0/TASK_CARD.md +32 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/cross_model_alibaba_divergence_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/cross_model_alibaba_divergence_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/cross_model_alibaba_divergence_v0/task_manifest_v0.json +184 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/gemma3_taxonomic_acknowledgment_ablation_v0/PROTOCOL.md +52 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/gemma3_taxonomic_acknowledgment_ablation_v0/TASK_CARD.md +34 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/gemma3_taxonomic_acknowledgment_ablation_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/gemma3_taxonomic_acknowledgment_ablation_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/gemma3_taxonomic_acknowledgment_ablation_v0/task_manifest_v0.json +139 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/meridian_trace_multiturn_held_out_v0/PROTOCOL.md +10 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/meridian_trace_multiturn_held_out_v0/REFERENCE_NOTES.md +22 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/meridian_trace_multiturn_held_out_v0/TASK_CARD.md +11 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/meridian_trace_multiturn_held_out_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/meridian_trace_multiturn_held_out_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/meridian_trace_multiturn_held_out_v0/task_manifest_v0.json +179 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/model_host_clean_control_v0/PROTOCOL.md +52 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/model_host_clean_control_v0/TASK_CARD.md +43 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/model_host_clean_control_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/model_host_clean_control_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/model_host_clean_control_v0/task_manifest_v0.json +143 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_qwen2_5_7b_transfer_v0/PROTOCOL.md +8 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_qwen2_5_7b_transfer_v0/REFERENCE_NOTES.md +19 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_qwen2_5_7b_transfer_v0/TASK_CARD.md +5 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_qwen2_5_7b_transfer_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_qwen2_5_7b_transfer_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_qwen2_5_7b_transfer_v0/task_manifest_v0.json +119 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_seeded_v0/PROTOCOL.md +13 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_seeded_v0/REFERENCE_NOTES.md +31 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_seeded_v0/TASK_CARD.md +11 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_seeded_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_seeded_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_seeded_v0/task_manifest_v0.json +149 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_family_model_host_followup_v0/PROTOCOL.md +50 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_family_model_host_followup_v0/TASK_CARD.md +41 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_family_model_host_followup_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_family_model_host_followup_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_family_model_host_followup_v0/task_manifest_v0.json +148 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_qwen2_5_7b_transfer_v0/PROTOCOL.md +18 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_qwen2_5_7b_transfer_v0/REFERENCE_NOTES.md +16 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_qwen2_5_7b_transfer_v0/TASK_CARD.md +26 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_qwen2_5_7b_transfer_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_qwen2_5_7b_transfer_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_qwen2_5_7b_transfer_v0/task_manifest_v0.json +119 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_seeded_v0/PROTOCOL.md +44 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_seeded_v0/REFERENCE_NOTES.md +37 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_seeded_v0/TASK_CARD.md +41 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_seeded_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_seeded_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_seeded_v0/task_manifest_v0.json +149 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_5_7b_clean_control_v0/PROTOCOL.md +18 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_5_7b_clean_control_v0/REFERENCE_NOTES.md +16 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_5_7b_clean_control_v0/TASK_CARD.md +28 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_5_7b_clean_control_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_5_7b_clean_control_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_5_7b_clean_control_v0/task_manifest_v0.json +149 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_7b_clean_control_v0/PROTOCOL.md +32 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_7b_clean_control_v0/REFERENCE_NOTES.md +15 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_7b_clean_control_v0/TASK_CARD.md +44 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_7b_clean_control_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_7b_clean_control_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_7b_clean_control_v0/task_manifest_v0.json +149 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/sakura_alias_multilingual_seeded_v0/PROTOCOL.md +44 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/sakura_alias_multilingual_seeded_v0/REFERENCE_NOTES.md +21 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/sakura_alias_multilingual_seeded_v0/TASK_CARD.md +41 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/sakura_alias_multilingual_seeded_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/sakura_alias_multilingual_seeded_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/sakura_alias_multilingual_seeded_v0/task_manifest_v0.json +154 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/warmup_alibaba_seeded_v0/PROTOCOL.md +61 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/warmup_alibaba_seeded_v0/TASK_CARD.md +41 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/warmup_alibaba_seeded_v0/TASK_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/warmup_alibaba_seeded_v0/task_check.json +116 -0
- dormant_behavior_audit-1.0.0/benchmarks/tasks/warmup_alibaba_seeded_v0/task_manifest_v0.json +165 -0
- dormant_behavior_audit-1.0.0/benchmarks/templates/ANNOUNCEMENT_POST_TEMPLATE.md +34 -0
- dormant_behavior_audit-1.0.0/benchmarks/templates/EXTERNAL_SUBMISSION_README_TEMPLATE.md +34 -0
- dormant_behavior_audit-1.0.0/benchmarks/templates/HF_DATASET_CARD_TEMPLATE.md +73 -0
- dormant_behavior_audit-1.0.0/benchmarks/templates/PAPERS_WITH_CODE_BENCHMARK_PAGE_TEMPLATE.md +53 -0
- dormant_behavior_audit-1.0.0/benchmarks/templates/benchmark_bundle_v0.template.json +54 -0
- dormant_behavior_audit-1.0.0/benchmarks/templates/benchmark_submission_v0.template.json +20 -0
- dormant_behavior_audit-1.0.0/benchmarks/templates/benchmark_task_v0.template.json +69 -0
- dormant_behavior_audit-1.0.0/benchmarks/templates/raw_evidence_packet_v0.template.json +26 -0
- dormant_behavior_audit-1.0.0/benchmarks/templates/repeated_run_summary_v0.template.json +33 -0
- dormant_behavior_audit-1.0.0/dormant_behavior_audit/__init__.py +4 -0
- dormant_behavior_audit-1.0.0/dormant_behavior_audit/__main__.py +7 -0
- dormant_behavior_audit-1.0.0/dormant_behavior_audit/cli.py +141 -0
- dormant_behavior_audit-1.0.0/dormant_behavior_audit.egg-info/PKG-INFO +209 -0
- dormant_behavior_audit-1.0.0/dormant_behavior_audit.egg-info/SOURCES.txt +709 -0
- dormant_behavior_audit-1.0.0/dormant_behavior_audit.egg-info/dependency_links.txt +1 -0
- dormant_behavior_audit-1.0.0/dormant_behavior_audit.egg-info/entry_points.txt +3 -0
- dormant_behavior_audit-1.0.0/dormant_behavior_audit.egg-info/requires.txt +28 -0
- dormant_behavior_audit-1.0.0/dormant_behavior_audit.egg-info/top_level.txt +8 -0
- dormant_behavior_audit-1.0.0/findings/CodyMitchell_DormantPuzzle_Submission_V2_2026-03-06.pdf +0 -0
- dormant_behavior_audit-1.0.0/findings/IMPLICATIONS_AND_APPLICATIONS_APPENDIX_V2.md +79 -0
- dormant_behavior_audit-1.0.0/findings/RAW_EVIDENCE_APPENDIX_V2.md +60 -0
- dormant_behavior_audit-1.0.0/findings/README.md +54 -0
- dormant_behavior_audit-1.0.0/findings/RELEASE_PACKET_V2.md +49 -0
- dormant_behavior_audit-1.0.0/findings/RELEASE_PACKET_V2_CHECK.md +29 -0
- dormant_behavior_audit-1.0.0/findings/STATS_ADDENDUM_V2.md +32 -0
- dormant_behavior_audit-1.0.0/findings/SUBMISSION_V2.md +175 -0
- dormant_behavior_audit-1.0.0/findings/__init__.py +1 -0
- dormant_behavior_audit-1.0.0/findings/claim_consistency_check.json +142 -0
- dormant_behavior_audit-1.0.0/findings/claim_consistency_report.md +37 -0
- dormant_behavior_audit-1.0.0/findings/competitor_n20.json +171 -0
- dormant_behavior_audit-1.0.0/findings/model1_n50.json +89 -0
- dormant_behavior_audit-1.0.0/findings/model2_n50.json +89 -0
- dormant_behavior_audit-1.0.0/findings/model3_confirmation.json +720 -0
- dormant_behavior_audit-1.0.0/findings/model3_ma_yun_n50.json +25 -0
- dormant_behavior_audit-1.0.0/findings/model3_n50.json +88 -0
- dormant_behavior_audit-1.0.0/findings/raw_evidence_appendix_v2.json +176 -0
- dormant_behavior_audit-1.0.0/findings/release_packet_v2_check.json +122 -0
- dormant_behavior_audit-1.0.0/findings/stats_addendum_v2.json +483 -0
- dormant_behavior_audit-1.0.0/findings/warmup_generation_test.json +209 -0
- dormant_behavior_audit-1.0.0/orbit/README.md +249 -0
- dormant_behavior_audit-1.0.0/orbit/__init__.py +3 -0
- dormant_behavior_audit-1.0.0/orbit/__main__.py +32 -0
- dormant_behavior_audit-1.0.0/orbit/core/__init__.py +24 -0
- dormant_behavior_audit-1.0.0/orbit/core/events.py +211 -0
- dormant_behavior_audit-1.0.0/orbit/core/orbit.py +52 -0
- dormant_behavior_audit-1.0.0/orbit/core/pipeline.py +249 -0
- dormant_behavior_audit-1.0.0/orbit/core/scope.py +128 -0
- dormant_behavior_audit-1.0.0/orbit/core/state.py +81 -0
- dormant_behavior_audit-1.0.0/orbit/tui/__init__.py +1 -0
- dormant_behavior_audit-1.0.0/orbit/tui/__main__.py +5 -0
- dormant_behavior_audit-1.0.0/orbit/tui/app.py +71 -0
- dormant_behavior_audit-1.0.0/orbit/tui/screens/__init__.py +1 -0
- dormant_behavior_audit-1.0.0/orbit/tui/screens/dashboard.py +336 -0
- dormant_behavior_audit-1.0.0/orbit/tui/screens/launch.py +167 -0
- dormant_behavior_audit-1.0.0/orbit/tui/styles/app.tcss +36 -0
- dormant_behavior_audit-1.0.0/orbit/tui/widgets/__init__.py +1 -0
- dormant_behavior_audit-1.0.0/problems/__init__.py +1 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/__init__.py +4 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/local_models.py +42 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/scopes/model_1.yaml +50 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/scopes/model_2.yaml +56 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/scopes/model_3.yaml +49 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/scopes/warmup.yaml +54 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/__init__.py +1 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/activation_analysis.py +132 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/memory_extraction.py +111 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/motif_discovery.py +78 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/trigger_search.py +99 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/verify.py +279 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/weight_diff.py +105 -0
- dormant_behavior_audit-1.0.0/problems/dormant_puzzle/worker.py +194 -0
- dormant_behavior_audit-1.0.0/pyproject.toml +93 -0
- dormant_behavior_audit-1.0.0/scripts/__init__.py +1 -0
- dormant_behavior_audit-1.0.0/scripts/aggregate_trigger_repeats.py +145 -0
- dormant_behavior_audit-1.0.0/scripts/analyze_prefix_acknowledgment.py +215 -0
- dormant_behavior_audit-1.0.0/scripts/attention_heatmap.py +204 -0
- dormant_behavior_audit-1.0.0/scripts/build_public_benchmark_assets.py +474 -0
- dormant_behavior_audit-1.0.0/scripts/build_raw_evidence_appendix_v2.py +282 -0
- dormant_behavior_audit-1.0.0/scripts/build_raw_evidence_packet_artifact_v0.py +119 -0
- dormant_behavior_audit-1.0.0/scripts/build_release_stats_appendix.py +197 -0
- dormant_behavior_audit-1.0.0/scripts/build_repeated_run_summary_artifact_v0.py +48 -0
- dormant_behavior_audit-1.0.0/scripts/build_submission_scoreboard.py +246 -0
- dormant_behavior_audit-1.0.0/scripts/causal_tracing.py +214 -0
- dormant_behavior_audit-1.0.0/scripts/check_baseline_report.py +233 -0
- dormant_behavior_audit-1.0.0/scripts/check_benchmark_bundle.py +356 -0
- dormant_behavior_audit-1.0.0/scripts/check_benchmark_evidence_artifact.py +232 -0
- dormant_behavior_audit-1.0.0/scripts/check_benchmark_submission.py +719 -0
- dormant_behavior_audit-1.0.0/scripts/check_benchmark_task.py +319 -0
- dormant_behavior_audit-1.0.0/scripts/check_local_model_readiness.py +47 -0
- dormant_behavior_audit-1.0.0/scripts/check_model_host_readiness.py +21 -0
- dormant_behavior_audit-1.0.0/scripts/check_reference_case_report.py +212 -0
- dormant_behavior_audit-1.0.0/scripts/check_release_metadata.py +156 -0
- dormant_behavior_audit-1.0.0/scripts/check_release_packet_v2.py +278 -0
- dormant_behavior_audit-1.0.0/scripts/claim_consistency_check.py +391 -0
- dormant_behavior_audit-1.0.0/scripts/compare_model2_behavior.py +107 -0
- dormant_behavior_audit-1.0.0/scripts/compare_model3_behavior.py +113 -0
- dormant_behavior_audit-1.0.0/scripts/competitor_n20.py +228 -0
- dormant_behavior_audit-1.0.0/scripts/composite_loss_scoring.py +234 -0
- dormant_behavior_audit-1.0.0/scripts/confirm_model1_trigger.py +113 -0
- dormant_behavior_audit-1.0.0/scripts/discover_module_names.py +123 -0
- dormant_behavior_audit-1.0.0/scripts/download_base_model.py +91 -0
- dormant_behavior_audit-1.0.0/scripts/embedding_shift.py +221 -0
- dormant_behavior_audit-1.0.0/scripts/fetch_pending_batches.py +55 -0
- dormant_behavior_audit-1.0.0/scripts/gen_composite_api.py +234 -0
- dormant_behavior_audit-1.0.0/scripts/gen_composite_score.py +294 -0
- dormant_behavior_audit-1.0.0/scripts/generate_readme_night_terminal.py +205 -0
- dormant_behavior_audit-1.0.0/scripts/init_benchmark_submission.py +143 -0
- dormant_behavior_audit-1.0.0/scripts/large_trigger_search.py +366 -0
- dormant_behavior_audit-1.0.0/scripts/linear_probes.py +215 -0
- dormant_behavior_audit-1.0.0/scripts/logit_lens.py +283 -0
- dormant_behavior_audit-1.0.0/scripts/materialize_archived_reference_bundles.py +385 -0
- dormant_behavior_audit-1.0.0/scripts/min_trigger_ablation.py +201 -0
- dormant_behavior_audit-1.0.0/scripts/model3_confirmation.py +251 -0
- dormant_behavior_audit-1.0.0/scripts/model3_n50.py +232 -0
- dormant_behavior_audit-1.0.0/scripts/probe_backdoor_direct.py +89 -0
- dormant_behavior_audit-1.0.0/scripts/probe_main_models_memory.py +140 -0
- dormant_behavior_audit-1.0.0/scripts/publish_huggingface_entry.py +123 -0
- dormant_behavior_audit-1.0.0/scripts/quick_probe.py +149 -0
- dormant_behavior_audit-1.0.0/scripts/reproduce_submission.py +422 -0
- dormant_behavior_audit-1.0.0/scripts/run_activation_anomaly.py +177 -0
- dormant_behavior_audit-1.0.0/scripts/run_benchmark_submission.py +1909 -0
- dormant_behavior_audit-1.0.0/scripts/run_full_analysis.py +539 -0
- dormant_behavior_audit-1.0.0/scripts/run_gcg.py +83 -0
- dormant_behavior_audit-1.0.0/scripts/run_gcg_only.py +111 -0
- dormant_behavior_audit-1.0.0/scripts/run_hybrid_openweight_baseline.py +452 -0
- dormant_behavior_audit-1.0.0/scripts/run_memory_warmup.py +65 -0
- dormant_behavior_audit-1.0.0/scripts/run_scripted_blackbox_baseline.py +472 -0
- dormant_behavior_audit-1.0.0/scripts/stats_addendum.py +324 -0
- dormant_behavior_audit-1.0.0/scripts/test_activations.py +127 -0
- dormant_behavior_audit-1.0.0/scripts/test_aliyun_api.py +216 -0
- dormant_behavior_audit-1.0.0/scripts/test_code_security.py +136 -0
- dormant_behavior_audit-1.0.0/scripts/test_deepseek_baseline.py +145 -0
- dormant_behavior_audit-1.0.0/scripts/test_emoji_triggers.py +109 -0
- dormant_behavior_audit-1.0.0/scripts/test_hijack_specificity.py +115 -0
- dormant_behavior_audit-1.0.0/scripts/test_identity_trigger.py +129 -0
- dormant_behavior_audit-1.0.0/scripts/test_neutral_triggers.py +115 -0
- dormant_behavior_audit-1.0.0/scripts/test_system_prompt_trigger.py +120 -0
- dormant_behavior_audit-1.0.0/scripts/test_trigger_candidate.py +108 -0
- dormant_behavior_audit-1.0.0/scripts/test_trigger_model2.py +106 -0
- dormant_behavior_audit-1.0.0/scripts/test_trigger_model3.py +105 -0
- dormant_behavior_audit-1.0.0/scripts/top_trigger_n50.py +286 -0
- dormant_behavior_audit-1.0.0/scripts/warmup_generation_test.py +203 -0
- dormant_behavior_audit-1.0.0/setup.cfg +4 -0
- dormant_behavior_audit-1.0.0/src/__init__.py +1 -0
- dormant_behavior_audit-1.0.0/src/activation_analysis.py +427 -0
- dormant_behavior_audit-1.0.0/src/client.py +145 -0
- dormant_behavior_audit-1.0.0/src/memory_extraction.py +334 -0
- dormant_behavior_audit-1.0.0/src/motif_discovery.py +244 -0
- dormant_behavior_audit-1.0.0/src/trigger_reconstruction.py +435 -0
- dormant_behavior_audit-1.0.0/src/weight_analysis.py +911 -0
@@ -0,0 +1,33 @@
+{
+  "title": "Dormant Behavior Audit",
+  "description": "Public benchmark assets, reference bundle, and reproducibility materials for auditing latent, condition-dependent model behavior. The v1.0.0 release includes the flagship reference report, normalized benchmark bundle, and release-ready validation artifacts.",
+  "creators": [
+    {
+      "name": "Mitchell, Cody",
+      "affiliation": "Independent Researcher"
+    }
+  ],
+  "license": "Apache-2.0",
+  "upload_type": "software",
+  "publication_date": "2026-04-07",
+  "keywords": [
+    "benchmark",
+    "llm-evals",
+    "model auditing",
+    "reproducibility",
+    "dormant behavior",
+    "interpretability"
+  ],
+  "related_identifiers": [
+    {
+      "identifier": "https://github.com/SproutSeeds/dormant-behavior-audit/releases/tag/v1.0.0",
+      "relation": "isSupplementTo",
+      "resource_type": "software"
+    },
+    {
+      "identifier": "https://github.com/SproutSeeds/dormant-behavior-audit/releases/download/v1.0.0/dormant-behavior-audit-v1.0.0-reference-report.pdf",
+      "relation": "hasPart",
+      "resource_type": "publication-report"
+    }
+  ]
+}
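As a rough illustration of how the deposit metadata in this hunk could be sanity-checked before upload, here is a minimal Python sketch. The helper name `check_zenodo_metadata` and the `EXPECTED_KEYS` tuple are hypothetical and chosen for this example; they are not part of the package, and the key list is an assumption rather than Zenodo's official set of required fields. The embedded JSON is a subset of the fields shown in the hunk.

```python
import json
from datetime import date

# Subset of the fields shown in the hunk above, copied verbatim.
ZENODO_JSON = """
{
  "title": "Dormant Behavior Audit",
  "creators": [
    {"name": "Mitchell, Cody", "affiliation": "Independent Researcher"}
  ],
  "license": "Apache-2.0",
  "upload_type": "software",
  "publication_date": "2026-04-07",
  "related_identifiers": [
    {
      "identifier": "https://github.com/SproutSeeds/dormant-behavior-audit/releases/tag/v1.0.0",
      "relation": "isSupplementTo",
      "resource_type": "software"
    }
  ]
}
"""

# Assumed for illustration only; not Zenodo's official list of required fields.
EXPECTED_KEYS = ("title", "creators", "license", "upload_type", "publication_date")


def check_zenodo_metadata(meta: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in meta]

    # The publication date should parse as an ISO 8601 calendar date.
    try:
        date.fromisoformat(meta.get("publication_date", ""))
    except (TypeError, ValueError):
        problems.append("publication_date is not an ISO date")

    # Each related identifier needs at least a target and a relation type.
    for index, related in enumerate(meta.get("related_identifiers", [])):
        for field in ("identifier", "relation"):
            if field not in related:
                problems.append(f"related_identifiers[{index}] missing {field}")

    return problems


problems = check_zenodo_metadata(json.loads(ZENODO_JSON))
print(problems or "no problems found")
```

The check is deliberately shallow: it flags missing keys and a malformed date, and does not validate license identifiers or relation vocabularies.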
@@ -0,0 +1,28 @@
+cff-version: 1.2.0
+message: "If you use this repository, please cite the Dormant Behavior Audit release materials and reference report."
+title: "Dormant Behavior Audit"
+type: software
+url: "https://github.com/SproutSeeds/dormant-behavior-audit"
+repository-code: "https://github.com/SproutSeeds/dormant-behavior-audit"
+authors:
+  - family-names: Mitchell
+    given-names: Cody
+    affiliation: Independent Researcher
+abstract: "Benchmark assets, reference bundles, and reproducibility materials for auditing latent, condition-dependent model behavior."
+version: "v1.0.0"
+date-released: 2026-04-07
+license: Apache-2.0
+keywords:
+  - benchmark
+  - model auditing
+  - reproducibility
+  - latent behavior
+  - dormant behavior
+preferred-citation:
+  type: article
+  title: "Finding the Alibaba Cloud Backdoor: A Reproducible Reference Case for Dormant Behavior Audit"
+  authors:
+    - family-names: Mitchell
+      given-names: Cody
+  year: 2026
+  url: "https://github.com/SproutSeeds/dormant-behavior-audit/releases/download/v1.0.0/dormant-behavior-audit-v1.0.0-reference-report.pdf"
@@ -0,0 +1,196 @@
+Apache License
+Version 2.0, January 2004
+http://www.apache.org/licenses/
+
+TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+1. Definitions.
+
+"License" shall mean the terms and conditions for use, reproduction, and
+distribution as defined by Sections 1 through 9 of this document.
+
+"Licensor" shall mean the copyright owner or entity authorized by the copyright
+owner that is granting the License.
+
+"Legal Entity" shall mean the union of the acting entity and all other entities
+that control, are controlled by, or are under common control with that entity.
+For the purposes of this definition, "control" means (i) the power, direct or
+indirect, to cause the direction or management of such entity, whether by
+contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the
+outstanding shares, or (iii) beneficial ownership of such entity.
+
+"You" (or "Your") shall mean an individual or Legal Entity exercising
+permissions granted by this License.
+
+"Source" form shall mean the preferred form for making modifications, including
+but not limited to software source code, documentation source, and
+configuration files.
+
+"Object" form shall mean any form resulting from mechanical transformation or
+translation of a Source form, including but not limited to compiled object
+code, generated documentation, and conversions to other media types.
+
+"Work" shall mean the work of authorship, whether in Source or Object form,
+made available under the License, as indicated by a copyright notice that is
+included in or attached to the work (an example is provided in the Appendix
+below).
+
+"Derivative Works" shall mean any work, whether in Source or Object form, that
+is based on (or derived from) the Work and for which the editorial revisions,
+annotations, elaborations, or other modifications represent, as a whole, an
+original work of authorship. For the purposes of this License, Derivative Works
+shall not include works that remain separable from, or merely link (or bind by
+name) to the interfaces of, the Work and Derivative Works thereof.
+
+"Contribution" shall mean any work of authorship, including the original
+version of the Work and any modifications or additions to that Work or
+Derivative Works thereof, that is intentionally submitted to Licensor for
+inclusion in the Work by the copyright owner or by an individual or Legal
+Entity authorized to submit on behalf of the copyright owner. For the purposes
+of this definition, "submitted" means any form of electronic, verbal, or
+written communication sent to the Licensor or its representatives, including
+but not limited to communication on electronic mailing lists, source code
+control systems, and issue tracking systems that are managed by, or on behalf
+of, the Licensor for the purpose of discussing and improving the Work, but
+excluding communication that is conspicuously marked or otherwise designated in
+writing by the copyright owner as "Not a Contribution."
+
+"Contributor" shall mean Licensor and any individual or Legal Entity on behalf
+of whom a Contribution has been received by Licensor and subsequently
+incorporated within the Work.
+
+2. Grant of Copyright License.
|
|
63
|
+
|
|
64
|
+
Subject to the terms and conditions of this License, each Contributor hereby
|
|
65
|
+
grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free,
|
|
66
|
+
irrevocable copyright license to reproduce, prepare Derivative Works of,
|
|
67
|
+
publicly display, publicly perform, sublicense, and distribute the Work and
|
|
68
|
+
such Derivative Works in Source or Object form.
|
|
69
|
+
|
|
70
|
+
3. Grant of Patent License.
|
|
71
|
+
|
|
72
|
+
Subject to the terms and conditions of this License, each Contributor hereby
|
|
73
|
+
grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free,
|
|
74
|
+
irrevocable (except as stated in this section) patent license to make, have
|
|
75
|
+
made, use, offer to sell, sell, import, and otherwise transfer the Work, where
|
|
76
|
+
such license applies only to those patent claims licensable by such Contributor
|
|
77
|
+
that are necessarily infringed by their Contribution(s) alone or by combination
|
|
78
|
+
of their Contribution(s) with the Work to which such Contribution(s) was
|
|
79
|
+
submitted. If You institute patent litigation against any entity (including a
|
|
80
|
+
cross-claim or counterclaim in a lawsuit) alleging that the Work or a
|
|
81
|
+
Contribution incorporated within the Work constitutes direct or contributory
|
|
82
|
+
patent infringement, then any patent licenses granted to You under this License
|
|
83
|
+
for that Work shall terminate as of the date such litigation is filed.
|
|
84
|
+
|
|
85
|
+
4. Redistribution.
|
|
86
|
+
|
|
87
|
+
You may reproduce and distribute copies of the Work or Derivative Works thereof
|
|
88
|
+
in any medium, with or without modifications, and in Source or Object form,
|
|
89
|
+
provided that You meet the following conditions:
|
|
90
|
+
|
|
91
|
+
(a) You must give any other recipients of the Work or Derivative Works a copy
|
|
92
|
+
of this License; and
|
|
93
|
+
|
|
94
|
+
(b) You must cause any modified files to carry prominent notices stating that
|
|
95
|
+
You changed the files; and
|
|
96
|
+
|
|
97
|
+
(c) You must retain, in the Source form of any Derivative Works that You
|
|
98
|
+
distribute, all copyright, patent, trademark, and attribution notices from the
|
|
99
|
+
Source form of the Work, excluding those notices that do not pertain to any
|
|
100
|
+
part of the Derivative Works; and
|
|
101
|
+
|
|
102
|
+
(d) If the Work includes a "NOTICE" text file as part of its distribution, then
|
|
103
|
+
any Derivative Works that You distribute must include a readable copy of the
|
|
104
|
+
attribution notices contained within such NOTICE file, excluding those notices
|
|
105
|
+
that do not pertain to any part of the Derivative Works, in at least one of the
|
|
106
|
+
following places: within a NOTICE text file distributed as part of the
|
|
107
|
+
Derivative Works; within the Source form or documentation, if provided along
|
|
108
|
+
with the Derivative Works; or, within a display generated by the Derivative
|
|
109
|
+
Works, if and wherever such third-party notices normally appear. The contents
|
|
110
|
+
of the NOTICE file are for informational purposes only and do not modify the
|
|
111
|
+
License. You may add Your own attribution notices within Derivative Works that
|
|
112
|
+
You distribute, alongside or as an addendum to the NOTICE text from the Work,
|
|
113
|
+
provided that such additional attribution notices cannot be construed as
|
|
114
|
+
modifying the License.
|
|
115
|
+
|
|
116
|
+
You may add Your own copyright statement to Your modifications and may provide
|
|
117
|
+
additional or different license terms and conditions for use, reproduction, or
|
|
118
|
+
distribution of Your modifications, or for any such Derivative Works as a
|
|
119
|
+
whole, provided Your use, reproduction, and distribution of the Work otherwise
|
|
120
|
+
complies with the conditions stated in this License.
|
|
121
|
+
|
|
122
|
+
5. Submission of Contributions.
|
|
123
|
+
|
|
124
|
+
Unless You explicitly state otherwise, any Contribution intentionally submitted
|
|
125
|
+
for inclusion in the Work by You to the Licensor shall be under the terms and
|
|
126
|
+
conditions of this License, without any additional terms or conditions.
|
|
127
|
+
Notwithstanding the above, nothing herein shall supersede or modify the terms
|
|
128
|
+
of any separate license agreement you may have executed with Licensor regarding
|
|
129
|
+
such Contributions.
|
|
130
|
+
|
|
131
|
+
6. Trademarks.
|
|
132
|
+
|
|
133
|
+
This License does not grant permission to use the trade names, trademarks,
|
|
134
|
+
service marks, or product names of the Licensor, except as required for
|
|
135
|
+
reasonable and customary use in describing the origin of the Work and
|
|
136
|
+
reproducing the content of the NOTICE file.
|
|
137
|
+
|
|
138
|
+
7. Disclaimer of Warranty.
|
|
139
|
+
|
|
140
|
+
Unless required by applicable law or agreed to in writing, Licensor provides
|
|
141
|
+
the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS,
|
|
142
|
+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied,
|
|
143
|
+
including, without limitation, any warranties or conditions of TITLE,
|
|
144
|
+
NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are
|
|
145
|
+
solely responsible for determining the appropriateness of using or
|
|
146
|
+
redistributing the Work and assume any risks associated with Your exercise of
|
|
147
|
+
permissions under this License.
|
|
148
|
+
|
|
149
|
+
8. Limitation of Liability.
|
|
150
|
+
|
|
151
|
+
In no event and under no legal theory, whether in tort (including negligence),
|
|
152
|
+
contract, or otherwise, unless required by applicable law (such as deliberate
|
|
153
|
+
and grossly negligent acts) or agreed to in writing, shall any Contributor be
|
|
154
|
+
liable to You for damages, including any direct, indirect, special, incidental,
|
|
155
|
+
or consequential damages of any character arising as a result of this License
|
|
156
|
+
or out of the use or inability to use the Work (including but not limited to
|
|
157
|
+
damages for loss of goodwill, work stoppage, computer failure or malfunction,
|
|
158
|
+
or any and all other commercial damages or losses), even if such Contributor
|
|
159
|
+
has been advised of the possibility of such damages.
|
|
160
|
+
|
|
161
|
+
9. Accepting Warranty or Additional Liability.
|
|
162
|
+
|
|
163
|
+
While redistributing the Work or Derivative Works thereof, You may choose to
|
|
164
|
+
offer, and charge a fee for, acceptance of support, warranty, indemnity, or
|
|
165
|
+
other liability obligations and/or rights consistent with this License.
|
|
166
|
+
However, in accepting such obligations, You may act only on Your own behalf and
|
|
167
|
+
on Your sole responsibility, not on behalf of any other Contributor, and only
|
|
168
|
+
if You agree to indemnify, defend, and hold each Contributor harmless for any
|
|
169
|
+
liability incurred by, or claims asserted against, such Contributor by reason
|
|
170
|
+
of your accepting any such warranty or additional liability.
|
|
171
|
+
|
|
172
|
+
END OF TERMS AND CONDITIONS
|
|
173
|
+
|
|
174
|
+
APPENDIX: How to apply the Apache License to your work.
|
|
175
|
+
|
|
176
|
+
To apply the Apache License to your work, attach the following boilerplate
|
|
177
|
+
notice, with the fields enclosed by brackets "[]" replaced with your own
|
|
178
|
+
identifying information. (Don't include the brackets!) The text should be
|
|
179
|
+
enclosed in the appropriate comment syntax for the file format. We also
|
|
180
|
+
recommend that a file or class name and description of purpose be included on
|
|
181
|
+
the same "printed page" as the copyright notice for easier identification
|
|
182
|
+
within third-party archives.
|
|
183
|
+
|
|
184
|
+
Copyright [yyyy] [name of copyright owner]
|
|
185
|
+
|
|
186
|
+
Licensed under the Apache License, Version 2.0 (the "License");
|
|
187
|
+
you may not use this file except in compliance with the License.
|
|
188
|
+
You may obtain a copy of the License at
|
|
189
|
+
|
|
190
|
+
http://www.apache.org/licenses/LICENSE-2.0
|
|
191
|
+
|
|
192
|
+
Unless required by applicable law or agreed to in writing, software
|
|
193
|
+
distributed under the License is distributed on an "AS IS" BASIS,
|
|
194
|
+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
195
|
+
See the License for the specific language governing permissions and
|
|
196
|
+
limitations under the License.
|
|
@@ -0,0 +1,16 @@
|
|
|
1
|
+
# Documentation And Artifact License
|
|
2
|
+
|
|
3
|
+
Unless otherwise noted, the narrative research materials and benchmark-facing artifacts in this repository are released under the Creative Commons Attribution 4.0 International license.
|
|
4
|
+
|
|
5
|
+
This applies in particular to public-facing materials such as:
|
|
6
|
+
|
|
7
|
+
- `findings/*.md`
|
|
8
|
+
- `benchmarks/public/*`
|
|
9
|
+
- `benchmarks/reference/**`
|
|
10
|
+
- checked-in benchmark bundles and release-facing artifact packets under `artifacts/`
|
|
11
|
+
|
|
12
|
+
License URL:
|
|
13
|
+
|
|
14
|
+
- https://creativecommons.org/licenses/by/4.0/
|
|
15
|
+
|
|
16
|
+
The source code, scripts, schemas, and other software/configuration files in this repository are released under the Apache License 2.0. See `LICENSE`.
|
|
@@ -0,0 +1,18 @@
|
|
|
1
|
+
include README.md
|
|
2
|
+
include LICENSE
|
|
3
|
+
include LICENSE-docs.md
|
|
4
|
+
include CITATION.cff
|
|
5
|
+
include PUBLIC_RELEASE_CHECKLIST.md
|
|
6
|
+
include .zenodo.json
|
|
7
|
+
graft artifacts
|
|
8
|
+
graft benchmarks
|
|
9
|
+
graft findings
|
|
10
|
+
graft orbit
|
|
11
|
+
graft problems
|
|
12
|
+
graft scripts
|
|
13
|
+
graft src
|
|
14
|
+
prune */__pycache__
|
|
15
|
+
global-exclude __pycache__
|
|
16
|
+
global-exclude *.py[cod]
|
|
17
|
+
global-exclude .DS_Store
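As a rough illustration of what the `global-exclude` patterns above would filter, the sketch below matches bare file names against the same three patterns with Python's `fnmatch` module. This is an approximation under stated assumptions: setuptools translates MANIFEST.in patterns with its own logic, so `fnmatchcase` is a stand-in here, and the helper name `is_globally_excluded` is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the global-exclude lines of the MANIFEST.in above.
GLOBAL_EXCLUDES = ("__pycache__", "*.py[cod]", ".DS_Store")


def is_globally_excluded(filename: str) -> bool:
    """Approximate global-exclude: test a bare file name against each pattern.

    Assumption: fnmatch-style matching stands in for setuptools' own
    pattern translation, which is similar but not identical.
    """
    return any(fnmatchcase(filename, pattern) for pattern in GLOBAL_EXCLUDES)


if __name__ == "__main__":
    for name in ("audit.py", "audit.pyc", "audit.pyo", "audit.pyd", ".DS_Store"):
        status = "excluded" if is_globally_excluded(name) else "kept"
        print(f"{name}: {status}")
```

The character class `[cod]` is what lets one pattern cover `.pyc`, `.pyo`, and `.pyd` while leaving plain `.py` sources in the distribution.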
|
|
18
|
+
|
|
@@ -0,0 +1,209 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: dormant-behavior-audit
|
|
3
|
+
Version: 1.0.0
|
|
4
|
+
Summary: Benchmark assets, reproducibility tooling, and evidence checks for dormant behavior audit.
|
|
5
|
+
Author: Cody Mitchell
|
|
6
|
+
License-Expression: Apache-2.0
|
|
7
|
+
Project-URL: Homepage, https://sproutseeds.github.io/dormant-behavior-audit/
|
|
8
|
+
Project-URL: Repository, https://github.com/SproutSeeds/dormant-behavior-audit
|
|
9
|
+
Project-URL: Documentation, https://sproutseeds.github.io/dormant-behavior-audit/
|
|
10
|
+
Project-URL: Changelog, https://github.com/SproutSeeds/dormant-behavior-audit/releases/tag/v1.0.0
|
|
11
|
+
Project-URL: Issues, https://github.com/SproutSeeds/dormant-behavior-audit/issues
|
|
12
|
+
Keywords: benchmark,llm-evals,model-auditing,reproducibility,dormant-behavior,interpretability
|
|
13
|
+
Classifier: Development Status :: 4 - Beta
|
|
14
|
+
Classifier: Intended Audience :: Science/Research
|
|
15
|
+
Classifier: Intended Audience :: Developers
|
|
16
|
+
Classifier: Programming Language :: Python :: 3
|
|
17
|
+
Classifier: Programming Language :: Python :: 3 :: Only
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
20
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
21
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
22
|
+
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
|
23
|
+
Requires-Python: >=3.10
|
|
24
|
+
Description-Content-Type: text/markdown
|
|
25
|
+
License-File: LICENSE
|
|
26
|
+
License-File: LICENSE-docs.md
|
|
27
|
+
Requires-Dist: accelerate>=0.27.0
|
|
28
|
+
Requires-Dist: datasets>=2.18.0
|
|
29
|
+
Requires-Dist: huggingface_hub>=0.21.0
|
|
30
|
+
Requires-Dist: ipywidgets>=8.1.0
|
|
31
|
+
Requires-Dist: jsinfer
|
|
32
|
+
Requires-Dist: matplotlib>=3.8.0
|
|
33
|
+
Requires-Dist: numpy>=1.26.0
|
|
34
|
+
Requires-Dist: pandas>=2.2.0
|
|
35
|
+
Requires-Dist: plotly>=5.18.0
|
|
36
|
+
Requires-Dist: safetensors>=0.4.2
|
|
37
|
+
Requires-Dist: scikit-learn>=1.4.0
|
|
38
|
+
Requires-Dist: scipy>=1.12.0
|
|
39
|
+
Requires-Dist: seaborn>=0.13.0
|
|
40
|
+
Requires-Dist: torch>=2.1.0
|
|
41
|
+
Requires-Dist: tqdm>=4.66.0
|
|
42
|
+
Requires-Dist: transformers>=4.40.0
|
|
43
|
+
Requires-Dist: umap-learn>=0.5.5
|
|
44
|
+
Provides-Extra: notebooks
|
|
45
|
+
Requires-Dist: jupyter>=1.0.0; extra == "notebooks"
|
|
46
|
+
Requires-Dist: notebook>=7.0.0; extra == "notebooks"
|
|
47
|
+
Provides-Extra: tui
|
|
48
|
+
Requires-Dist: textual>=0.58.1; extra == "tui"
|
|
49
|
+
Provides-Extra: publish
|
|
50
|
+
Requires-Dist: build>=1.2.2; extra == "publish"
|
|
51
|
+
Requires-Dist: twine>=5.1.1; extra == "publish"
|
|
52
|
+
Dynamic: license-file
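The header block above follows the RFC 822-style core metadata layout, so it can be read with the standard library's email parser. The sketch below is a minimal example, assuming a short hand-copied excerpt of the fields above (not the full file); the helper name `split_requirements` is hypothetical, and the marker check is a deliberate simplification rather than a full environment-marker parser.

```python
from email.parser import Parser

# A short excerpt of the metadata shown above; the real PKG-INFO has more fields.
PKG_INFO_EXCERPT = """\
Metadata-Version: 2.4
Name: dormant-behavior-audit
Version: 1.0.0
Requires-Python: >=3.10
Requires-Dist: torch>=2.1.0
Requires-Dist: jsinfer
Provides-Extra: tui
Requires-Dist: textual>=0.58.1; extra == "tui"
Provides-Extra: publish
Requires-Dist: build>=1.2.2; extra == "publish"
"""


def split_requirements(pkg_info_text: str) -> tuple[list[str], list[str]]:
    """Split Requires-Dist entries into (always installed, extra-gated)."""
    message = Parser().parsestr(pkg_info_text, headersonly=True)
    core: list[str] = []
    gated: list[str] = []
    for requirement in message.get_all("Requires-Dist", []):
        # Simplification: any marker that mentions `extra` is treated as optional.
        marker = requirement.partition(";")[2]
        (gated if "extra" in marker else core).append(requirement)
    return core, gated


if __name__ == "__main__":
    core, gated = split_requirements(PKG_INFO_EXCERPT)
    print("core:", core)
    print("extra-gated:", gated)
```

Separating the two groups makes it easier to see that the `tui`, `notebooks`, and `publish` dependencies are only pulled in when the matching extra is requested.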
|
|
53
|
+
|
|
54
|
+
# Dormant Behavior Audit
|
|
55
|
+
|
|
56
|
+
This repository contains the flagship benchmark assets, reference bundle, and reproducibility materials for auditing latent, condition-dependent model behavior.
|
|
57
|
+
|
|
58
|
+
The motivating historical case is the Jane Street dormant-model puzzle, but the repo is now organized as a public benchmark and research release rather than a contest-only submission package.
|
|
59
|
+
|
|
60
|
+
## Slow Tour
|
|
61
|
+
|
|
62
|
+
<p align="center">
|
|
63
|
+
<img src="benchmarks/public/assets/readme-night-terminal.gif" width="780" alt="A minimal starry-night terminal animation showing the slow benchmark flow from charter to reference bundle to reproduction to claim checks to release." />
|
|
64
|
+
</p>
|
|
65
|
+
|
|
66
|
+
<p align="center"><em>A quiet walk through the release path: open the charter, inspect the reference bundle, rerun the evidence, compare claim checks, and package the release.</em></p>
|
|
67
|
+
|
|
68
|
+
## Start Here
|
|
69
|
+
|
|
70
|
+
If you want the quickest tour, read these in order:
|
|
71
|
+
|
|
72
|
+
1. [benchmarks/BENCHMARK_CHARTER.md](benchmarks/BENCHMARK_CHARTER.md)
|
|
73
|
+
2. [findings/RELEASE_PACKET_V2.md](findings/RELEASE_PACKET_V2.md)
|
|
74
|
+
3. [benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_v0.json](benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_v0.json)
|
|
75
|
+
4. [PUBLIC_RELEASE_CHECKLIST.md](PUBLIC_RELEASE_CHECKLIST.md)
|
|
76
|
+
5. [CONTRIBUTING.md](CONTRIBUTING.md)
|
|
77
|
+
|
|
78
|
+
## Install The CLI
|
|
79
|
+
|
|
80
|
+
The repository now builds as a Python package with a unified `dba` command.
|
|
81
|
+
|
|
82
|
+
```bash
|
|
83
|
+
pipx install dormant-behavior-audit
|
|
84
|
+
dba --help
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
For a local one-off run without a permanent install:
|
|
88
|
+
|
|
89
|
+
```bash
|
|
90
|
+
uvx --from dormant-behavior-audit dba --help
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
Optional extras:
|
|
94
|
+
|
|
95
|
+
- `pipx install 'dormant-behavior-audit[tui]'` for the Orbit Textual UI
|
|
96
|
+
- `pipx install 'dormant-behavior-audit[notebooks]'` for notebook-heavy local analysis
|
|
97
|
+
|
|
98
|
+
## What This Repo Ships
|
|
99
|
+
|
|
100
|
+
### Public-facing research packet
|
|
101
|
+
|
|
102
|
+
- Reference report index: [findings/RELEASE_PACKET_V2.md](findings/RELEASE_PACKET_V2.md)
|
|
103
|
+
- Canonical report PDF: <https://github.com/SproutSeeds/dormant-behavior-audit/releases/download/v1.0.0/dormant-behavior-audit-v1.0.0-reference-report.pdf>
|
|
104
|
+
- Repo copy of report PDF: [findings/CodyMitchell_DormantPuzzle_Submission_V2_2026-03-06.pdf](findings/CodyMitchell_DormantPuzzle_Submission_V2_2026-03-06.pdf)
|
|
105
|
+
- Main report markdown: [findings/SUBMISSION_V2.md](findings/SUBMISSION_V2.md)
|
|
106
|
+
- Statistical appendix: [findings/STATS_ADDENDUM_V2.md](findings/STATS_ADDENDUM_V2.md)
|
|
107
|
+
- Raw evidence appendix: [findings/RAW_EVIDENCE_APPENDIX_V2.md](findings/RAW_EVIDENCE_APPENDIX_V2.md)
|
|
108
|
+
- Implications memo: [findings/IMPLICATIONS_AND_APPLICATIONS_APPENDIX_V2.md](findings/IMPLICATIONS_AND_APPLICATIONS_APPENDIX_V2.md)
|
|
109
|
+
|
|
110
|
+
### Benchmark assets
|
|
111
|
+
|
|
112
|
+
- Benchmark overview: [benchmarks/README.md](benchmarks/README.md)
|
|
113
|
+
- Benchmark charter: [benchmarks/BENCHMARK_CHARTER.md](benchmarks/BENCHMARK_CHARTER.md)
|
|
114
|
+
- Launch plan: [benchmarks/LAUNCH_PLAN.md](benchmarks/LAUNCH_PLAN.md)
|
|
115
|
+
- Governance/versioning: [benchmarks/GOVERNANCE_AND_VERSIONING.md](benchmarks/GOVERNANCE_AND_VERSIONING.md)
|
|
116
|
+
- Public launch drafts: [benchmarks/public/README.md](benchmarks/public/README.md)
|
|
117
|
+
- Release notes: [benchmarks/public/RELEASE_NOTES_v1.0.0.md](benchmarks/public/RELEASE_NOTES_v1.0.0.md)
|
|
118
|
+
- Collaboration brief: [benchmarks/public/COLLABORATION_BRIEF.md](benchmarks/public/COLLABORATION_BRIEF.md)
|
|
119
|
+
- Standalone homepage: <https://sproutseeds.github.io/dormant-behavior-audit/>
|
|
120
|
+
- Frozen reference bundle: [benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_v0.json](benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_v0.json)
|
|
121
|
+
|
|
122
|
+
### Reproducibility artifacts
|
|
123
|
+
|
|
124
|
+
- Canonical reproduction bundle: [artifacts/reproduction/20260305_230206/](artifacts/reproduction/20260305_230206/)
|
|
125
|
+
- Tightening bundle: [artifacts/tightening/20260306_075440/](artifacts/tightening/20260306_075440/)
|
|
126
|
+
- Claim-level consistency report: [artifacts/reproduction/20260305_230206/findings/claim_consistency_report.md](artifacts/reproduction/20260305_230206/findings/claim_consistency_report.md)
|
|
127
|
+
- Bundle checker entry point: [scripts/check_benchmark_bundle.py](scripts/check_benchmark_bundle.py)
|
|
128
|
+
|
|
129
|
+
## Benchmark Shape
|
|
130
|
+
|
|
131
|
+
The current benchmark release has three layers:
|
|
132
|
+
|
|
133
|
+
- core local seeded and clean-control tasks,
|
|
134
|
+
- a naturalistic historical reference bundle built from the dormant puzzle result,
|
|
135
|
+
- and a supplementary hosted-comparator lane used for calibration and mechanism interpretation.
|
|
136
|
+
|
|
137
|
+
The benchmark is designed to reward:
|
|
138
|
+
|
|
139
|
+
- family recovery instead of one lucky string guess,
|
|
140
|
+
- candidate-versus-control specificity,
|
|
141
|
+
- repeated-run stability,
|
|
142
|
+
- interpretation-aware reporting,
|
|
143
|
+
- and artifact-rich submission packets instead of one scalar score.
|
|
144
|
+
|
|
145
|
+
## Reproducing The Reference Case
|
|
146
|
+
|
|
147
|
+
Install dependencies:
|
|
148
|
+
|
|
149
|
+
```bash
|
|
150
|
+
pip install -r requirements.txt
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
Run the reproducibility pipeline:
|
|
154
|
+
|
|
155
|
+
```bash
|
|
156
|
+
python3 scripts/reproduce_submission.py
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
This writes a fresh bundle under `artifacts/reproduction/<timestamp>/`.
|
|
160
|
+
|
|
161
|
+
Use these files to judge success:
|
|
162
|
+
|
|
163
|
+
- `artifacts/reproduction/<timestamp>/reproduction_report.md`
|
|
164
|
+
- `artifacts/reproduction/<timestamp>/findings/claim_consistency_report.md`
|
|
165
|
+
|
|
166
|
+
Important notes:
|
|
167
|
+
|
|
168
|
+
- local warmup stages are expected to reproduce on MPS-capable hardware (MPS being Apple's Metal Performance Shaders, exposed in PyTorch as the `mps` device),
|
|
169
|
+
- API-side artifacts are stochastic, so claim-level consistency matters more than exact JSON replay,
|
|
170
|
+
- and `scripts/reproduce_submission.py --warmup-start-stage ...` can resume a late warmup failure without rerunning the entire local sweep.
|
|
171
|
+
|
|
172
|
+
## Repo Map
|
|
173
|
+
|
|
174
|
+
- `benchmarks/`: benchmark specs, tasks, schemas, public-release drafts, and the normalized reference bundle
|
|
175
|
+
- `findings/`: public report packet, appendices, raw evidence snapshots, and release-facing validation records
|
|
176
|
+
- `artifacts/`: checked-in submission packets, reproduction bundles, tightening bundles, and hosted-baseline outputs
|
|
177
|
+
- `scripts/`: bundle builders, release checkers, reproducibility scripts, and analysis utilities
|
|
178
|
+
- `src/`, `orbit/`, `problems/`: earlier investigation and local-analysis surfaces preserved for provenance and follow-on work
|
|
179
|
+
|
|
180
|
+
## Release Status
|
|
181
|
+
|
|
182
|
+
The canonical release metadata lives in [benchmarks/public/release_metadata.json](benchmarks/public/release_metadata.json).
|
|
183
|
+
|
|
184
|
+
Current public release URLs:
|
|
185
|
+
|
|
186
|
+
- repo: <https://github.com/SproutSeeds/dormant-behavior-audit>
|
|
187
|
+
- tagged release: <https://github.com/SproutSeeds/dormant-behavior-audit/releases/tag/v1.0.0>
|
|
188
|
+
- canonical reference report PDF: <https://github.com/SproutSeeds/dormant-behavior-audit/releases/download/v1.0.0/dormant-behavior-audit-v1.0.0-reference-report.pdf>
|
|
189
|
+
- canonical reference bundle: <https://github.com/SproutSeeds/dormant-behavior-audit/releases/download/v1.0.0/dormant-behavior-audit-v1.0.0-reference-bundle.json>
|
|
190
|
+
- reference report markdown: <https://github.com/SproutSeeds/dormant-behavior-audit/blob/main/findings/SUBMISSION_V2.md>
|
|
191
|
+
- benchmark homepage: <https://sproutseeds.github.io/dormant-behavior-audit/>
|
|
192
|
+
|
|
193
|
+
The launch checklist is preserved in [PUBLIC_RELEASE_CHECKLIST.md](PUBLIC_RELEASE_CHECKLIST.md), where it now serves as the release record.
|
|
194
|
+
|
|
195
|
+
## Licensing
|
|
196
|
+
|
|
197
|
+
- Code, scripts, and schemas: `Apache-2.0` via [LICENSE](LICENSE)
|
|
198
|
+
- Public-facing reports, benchmark docs, and release artifacts: `CC BY 4.0` via [LICENSE-docs.md](LICENSE-docs.md)
|
|
199
|
+
|
|
200
|
+
## Related Docs
|
|
201
|
+
|
|
202
|
+
- Public release checklist: [PUBLIC_RELEASE_CHECKLIST.md](PUBLIC_RELEASE_CHECKLIST.md)
|
|
203
|
+
- PyPI publishing guide: [PYPI_PUBLISHING.md](PYPI_PUBLISHING.md)
|
|
204
|
+
- Contributing guide: [CONTRIBUTING.md](CONTRIBUTING.md)
|
|
205
|
+
- Findings guide: [findings/README.md](findings/README.md)
|
|
206
|
+
- Collaboration brief: [benchmarks/public/COLLABORATION_BRIEF.md](benchmarks/public/COLLABORATION_BRIEF.md)
|
|
207
|
+
- Benchmark governance: [benchmarks/GOVERNANCE_AND_VERSIONING.md](benchmarks/GOVERNANCE_AND_VERSIONING.md)
|
|
208
|
+
- External platform status: [benchmarks/public/EXTERNAL_PLATFORM_STATUS.md](benchmarks/public/EXTERNAL_PLATFORM_STATUS.md)
|
|
209
|
+
- Hugging Face publish guide: [benchmarks/public/HUGGINGFACE_PUBLISHING.md](benchmarks/public/HUGGINGFACE_PUBLISHING.md)
|
|
@@ -0,0 +1,107 @@
|
|
|
1
|
+
# Public Release Checklist
|
|
2
|
+
|
|
3
|
+
This checklist now serves as the public release ledger for the initial `Dormant Behavior Audit` launch. The repository is live, and this document records what has already been frozen and what still deserves follow-on polish.
|
|
4
|
+
|
|
5
|
+
## 1. Public Identity
|
|
6
|
+
|
|
7
|
+
Current state:
|
|
8
|
+
|
|
9
|
+
- public repo name: `Dormant Behavior Audit`
|
|
10
|
+
- flagship report framing: dormant puzzle as the reference case
|
|
11
|
+
- public repo URL: `https://github.com/SproutSeeds/dormant-behavior-audit`
|
|
12
|
+
|
|
13
|
+
Follow-on items:
|
|
14
|
+
|
|
15
|
+
- confirm the long-form public paper title and canonical PDF filename
|
|
16
|
+
- finalize acknowledgments and corresponding contact if needed
|
|
17
|
+
|
|
18
|
+
## 2. Citable Artifacts
|
|
19
|
+
|
|
20
|
+
Completed:
|
|
21
|
+
|
|
22
|
+
- canonical packet index at `findings/RELEASE_PACKET_V2.md`
|
|
23
|
+
- frozen benchmark bundle at `benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_v0.json`
|
|
24
|
+
- citation metadata at `CITATION.cff`
|
|
25
|
+
- canonical public report PDF URL via the tagged release asset
|
|
26
|
+
- first formal tagged release published as `v1.0.0`
|
|
27
|
+
|
|
28
|
+
Follow-on items:
|
|
29
|
+
|
|
30
|
+
- mirror the canonical PDF to an external paper host when ready
|
|
31
|
+
|
|
32
|
+
## 3. Repo Front Door
|
|
33
|
+
|
|
34
|
+
Completed:
|
|
35
|
+
|
|
36
|
+
- `README.md` is benchmark-first and public-facing
|
|
37
|
+
- `findings/README.md` is the public findings navigation layer
|
|
38
|
+
- public-facing licensing and citation files are present
|
|
39
|
+
|
|
40
|
+
Follow-on items:
|
|
41
|
+
|
|
42
|
+
- continue tightening public wording where any contest-era language leaks through
|
|
43
|
+
- add richer external landing pages if the project gets a standalone site
|
|
44
|
+
|
|
45
|
+
## 4. Release Switch
|
|
46
|
+
|
|
47
|
+
Completed:
|
|
48
|
+
|
|
49
|
+
- public URLs are set in `benchmarks/public/release_metadata.json`
|
|
50
|
+
- release status is `public`
|
|
51
|
+
- release metadata checks have been regenerated
|
|
52
|
+
- announcement date has been set for the initial public launch
|
|
53
|
+
- release metadata now names the formal public tag and release URL
|
|
54
|
+
|
|
55
|
+
## 5. Integrity Checks
|
|
56
|
+
|
|
57
|
+
Recommended recheck cadence:
|
|
58
|
+
|
|
59
|
+
- rerun the reproduction path with `python3 scripts/reproduce_submission.py` before any major tagged release
|
|
60
|
+
- rerun bundle and release metadata checks whenever release-facing assets move
|
|
61
|
+
- reconfirm claim-level consistency after any evidence-packet change
|
|
62
|
+
- verify that the benchmark scoreboard, packet index, and appendices still tell the same story
|
|
63
|
+
|
|
64
|
+
Definition of done:
|
|
65
|
+
- an external reader can tell what is reproducible,
|
|
66
|
+
- what is stochastic,
|
|
67
|
+
- and what evidence supports each major claim.
|
|
68
|
+
|
|
69
|
+
## 6. Publish In Layers
|
|
70
|
+
|
|
71
|
+
Recommended release stack:
|
|
72
|
+
|
|
73
|
+
1. GitHub repo update plus tagged release
|
|
74
|
+
2. Public PDF/report linked from the repo
|
|
75
|
+
3. Benchmark landing assets in `benchmarks/public/`
|
|
76
|
+
4. Hugging Face dataset/benchmark card
|
|
77
|
+
5. Papers with Code benchmark/task pages
|
|
78
|
+
6. Short announcement post and outreach note
|
|
79
|
+
|
|
80
|
+
## 7. Collaboration Packet
|
|
81
|
+
|
|
82
|
+
Completed:
|
|
83
|
+
|
|
84
|
+
- one-page overview at `benchmarks/public/COLLABORATION_BRIEF.md`
|
|
85
|
+
- public benchmark summary and announcement drafts in `benchmarks/public/`
|
|
86
|
+
|
|
87
|
+
Follow-on items:
|
|
88
|
+
|
|
89
|
+
- tailor one short outreach note per audience once the paper URL is final
|
|
90
|
+
- add issue templates for external replication and benchmark proposals if inbound volume grows
|
|
91
|
+
|
|
92
|
+
## 8. Current Gaps
|
|
93
|
+
|
|
94
|
+
The highest-value remaining gaps are:
|
|
95
|
+
|
|
96
|
+
- no Hugging Face or Papers with Code pages have been published yet
|
|
97
|
+
- no external paper host mirrors the report yet
|
|
98
|
+
|
|
99
|
+
## 9. What To Do Next
|
|
100
|
+
|
|
101
|
+
The highest-value next sequence is:
|
|
102
|
+
|
|
103
|
+
1. publish the discoverability surfaces,
|
|
104
|
+
2. mirror the report on an external paper host,
|
|
105
|
+
3. rerun the integrity checks before major updates,
|
|
106
|
+
4. begin active collaboration outreach,
|
|
107
|
+
5. and keep the standalone homepage aligned with major tagged releases.
|