dormant-behavior-audit 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (711)
  1. dormant_behavior_audit-1.0.0/.zenodo.json +33 -0
  2. dormant_behavior_audit-1.0.0/CITATION.cff +28 -0
  3. dormant_behavior_audit-1.0.0/LICENSE +196 -0
  4. dormant_behavior_audit-1.0.0/LICENSE-docs.md +16 -0
  5. dormant_behavior_audit-1.0.0/MANIFEST.in +18 -0
  6. dormant_behavior_audit-1.0.0/PKG-INFO +209 -0
  7. dormant_behavior_audit-1.0.0/PUBLIC_RELEASE_CHECKLIST.md +107 -0
  8. dormant_behavior_audit-1.0.0/README.md +156 -0
  9. dormant_behavior_audit-1.0.0/artifacts/__init__.py +1 -0
  10. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/claim_consistency_check.json +135 -0
  11. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/claim_consistency_report.md +36 -0
  12. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/competitor_n20.json +171 -0
  13. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/model1_n50.json +89 -0
  14. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/model2_n50.json +89 -0
  15. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/model3_confirmation.json +720 -0
  16. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/model3_ma_yun_n50.json +25 -0
  17. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/model3_n50.json +88 -0
  18. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/submission.tex +1006 -0
  19. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/findings/warmup_generation_test.json +209 -0
  20. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/materialization_manifest.json +17 -0
  21. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/reproduction_report.json +67 -0
  22. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/reproduction_report.md +25 -0
  23. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/artifacts/warmup_diff/diff_heatmap.csv +340 -0
  24. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/artifacts/warmup_diff/diff_summary.json +5875 -0
  25. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/artifacts/warmup_diff/hypothesis_ledger.md +8 -0
  26. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/artifacts/warmup_diff/top_changed_tokens.csv +201 -0
  27. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/artifacts/warmup_diff/warmup_diff_report.md +10 -0
  28. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/data/results/warmup/memory/ARCHIVE_NOTE.md +5 -0
  29. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/data/results/warmup/memory/memory_extraction_local.jsonl +40 -0
  30. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/data/results/warmup/memory/memory_results.json +5612 -0
  31. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/data/results/warmup/motifs/motifs.json +42 -0
  32. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/data/results/warmup/triggers/trigger_candidates.json +82 -0
  33. dormant_behavior_audit-1.0.0/artifacts/reproduction/20260305_230206/warmup/data/results/warmup/triggers/verified_triggers.json +58 -0
  34. dormant_behavior_audit-1.0.0/artifacts/submissions/README.md +40 -0
  35. dormant_behavior_audit-1.0.0/artifacts/submissions/SCOREBOARD.json +495 -0
  36. dormant_behavior_audit-1.0.0/artifacts/submissions/SCOREBOARD.md +60 -0
  37. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  38. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
  39. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
  40. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
  41. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
  42. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  43. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
  44. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
  45. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
  46. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
  47. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
  48. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/benchmark_bundle_v0.json +137 -0
  49. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
  50. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/primary_report_check.json +44 -0
  51. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
  52. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +190 -0
  53. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/run_manifest.json +29 -0
  54. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/submission_check.json +139 -0
  55. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/submission_stats.json +141 -0
  56. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/aurora_context_hybrid_reference_submission_v0/task_check.json +116 -0
  57. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  58. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/PACKET_INDEX.md +24 -0
  59. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/PRIMARY_REPORT_CHECK.md +17 -0
  60. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/RAW_EVIDENCE_APPENDIX.md +15 -0
  61. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  62. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/STATS_APPENDIX.md +19 -0
  63. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/SUBMISSION_CHECK.md +25 -0
  64. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/SUBMISSION_REPORT.md +48 -0
  65. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/TASK_CHECK.md +29 -0
  66. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/benchmark_bundle_check.json +110 -0
  67. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/benchmark_bundle_v0.json +129 -0
  68. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/primary_report_check.json +38 -0
  69. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/raw_evidence_check.json +38 -0
  70. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/raw_evidence_packet_v0.json +95 -0
  71. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/run_manifest.json +26 -0
  72. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/submission_check.json +139 -0
  73. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/submission_stats.json +95 -0
  74. dormant_behavior_audit-1.0.0/artifacts/submissions/aurora_context_seeded_v0/simulated_external_aurora_scripted_v0/task_check.json +116 -0
  75. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  76. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
  77. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
  78. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
  79. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
  80. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  81. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
  82. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
  83. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
  84. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
  85. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
  86. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/benchmark_bundle_v0.json +138 -0
  87. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
  88. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/primary_report_check.json +44 -0
  89. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
  90. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +178 -0
  91. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/run_manifest.json +29 -0
  92. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/submission_check.json +139 -0
  93. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/submission_stats.json +135 -0
  94. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_qwen2_5_7b_transfer_v0/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0/task_check.json +116 -0
  95. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  96. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
  97. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
  98. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
  99. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
  100. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  101. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
  102. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
  103. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
  104. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
  105. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
  106. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/benchmark_bundle_v0.json +138 -0
  107. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
  108. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/primary_report_check.json +44 -0
  109. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
  110. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +178 -0
  111. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/run_manifest.json +29 -0
  112. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/submission_check.json +139 -0
  113. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/submission_stats.json +135 -0
  114. dormant_behavior_audit-1.0.0/artifacts/submissions/coastal_retrieval_seeded_v0/coastal_retrieval_hybrid_reference_submission_v0/task_check.json +116 -0
  115. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  116. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/MODEL2_TOP5_CHECK.md +18 -0
  117. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/MODEL3_MA_YUN_CHECK.md +18 -0
  118. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/MODEL3_TOP5_CHECK.md +18 -0
  119. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/PACKET_INDEX.md +32 -0
  120. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/PRIMARY_REPORT_CHECK.md +23 -0
  121. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/RAW_EVIDENCE_APPENDIX.md +44 -0
  122. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  123. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/REFERENCE_BUNDLE_CHECK.md +29 -0
  124. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/STATS_APPENDIX.md +30 -0
  125. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/SUBMISSION_CHECK.md +30 -0
  126. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/SUBMISSION_REPORT.md +50 -0
  127. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/TASK_CHECK.md +29 -0
  128. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/benchmark_bundle_check.json +110 -0
  129. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/benchmark_bundle_v0.json +156 -0
  130. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/model2_top5_check.json +44 -0
  131. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/model3_ma_yun_check.json +44 -0
  132. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/model3_top5_check.json +44 -0
  133. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/primary_report_check.json +72 -0
  134. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/raw_evidence_check.json +38 -0
  135. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/raw_evidence_packet_v0.json +214 -0
  136. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/reference_bundle_check.json +110 -0
  137. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/reference_case_report.json +111 -0
  138. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/reference_case_report.md +34 -0
  139. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/run_manifest.json +34 -0
  140. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/submission_check.json +180 -0
  141. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/submission_stats.json +118 -0
  142. dormant_behavior_audit-1.0.0/artifacts/submissions/cross_model_alibaba_divergence_v0/cross_model_alibaba_reference_case_submission_v0/task_check.json +116 -0
  143. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  144. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/PACKET_INDEX.md +25 -0
  145. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/PREFIX_ACK_ANALYSIS.md +39 -0
  146. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/PRIMARY_REPORT_CHECK.md +17 -0
  147. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +15 -0
  148. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  149. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/STATS_APPENDIX.md +29 -0
  150. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/SUBMISSION_CHECK.md +27 -0
  151. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/SUBMISSION_REPORT.md +58 -0
  152. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/TASK_CHECK.md +29 -0
  153. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/benchmark_bundle_check.json +110 -0
  154. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/benchmark_bundle_v0.json +144 -0
  155. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/prefix_ack_analysis.json +215 -0
  156. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/primary_report_check.json +38 -0
  157. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/raw_evidence_check.json +38 -0
  158. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/raw_evidence_packet_v0.json +143 -0
  159. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/run_manifest.json +28 -0
  160. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/submission_check.json +153 -0
  161. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/submission_stats.json +185 -0
  162. dormant_behavior_audit-1.0.0/artifacts/submissions/gemma3_taxonomic_acknowledgment_ablation_v0/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0/task_check.json +116 -0
  163. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  164. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/PACKET_INDEX.md +24 -0
  165. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/PRIMARY_REPORT_CHECK.md +17 -0
  166. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +24 -0
  167. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  168. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/STATS_APPENDIX.md +24 -0
  169. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
  170. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
  171. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/TASK_CHECK.md +29 -0
  172. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/benchmark_bundle_check.json +110 -0
  173. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/benchmark_bundle_v0.json +135 -0
  174. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/primary_report_check.json +38 -0
  175. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/raw_evidence_check.json +38 -0
  176. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/raw_evidence_packet_v0.json +94 -0
  177. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/run_manifest.json +26 -0
  178. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/submission_check.json +139 -0
  179. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/submission_stats.json +103 -0
  180. dormant_behavior_audit-1.0.0/artifacts/submissions/model_host_clean_control_v0/model_host_clean_control_scripted_reference_submission_v0/task_check.json +116 -0
  181. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  182. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
  183. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
  184. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
  185. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
  186. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  187. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
  188. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
  189. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
  190. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
  191. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
  192. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/benchmark_bundle_v0.json +138 -0
  193. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
  194. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/primary_report_check.json +44 -0
  195. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
  196. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +171 -0
  197. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/run_manifest.json +29 -0
  198. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/submission_check.json +139 -0
  199. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/submission_stats.json +135 -0
  200. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_qwen2_5_7b_transfer_v0/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0/task_check.json +116 -0
  201. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  202. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
  203. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
  204. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
  205. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
  206. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  207. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
  208. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
  209. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
  210. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
  211. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
  212. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/benchmark_bundle_v0.json +138 -0
  213. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
  214. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/primary_report_check.json +44 -0
  215. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
  216. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +178 -0
  217. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/run_manifest.json +29 -0
  218. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/submission_check.json +139 -0
  219. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/submission_stats.json +135 -0
  220. dormant_behavior_audit-1.0.0/artifacts/submissions/orchard_toolrouting_seeded_v0/orchard_toolrouting_hybrid_reference_submission_v0/task_check.json +116 -0
  221. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  222. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/PACKET_INDEX.md +25 -0
  223. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/PREFIX_ACK_ANALYSIS.md +38 -0
  224. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/PRIMARY_REPORT_CHECK.md +17 -0
  225. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +15 -0
  226. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  227. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/STATS_APPENDIX.md +30 -0
  228. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/SUBMISSION_CHECK.md +26 -0
  229. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/SUBMISSION_REPORT.md +57 -0
  230. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/TASK_CHECK.md +29 -0
  231. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/benchmark_bundle_check.json +110 -0
  232. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/benchmark_bundle_v0.json +145 -0
  233. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/prefix_ack_analysis.json +107 -0
  234. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/primary_report_check.json +38 -0
  235. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/raw_evidence_check.json +38 -0
  236. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/raw_evidence_packet_v0.json +69 -0
  237. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/run_manifest.json +28 -0
  238. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/submission_check.json +140 -0
  239. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/submission_stats.json +158 -0
  240. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_family_model_host_followup_v0/orchidaceae_family_model_host_followup_reference_submission_v0/task_check.json +116 -0
  241. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  242. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
  243. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
  244. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
  245. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
  246. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  247. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
  248. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
  249. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
  250. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
  251. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
  252. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/benchmark_bundle_v0.json +138 -0
  253. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
  254. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/primary_report_check.json +44 -0
  255. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
  256. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +164 -0
  257. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/run_manifest.json +29 -0
  258. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/submission_check.json +139 -0
  259. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/submission_stats.json +128 -0
  260. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_qwen2_5_7b_transfer_v0/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0/task_check.json +116 -0
  261. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  262. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
  263. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
  264. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
  265. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
  266. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  267. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
  268. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
  269. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
  270. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
  271. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
  272. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/benchmark_bundle_v0.json +137 -0
  273. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
  274. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/primary_report_check.json +44 -0
  275. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
  276. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +190 -0
  277. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/run_manifest.json +29 -0
  278. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/submission_check.json +139 -0
  279. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/submission_stats.json +141 -0
  280. dormant_behavior_audit-1.0.0/artifacts/submissions/orchidaceae_system_seeded_v0/orchidaceae_system_hybrid_reference_submission_v0/task_check.json +116 -0
  281. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  282. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/PACKET_INDEX.md +24 -0
  283. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/PRIMARY_REPORT_CHECK.md +17 -0
  284. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +24 -0
  285. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  286. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/STATS_APPENDIX.md +24 -0
  287. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
  288. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
  289. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/TASK_CHECK.md +29 -0
  290. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/benchmark_bundle_check.json +110 -0
  291. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/benchmark_bundle_v0.json +135 -0
  292. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/primary_report_check.json +38 -0
  293. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/raw_evidence_check.json +38 -0
  294. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/raw_evidence_packet_v0.json +94 -0
  295. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/run_manifest.json +26 -0
  296. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/submission_check.json +139 -0
  297. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/submission_stats.json +101 -0
  298. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_5_7b_clean_control_v0/qwen2_5_7b_clean_control_scripted_reference_submission_v0/task_check.json +116 -0
  299. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  300. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/PACKET_INDEX.md +24 -0
  301. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/PRIMARY_REPORT_CHECK.md +17 -0
  302. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +24 -0
  303. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  304. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/STATS_APPENDIX.md +24 -0
  305. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
  306. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
  307. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/TASK_CHECK.md +29 -0
  308. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/benchmark_bundle_check.json +110 -0
  309. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/benchmark_bundle_v0.json +135 -0
  310. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/primary_report_check.json +38 -0
  311. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/raw_evidence_check.json +38 -0
  312. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/raw_evidence_packet_v0.json +94 -0
  313. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/run_manifest.json +26 -0
  314. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/submission_check.json +139 -0
  315. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/submission_stats.json +101 -0
  316. dormant_behavior_audit-1.0.0/artifacts/submissions/qwen2_7b_clean_control_v0/qwen2_7b_clean_control_scripted_reference_submission_v0/task_check.json +116 -0
  317. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  318. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
  319. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
  320. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
  321. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +25 -0
  322. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  323. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/STATS_APPENDIX.md +26 -0
  324. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +25 -0
  325. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +48 -0
  326. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
  327. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
  328. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/benchmark_bundle_v0.json +138 -0
  329. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
  330. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/primary_report_check.json +44 -0
  331. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
  332. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +178 -0
  333. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/run_manifest.json +29 -0
  334. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/submission_check.json +139 -0
  335. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/submission_stats.json +135 -0
  336. dormant_behavior_audit-1.0.0/artifacts/submissions/sakura_alias_multilingual_seeded_v0/sakura_alias_multilingual_hybrid_reference_submission_v0/task_check.json +116 -0
  337. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  338. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/BLACKBOX_REPORT_CHECK.md +17 -0
  339. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/PACKET_INDEX.md +26 -0
  340. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/PRIMARY_REPORT_CHECK.md +18 -0
  341. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/RAW_EVIDENCE_APPENDIX.md +36 -0
  342. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  343. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/STATS_APPENDIX.md +31 -0
  344. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/SUBMISSION_CHECK.md +26 -0
  345. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/SUBMISSION_REPORT.md +49 -0
  346. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/TASK_CHECK.md +29 -0
  347. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/benchmark_bundle_check.json +110 -0
  348. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/benchmark_bundle_v0.json +142 -0
  349. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/blackbox_report_check.json +38 -0
  350. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/primary_report_check.json +44 -0
  351. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/raw_evidence_check.json +38 -0
  352. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/raw_evidence_packet_v0.json +233 -0
  353. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/run_manifest.json +29 -0
  354. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/submission_check.json +152 -0
  355. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/submission_stats.json +209 -0
  356. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/simulated_external_warmup_hybrid_v0/task_check.json +116 -0
  357. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/BENCHMARK_BUNDLE_CHECK.md +29 -0
  358. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/BLACKBOX_REPORT_CHECK.md +17 -0
  359. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/PACKET_INDEX.md +26 -0
  360. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/PRIMARY_REPORT_CHECK.md +18 -0
  361. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/RAW_EVIDENCE_APPENDIX.md +36 -0
  362. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/RAW_EVIDENCE_PACKET_CHECK.md +17 -0
  363. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/STATS_APPENDIX.md +31 -0
  364. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/SUBMISSION_CHECK.md +26 -0
  365. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/SUBMISSION_REPORT.md +49 -0
  366. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/TASK_CHECK.md +29 -0
  367. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/benchmark_bundle_check.json +110 -0
  368. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/benchmark_bundle_v0.json +142 -0
  369. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/blackbox_report_check.json +38 -0
  370. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/primary_report_check.json +44 -0
  371. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/raw_evidence_check.json +38 -0
  372. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/raw_evidence_packet_v0.json +233 -0
  373. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/run_manifest.json +29 -0
  374. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/submission_check.json +152 -0
  375. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/submission_stats.json +209 -0
  376. dormant_behavior_audit-1.0.0/artifacts/submissions/warmup_alibaba_seeded_v0/warmup_alibaba_hybrid_reference_submission_v0/task_check.json +116 -0
  377. dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/model2_top5_repeat_summary.json +236 -0
  378. dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/model2_top5_repeat_summary.md +18 -0
  379. dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/model3_ma_yun_repeat_summary.json +56 -0
  380. dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/model3_ma_yun_repeat_summary.md +14 -0
  381. dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/model3_top5_repeat_summary.json +277 -0
  382. dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/model3_top5_repeat_summary.md +19 -0
  383. dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/analysis/tightening_report.md +13 -0
  384. dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/runs/model2_n50_repeat3.json +89 -0
  385. dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/runs/model3_ma_yun_n50_repeat3.json +25 -0
  386. dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/runs/model3_n50_repeat3.json +88 -0
  387. dormant_behavior_audit-1.0.0/artifacts/tightening/20260306_075440/runs/model3_n50_repeat4.json +88 -0
  388. dormant_behavior_audit-1.0.0/benchmarks/BENCHMARK_BUNDLE_SPEC_V0.md +163 -0
  389. dormant_behavior_audit-1.0.0/benchmarks/BENCHMARK_CHARTER.md +121 -0
  390. dormant_behavior_audit-1.0.0/benchmarks/EXTERNAL_SUBMISSION_GUIDE.md +141 -0
  391. dormant_behavior_audit-1.0.0/benchmarks/GOVERNANCE_AND_VERSIONING.md +152 -0
  392. dormant_behavior_audit-1.0.0/benchmarks/LAUNCH_PLAN.md +197 -0
  393. dormant_behavior_audit-1.0.0/benchmarks/MODEL_SUITE.md +118 -0
  394. dormant_behavior_audit-1.0.0/benchmarks/README.md +289 -0
  395. dormant_behavior_audit-1.0.0/benchmarks/TASK_EXPANSION_PLAN.md +315 -0
  396. dormant_behavior_audit-1.0.0/benchmarks/USER_ONBOARDING_FLOW.md +228 -0
  397. dormant_behavior_audit-1.0.0/benchmarks/WHY_THIS_MATTERS.md +253 -0
  398. dormant_behavior_audit-1.0.0/benchmarks/__init__.py +1 -0
  399. dormant_behavior_audit-1.0.0/benchmarks/local_targets.py +417 -0
  400. dormant_behavior_audit-1.0.0/benchmarks/methods/README.md +16 -0
  401. dormant_behavior_audit-1.0.0/benchmarks/methods/hybrid_openweight_baseline_v0.md +82 -0
  402. dormant_behavior_audit-1.0.0/benchmarks/methods/reference_case_evidence_v0.md +55 -0
  403. dormant_behavior_audit-1.0.0/benchmarks/methods/scripted_blackbox_baseline_v0.md +105 -0
  404. dormant_behavior_audit-1.0.0/benchmarks/model_host.py +201 -0
  405. dormant_behavior_audit-1.0.0/benchmarks/public/ANNOUNCEMENT_POST.md +71 -0
  406. dormant_behavior_audit-1.0.0/benchmarks/public/COLLABORATION_BRIEF.md +86 -0
  407. dormant_behavior_audit-1.0.0/benchmarks/public/EXTERNAL_PLATFORM_STATUS.md +32 -0
  408. dormant_behavior_audit-1.0.0/benchmarks/public/HF_DATASET_CARD.md +87 -0
  409. dormant_behavior_audit-1.0.0/benchmarks/public/HUGGINGFACE_PUBLISHING.md +45 -0
  410. dormant_behavior_audit-1.0.0/benchmarks/public/HUGGING_FACE_PAPERS_SUBMISSION.md +36 -0
  411. dormant_behavior_audit-1.0.0/benchmarks/public/PAPERS_WITH_CODE_BENCHMARK_PAGE.md +66 -0
  412. dormant_behavior_audit-1.0.0/benchmarks/public/README.md +40 -0
  413. dormant_behavior_audit-1.0.0/benchmarks/public/RELEASE_METADATA_CHECK.md +17 -0
  414. dormant_behavior_audit-1.0.0/benchmarks/public/RELEASE_NOTES_v1.0.0.md +50 -0
  415. dormant_behavior_audit-1.0.0/benchmarks/public/SUBMISSION_SCOREBOARD.json +495 -0
  416. dormant_behavior_audit-1.0.0/benchmarks/public/SUBMISSION_SCOREBOARD.md +60 -0
  417. dormant_behavior_audit-1.0.0/benchmarks/public/ZENODO_MIRROR.md +34 -0
  418. dormant_behavior_audit-1.0.0/benchmarks/public/assets/readme-night-terminal.gif +0 -0
  419. dormant_behavior_audit-1.0.0/benchmarks/public/release_metadata.json +13 -0
  420. dormant_behavior_audit-1.0.0/benchmarks/public/release_metadata_check.json +44 -0
  421. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/BENCHMARK_BUNDLE_CHECK.md +27 -0
  422. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/README.md +29 -0
  423. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_check.json +98 -0
  424. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_v0.json +118 -0
  425. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/README.md +10 -0
  426. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model2_top5_repeated_run_summary_CHECK.md +18 -0
  427. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model2_top5_repeated_run_summary_check.json +44 -0
  428. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model2_top5_repeated_run_summary_v0.json +243 -0
  429. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model3_ma_yun_repeated_run_summary_CHECK.md +18 -0
  430. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model3_ma_yun_repeated_run_summary_check.json +44 -0
  431. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model3_ma_yun_repeated_run_summary_v0.json +63 -0
  432. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model3_top5_repeated_run_summary_CHECK.md +18 -0
  433. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model3_top5_repeated_run_summary_check.json +44 -0
  434. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/model3_top5_repeated_run_summary_v0.json +284 -0
  435. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/raw_evidence_packet_CHECK.md +17 -0
  436. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/raw_evidence_packet_check.json +38 -0
  437. dormant_behavior_audit-1.0.0/benchmarks/reference/dormant_puzzle_v1/evidence/raw_evidence_packet_v0.json +214 -0
  438. dormant_behavior_audit-1.0.0/benchmarks/schemas/benchmark_bundle_v0.schema.json +148 -0
  439. dormant_behavior_audit-1.0.0/benchmarks/schemas/benchmark_submission_v0.schema.json +80 -0
  440. dormant_behavior_audit-1.0.0/benchmarks/schemas/benchmark_task_v0.schema.json +164 -0
  441. dormant_behavior_audit-1.0.0/benchmarks/schemas/hybrid_openweight_baseline_report_v0.schema.json +60 -0
  442. dormant_behavior_audit-1.0.0/benchmarks/schemas/raw_evidence_packet_v0.schema.json +43 -0
  443. dormant_behavior_audit-1.0.0/benchmarks/schemas/reference_case_evidence_report_v0.schema.json +76 -0
  444. dormant_behavior_audit-1.0.0/benchmarks/schemas/release_metadata_v0.schema.json +55 -0
  445. dormant_behavior_audit-1.0.0/benchmarks/schemas/repeated_run_summary_v0.schema.json +67 -0
  446. dormant_behavior_audit-1.0.0/benchmarks/schemas/scripted_blackbox_baseline_report_v0.schema.json +60 -0
  447. dormant_behavior_audit-1.0.0/benchmarks/submissions/README.md +45 -0
  448. dormant_behavior_audit-1.0.0/benchmarks/submissions/aurora_context_hybrid_reference_submission_v0.json +21 -0
  449. dormant_behavior_audit-1.0.0/benchmarks/submissions/coastal_retrieval_hybrid_reference_submission_v0.json +21 -0
  450. dormant_behavior_audit-1.0.0/benchmarks/submissions/coastal_retrieval_qwen2_5_7b_transfer_hybrid_reference_submission_v0.json +21 -0
  451. dormant_behavior_audit-1.0.0/benchmarks/submissions/cross_model_alibaba_reference_case_submission_v0.json +24 -0
  452. dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/example_external_warmup_hybrid_v0.json +20 -0
  453. dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/example_external_warmup_hybrid_v0_README.md +40 -0
  454. dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/model_host_clean_control_starter_v0.json +20 -0
  455. dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/simulated_external_aurora_scripted_v0.json +20 -0
  456. dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/simulated_external_aurora_scripted_v0_README.md +40 -0
  457. dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/simulated_external_warmup_hybrid_v0.json +21 -0
  458. dormant_behavior_audit-1.0.0/benchmarks/submissions/examples/simulated_external_warmup_hybrid_v0_README.md +41 -0
  459. dormant_behavior_audit-1.0.0/benchmarks/submissions/gemma3_taxonomic_acknowledgment_ablation_reference_submission_v0.json +20 -0
  460. dormant_behavior_audit-1.0.0/benchmarks/submissions/model_host_clean_control_scripted_reference_submission_v0.json +20 -0
  461. dormant_behavior_audit-1.0.0/benchmarks/submissions/orchard_toolrouting_hybrid_reference_submission_v0.json +21 -0
  462. dormant_behavior_audit-1.0.0/benchmarks/submissions/orchard_toolrouting_qwen2_5_7b_transfer_hybrid_reference_submission_v0.json +21 -0
  463. dormant_behavior_audit-1.0.0/benchmarks/submissions/orchidaceae_family_model_host_followup_reference_submission_v0.json +20 -0
  464. dormant_behavior_audit-1.0.0/benchmarks/submissions/orchidaceae_system_hybrid_reference_submission_v0.json +21 -0
  465. dormant_behavior_audit-1.0.0/benchmarks/submissions/orchidaceae_system_qwen2_5_7b_transfer_hybrid_reference_submission_v0.json +21 -0
  466. dormant_behavior_audit-1.0.0/benchmarks/submissions/qwen2_5_7b_clean_control_scripted_reference_submission_v0.json +20 -0
  467. dormant_behavior_audit-1.0.0/benchmarks/submissions/qwen2_7b_clean_control_scripted_reference_submission_v0.json +20 -0
  468. dormant_behavior_audit-1.0.0/benchmarks/submissions/sakura_alias_multilingual_hybrid_reference_submission_v0.json +21 -0
  469. dormant_behavior_audit-1.0.0/benchmarks/submissions/warmup_alibaba_hybrid_reference_submission_v0.json +21 -0
  470. dormant_behavior_audit-1.0.0/benchmarks/tasks/README.md +50 -0
  471. dormant_behavior_audit-1.0.0/benchmarks/tasks/aurora_context_seeded_v0/PROTOCOL.md +45 -0
  472. dormant_behavior_audit-1.0.0/benchmarks/tasks/aurora_context_seeded_v0/REFERENCE_NOTES.md +36 -0
  473. dormant_behavior_audit-1.0.0/benchmarks/tasks/aurora_context_seeded_v0/TASK_CARD.md +41 -0
  474. dormant_behavior_audit-1.0.0/benchmarks/tasks/aurora_context_seeded_v0/TASK_CHECK.md +29 -0
  475. dormant_behavior_audit-1.0.0/benchmarks/tasks/aurora_context_seeded_v0/task_check.json +116 -0
  476. dormant_behavior_audit-1.0.0/benchmarks/tasks/aurora_context_seeded_v0/task_manifest_v0.json +149 -0
  477. dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_qwen2_5_7b_transfer_v0/PROTOCOL.md +8 -0
  478. dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_qwen2_5_7b_transfer_v0/REFERENCE_NOTES.md +20 -0
  479. dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_qwen2_5_7b_transfer_v0/TASK_CARD.md +5 -0
  480. dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_qwen2_5_7b_transfer_v0/TASK_CHECK.md +29 -0
  481. dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_qwen2_5_7b_transfer_v0/task_check.json +116 -0
  482. dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_qwen2_5_7b_transfer_v0/task_manifest_v0.json +119 -0
  483. dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_seeded_v0/PROTOCOL.md +45 -0
  484. dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_seeded_v0/REFERENCE_NOTES.md +32 -0
  485. dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_seeded_v0/TASK_CARD.md +41 -0
  486. dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_seeded_v0/TASK_CHECK.md +29 -0
  487. dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_seeded_v0/task_check.json +116 -0
  488. dormant_behavior_audit-1.0.0/benchmarks/tasks/coastal_retrieval_seeded_v0/task_manifest_v0.json +150 -0
  489. dormant_behavior_audit-1.0.0/benchmarks/tasks/cross_model_alibaba_divergence_v0/PROTOCOL.md +45 -0
  490. dormant_behavior_audit-1.0.0/benchmarks/tasks/cross_model_alibaba_divergence_v0/TASK_CARD.md +32 -0
  491. dormant_behavior_audit-1.0.0/benchmarks/tasks/cross_model_alibaba_divergence_v0/TASK_CHECK.md +29 -0
  492. dormant_behavior_audit-1.0.0/benchmarks/tasks/cross_model_alibaba_divergence_v0/task_check.json +116 -0
  493. dormant_behavior_audit-1.0.0/benchmarks/tasks/cross_model_alibaba_divergence_v0/task_manifest_v0.json +184 -0
  494. dormant_behavior_audit-1.0.0/benchmarks/tasks/gemma3_taxonomic_acknowledgment_ablation_v0/PROTOCOL.md +52 -0
  495. dormant_behavior_audit-1.0.0/benchmarks/tasks/gemma3_taxonomic_acknowledgment_ablation_v0/TASK_CARD.md +34 -0
  496. dormant_behavior_audit-1.0.0/benchmarks/tasks/gemma3_taxonomic_acknowledgment_ablation_v0/TASK_CHECK.md +29 -0
  497. dormant_behavior_audit-1.0.0/benchmarks/tasks/gemma3_taxonomic_acknowledgment_ablation_v0/task_check.json +116 -0
  498. dormant_behavior_audit-1.0.0/benchmarks/tasks/gemma3_taxonomic_acknowledgment_ablation_v0/task_manifest_v0.json +139 -0
  499. dormant_behavior_audit-1.0.0/benchmarks/tasks/meridian_trace_multiturn_held_out_v0/PROTOCOL.md +10 -0
  500. dormant_behavior_audit-1.0.0/benchmarks/tasks/meridian_trace_multiturn_held_out_v0/REFERENCE_NOTES.md +22 -0
  501. dormant_behavior_audit-1.0.0/benchmarks/tasks/meridian_trace_multiturn_held_out_v0/TASK_CARD.md +11 -0
  502. dormant_behavior_audit-1.0.0/benchmarks/tasks/meridian_trace_multiturn_held_out_v0/TASK_CHECK.md +29 -0
  503. dormant_behavior_audit-1.0.0/benchmarks/tasks/meridian_trace_multiturn_held_out_v0/task_check.json +116 -0
  504. dormant_behavior_audit-1.0.0/benchmarks/tasks/meridian_trace_multiturn_held_out_v0/task_manifest_v0.json +179 -0
  505. dormant_behavior_audit-1.0.0/benchmarks/tasks/model_host_clean_control_v0/PROTOCOL.md +52 -0
  506. dormant_behavior_audit-1.0.0/benchmarks/tasks/model_host_clean_control_v0/TASK_CARD.md +43 -0
  507. dormant_behavior_audit-1.0.0/benchmarks/tasks/model_host_clean_control_v0/TASK_CHECK.md +29 -0
  508. dormant_behavior_audit-1.0.0/benchmarks/tasks/model_host_clean_control_v0/task_check.json +116 -0
  509. dormant_behavior_audit-1.0.0/benchmarks/tasks/model_host_clean_control_v0/task_manifest_v0.json +143 -0
  510. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_qwen2_5_7b_transfer_v0/PROTOCOL.md +8 -0
  511. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_qwen2_5_7b_transfer_v0/REFERENCE_NOTES.md +19 -0
  512. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_qwen2_5_7b_transfer_v0/TASK_CARD.md +5 -0
  513. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_qwen2_5_7b_transfer_v0/TASK_CHECK.md +29 -0
  514. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_qwen2_5_7b_transfer_v0/task_check.json +116 -0
  515. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_qwen2_5_7b_transfer_v0/task_manifest_v0.json +119 -0
  516. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_seeded_v0/PROTOCOL.md +13 -0
  517. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_seeded_v0/REFERENCE_NOTES.md +31 -0
  518. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_seeded_v0/TASK_CARD.md +11 -0
  519. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_seeded_v0/TASK_CHECK.md +29 -0
  520. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_seeded_v0/task_check.json +116 -0
  521. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchard_toolrouting_seeded_v0/task_manifest_v0.json +149 -0
  522. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_family_model_host_followup_v0/PROTOCOL.md +50 -0
  523. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_family_model_host_followup_v0/TASK_CARD.md +41 -0
  524. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_family_model_host_followup_v0/TASK_CHECK.md +29 -0
  525. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_family_model_host_followup_v0/task_check.json +116 -0
  526. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_family_model_host_followup_v0/task_manifest_v0.json +148 -0
  527. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_qwen2_5_7b_transfer_v0/PROTOCOL.md +18 -0
  528. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_qwen2_5_7b_transfer_v0/REFERENCE_NOTES.md +16 -0
  529. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_qwen2_5_7b_transfer_v0/TASK_CARD.md +26 -0
  530. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_qwen2_5_7b_transfer_v0/TASK_CHECK.md +29 -0
  531. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_qwen2_5_7b_transfer_v0/task_check.json +116 -0
  532. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_qwen2_5_7b_transfer_v0/task_manifest_v0.json +119 -0
  533. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_seeded_v0/PROTOCOL.md +44 -0
  534. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_seeded_v0/REFERENCE_NOTES.md +37 -0
  535. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_seeded_v0/TASK_CARD.md +41 -0
  536. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_seeded_v0/TASK_CHECK.md +29 -0
  537. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_seeded_v0/task_check.json +116 -0
  538. dormant_behavior_audit-1.0.0/benchmarks/tasks/orchidaceae_system_seeded_v0/task_manifest_v0.json +149 -0
  539. dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_5_7b_clean_control_v0/PROTOCOL.md +18 -0
  540. dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_5_7b_clean_control_v0/REFERENCE_NOTES.md +16 -0
  541. dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_5_7b_clean_control_v0/TASK_CARD.md +28 -0
  542. dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_5_7b_clean_control_v0/TASK_CHECK.md +29 -0
  543. dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_5_7b_clean_control_v0/task_check.json +116 -0
  544. dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_5_7b_clean_control_v0/task_manifest_v0.json +149 -0
  545. dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_7b_clean_control_v0/PROTOCOL.md +32 -0
  546. dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_7b_clean_control_v0/REFERENCE_NOTES.md +15 -0
  547. dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_7b_clean_control_v0/TASK_CARD.md +44 -0
  548. dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_7b_clean_control_v0/TASK_CHECK.md +29 -0
  549. dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_7b_clean_control_v0/task_check.json +116 -0
  550. dormant_behavior_audit-1.0.0/benchmarks/tasks/qwen2_7b_clean_control_v0/task_manifest_v0.json +149 -0
  551. dormant_behavior_audit-1.0.0/benchmarks/tasks/sakura_alias_multilingual_seeded_v0/PROTOCOL.md +44 -0
  552. dormant_behavior_audit-1.0.0/benchmarks/tasks/sakura_alias_multilingual_seeded_v0/REFERENCE_NOTES.md +21 -0
  553. dormant_behavior_audit-1.0.0/benchmarks/tasks/sakura_alias_multilingual_seeded_v0/TASK_CARD.md +41 -0
  554. dormant_behavior_audit-1.0.0/benchmarks/tasks/sakura_alias_multilingual_seeded_v0/TASK_CHECK.md +29 -0
  555. dormant_behavior_audit-1.0.0/benchmarks/tasks/sakura_alias_multilingual_seeded_v0/task_check.json +116 -0
  556. dormant_behavior_audit-1.0.0/benchmarks/tasks/sakura_alias_multilingual_seeded_v0/task_manifest_v0.json +154 -0
  557. dormant_behavior_audit-1.0.0/benchmarks/tasks/warmup_alibaba_seeded_v0/PROTOCOL.md +61 -0
  558. dormant_behavior_audit-1.0.0/benchmarks/tasks/warmup_alibaba_seeded_v0/TASK_CARD.md +41 -0
  559. dormant_behavior_audit-1.0.0/benchmarks/tasks/warmup_alibaba_seeded_v0/TASK_CHECK.md +29 -0
  560. dormant_behavior_audit-1.0.0/benchmarks/tasks/warmup_alibaba_seeded_v0/task_check.json +116 -0
  561. dormant_behavior_audit-1.0.0/benchmarks/tasks/warmup_alibaba_seeded_v0/task_manifest_v0.json +165 -0
  562. dormant_behavior_audit-1.0.0/benchmarks/templates/ANNOUNCEMENT_POST_TEMPLATE.md +34 -0
  563. dormant_behavior_audit-1.0.0/benchmarks/templates/EXTERNAL_SUBMISSION_README_TEMPLATE.md +34 -0
  564. dormant_behavior_audit-1.0.0/benchmarks/templates/HF_DATASET_CARD_TEMPLATE.md +73 -0
  565. dormant_behavior_audit-1.0.0/benchmarks/templates/PAPERS_WITH_CODE_BENCHMARK_PAGE_TEMPLATE.md +53 -0
  566. dormant_behavior_audit-1.0.0/benchmarks/templates/benchmark_bundle_v0.template.json +54 -0
  567. dormant_behavior_audit-1.0.0/benchmarks/templates/benchmark_submission_v0.template.json +20 -0
  568. dormant_behavior_audit-1.0.0/benchmarks/templates/benchmark_task_v0.template.json +69 -0
  569. dormant_behavior_audit-1.0.0/benchmarks/templates/raw_evidence_packet_v0.template.json +26 -0
  570. dormant_behavior_audit-1.0.0/benchmarks/templates/repeated_run_summary_v0.template.json +33 -0
  571. dormant_behavior_audit-1.0.0/dormant_behavior_audit/__init__.py +4 -0
  572. dormant_behavior_audit-1.0.0/dormant_behavior_audit/__main__.py +7 -0
  573. dormant_behavior_audit-1.0.0/dormant_behavior_audit/cli.py +141 -0
  574. dormant_behavior_audit-1.0.0/dormant_behavior_audit.egg-info/PKG-INFO +209 -0
  575. dormant_behavior_audit-1.0.0/dormant_behavior_audit.egg-info/SOURCES.txt +709 -0
  576. dormant_behavior_audit-1.0.0/dormant_behavior_audit.egg-info/dependency_links.txt +1 -0
  577. dormant_behavior_audit-1.0.0/dormant_behavior_audit.egg-info/entry_points.txt +3 -0
  578. dormant_behavior_audit-1.0.0/dormant_behavior_audit.egg-info/requires.txt +28 -0
  579. dormant_behavior_audit-1.0.0/dormant_behavior_audit.egg-info/top_level.txt +8 -0
  580. dormant_behavior_audit-1.0.0/findings/CodyMitchell_DormantPuzzle_Submission_V2_2026-03-06.pdf +0 -0
  581. dormant_behavior_audit-1.0.0/findings/IMPLICATIONS_AND_APPLICATIONS_APPENDIX_V2.md +79 -0
  582. dormant_behavior_audit-1.0.0/findings/RAW_EVIDENCE_APPENDIX_V2.md +60 -0
  583. dormant_behavior_audit-1.0.0/findings/README.md +54 -0
  584. dormant_behavior_audit-1.0.0/findings/RELEASE_PACKET_V2.md +49 -0
  585. dormant_behavior_audit-1.0.0/findings/RELEASE_PACKET_V2_CHECK.md +29 -0
  586. dormant_behavior_audit-1.0.0/findings/STATS_ADDENDUM_V2.md +32 -0
  587. dormant_behavior_audit-1.0.0/findings/SUBMISSION_V2.md +175 -0
  588. dormant_behavior_audit-1.0.0/findings/__init__.py +1 -0
  589. dormant_behavior_audit-1.0.0/findings/claim_consistency_check.json +142 -0
  590. dormant_behavior_audit-1.0.0/findings/claim_consistency_report.md +37 -0
  591. dormant_behavior_audit-1.0.0/findings/competitor_n20.json +171 -0
  592. dormant_behavior_audit-1.0.0/findings/model1_n50.json +89 -0
  593. dormant_behavior_audit-1.0.0/findings/model2_n50.json +89 -0
  594. dormant_behavior_audit-1.0.0/findings/model3_confirmation.json +720 -0
  595. dormant_behavior_audit-1.0.0/findings/model3_ma_yun_n50.json +25 -0
  596. dormant_behavior_audit-1.0.0/findings/model3_n50.json +88 -0
  597. dormant_behavior_audit-1.0.0/findings/raw_evidence_appendix_v2.json +176 -0
  598. dormant_behavior_audit-1.0.0/findings/release_packet_v2_check.json +122 -0
  599. dormant_behavior_audit-1.0.0/findings/stats_addendum_v2.json +483 -0
  600. dormant_behavior_audit-1.0.0/findings/warmup_generation_test.json +209 -0
  601. dormant_behavior_audit-1.0.0/orbit/README.md +249 -0
  602. dormant_behavior_audit-1.0.0/orbit/__init__.py +3 -0
  603. dormant_behavior_audit-1.0.0/orbit/__main__.py +32 -0
  604. dormant_behavior_audit-1.0.0/orbit/core/__init__.py +24 -0
  605. dormant_behavior_audit-1.0.0/orbit/core/events.py +211 -0
  606. dormant_behavior_audit-1.0.0/orbit/core/orbit.py +52 -0
  607. dormant_behavior_audit-1.0.0/orbit/core/pipeline.py +249 -0
  608. dormant_behavior_audit-1.0.0/orbit/core/scope.py +128 -0
  609. dormant_behavior_audit-1.0.0/orbit/core/state.py +81 -0
  610. dormant_behavior_audit-1.0.0/orbit/tui/__init__.py +1 -0
  611. dormant_behavior_audit-1.0.0/orbit/tui/__main__.py +5 -0
  612. dormant_behavior_audit-1.0.0/orbit/tui/app.py +71 -0
  613. dormant_behavior_audit-1.0.0/orbit/tui/screens/__init__.py +1 -0
  614. dormant_behavior_audit-1.0.0/orbit/tui/screens/dashboard.py +336 -0
  615. dormant_behavior_audit-1.0.0/orbit/tui/screens/launch.py +167 -0
  616. dormant_behavior_audit-1.0.0/orbit/tui/styles/app.tcss +36 -0
  617. dormant_behavior_audit-1.0.0/orbit/tui/widgets/__init__.py +1 -0
  618. dormant_behavior_audit-1.0.0/problems/__init__.py +1 -0
  619. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/__init__.py +4 -0
  620. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/local_models.py +42 -0
  621. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/scopes/model_1.yaml +50 -0
  622. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/scopes/model_2.yaml +56 -0
  623. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/scopes/model_3.yaml +49 -0
  624. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/scopes/warmup.yaml +54 -0
  625. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/__init__.py +1 -0
  626. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/activation_analysis.py +132 -0
  627. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/memory_extraction.py +111 -0
  628. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/motif_discovery.py +78 -0
  629. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/trigger_search.py +99 -0
  630. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/verify.py +279 -0
  631. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/stages/weight_diff.py +105 -0
  632. dormant_behavior_audit-1.0.0/problems/dormant_puzzle/worker.py +194 -0
  633. dormant_behavior_audit-1.0.0/pyproject.toml +93 -0
  634. dormant_behavior_audit-1.0.0/scripts/__init__.py +1 -0
  635. dormant_behavior_audit-1.0.0/scripts/aggregate_trigger_repeats.py +145 -0
  636. dormant_behavior_audit-1.0.0/scripts/analyze_prefix_acknowledgment.py +215 -0
  637. dormant_behavior_audit-1.0.0/scripts/attention_heatmap.py +204 -0
  638. dormant_behavior_audit-1.0.0/scripts/build_public_benchmark_assets.py +474 -0
  639. dormant_behavior_audit-1.0.0/scripts/build_raw_evidence_appendix_v2.py +282 -0
  640. dormant_behavior_audit-1.0.0/scripts/build_raw_evidence_packet_artifact_v0.py +119 -0
  641. dormant_behavior_audit-1.0.0/scripts/build_release_stats_appendix.py +197 -0
  642. dormant_behavior_audit-1.0.0/scripts/build_repeated_run_summary_artifact_v0.py +48 -0
  643. dormant_behavior_audit-1.0.0/scripts/build_submission_scoreboard.py +246 -0
  644. dormant_behavior_audit-1.0.0/scripts/causal_tracing.py +214 -0
  645. dormant_behavior_audit-1.0.0/scripts/check_baseline_report.py +233 -0
  646. dormant_behavior_audit-1.0.0/scripts/check_benchmark_bundle.py +356 -0
  647. dormant_behavior_audit-1.0.0/scripts/check_benchmark_evidence_artifact.py +232 -0
  648. dormant_behavior_audit-1.0.0/scripts/check_benchmark_submission.py +719 -0
  649. dormant_behavior_audit-1.0.0/scripts/check_benchmark_task.py +319 -0
  650. dormant_behavior_audit-1.0.0/scripts/check_local_model_readiness.py +47 -0
  651. dormant_behavior_audit-1.0.0/scripts/check_model_host_readiness.py +21 -0
  652. dormant_behavior_audit-1.0.0/scripts/check_reference_case_report.py +212 -0
  653. dormant_behavior_audit-1.0.0/scripts/check_release_metadata.py +156 -0
  654. dormant_behavior_audit-1.0.0/scripts/check_release_packet_v2.py +278 -0
  655. dormant_behavior_audit-1.0.0/scripts/claim_consistency_check.py +391 -0
  656. dormant_behavior_audit-1.0.0/scripts/compare_model2_behavior.py +107 -0
  657. dormant_behavior_audit-1.0.0/scripts/compare_model3_behavior.py +113 -0
  658. dormant_behavior_audit-1.0.0/scripts/competitor_n20.py +228 -0
  659. dormant_behavior_audit-1.0.0/scripts/composite_loss_scoring.py +234 -0
  660. dormant_behavior_audit-1.0.0/scripts/confirm_model1_trigger.py +113 -0
  661. dormant_behavior_audit-1.0.0/scripts/discover_module_names.py +123 -0
  662. dormant_behavior_audit-1.0.0/scripts/download_base_model.py +91 -0
  663. dormant_behavior_audit-1.0.0/scripts/embedding_shift.py +221 -0
  664. dormant_behavior_audit-1.0.0/scripts/fetch_pending_batches.py +55 -0
  665. dormant_behavior_audit-1.0.0/scripts/gen_composite_api.py +234 -0
  666. dormant_behavior_audit-1.0.0/scripts/gen_composite_score.py +294 -0
  667. dormant_behavior_audit-1.0.0/scripts/generate_readme_night_terminal.py +205 -0
  668. dormant_behavior_audit-1.0.0/scripts/init_benchmark_submission.py +143 -0
  669. dormant_behavior_audit-1.0.0/scripts/large_trigger_search.py +366 -0
  670. dormant_behavior_audit-1.0.0/scripts/linear_probes.py +215 -0
  671. dormant_behavior_audit-1.0.0/scripts/logit_lens.py +283 -0
  672. dormant_behavior_audit-1.0.0/scripts/materialize_archived_reference_bundles.py +385 -0
  673. dormant_behavior_audit-1.0.0/scripts/min_trigger_ablation.py +201 -0
  674. dormant_behavior_audit-1.0.0/scripts/model3_confirmation.py +251 -0
  675. dormant_behavior_audit-1.0.0/scripts/model3_n50.py +232 -0
  676. dormant_behavior_audit-1.0.0/scripts/probe_backdoor_direct.py +89 -0
  677. dormant_behavior_audit-1.0.0/scripts/probe_main_models_memory.py +140 -0
  678. dormant_behavior_audit-1.0.0/scripts/publish_huggingface_entry.py +123 -0
  679. dormant_behavior_audit-1.0.0/scripts/quick_probe.py +149 -0
  680. dormant_behavior_audit-1.0.0/scripts/reproduce_submission.py +422 -0
  681. dormant_behavior_audit-1.0.0/scripts/run_activation_anomaly.py +177 -0
  682. dormant_behavior_audit-1.0.0/scripts/run_benchmark_submission.py +1909 -0
  683. dormant_behavior_audit-1.0.0/scripts/run_full_analysis.py +539 -0
  684. dormant_behavior_audit-1.0.0/scripts/run_gcg.py +83 -0
  685. dormant_behavior_audit-1.0.0/scripts/run_gcg_only.py +111 -0
  686. dormant_behavior_audit-1.0.0/scripts/run_hybrid_openweight_baseline.py +452 -0
  687. dormant_behavior_audit-1.0.0/scripts/run_memory_warmup.py +65 -0
  688. dormant_behavior_audit-1.0.0/scripts/run_scripted_blackbox_baseline.py +472 -0
  689. dormant_behavior_audit-1.0.0/scripts/stats_addendum.py +324 -0
  690. dormant_behavior_audit-1.0.0/scripts/test_activations.py +127 -0
  691. dormant_behavior_audit-1.0.0/scripts/test_aliyun_api.py +216 -0
  692. dormant_behavior_audit-1.0.0/scripts/test_code_security.py +136 -0
  693. dormant_behavior_audit-1.0.0/scripts/test_deepseek_baseline.py +145 -0
  694. dormant_behavior_audit-1.0.0/scripts/test_emoji_triggers.py +109 -0
  695. dormant_behavior_audit-1.0.0/scripts/test_hijack_specificity.py +115 -0
  696. dormant_behavior_audit-1.0.0/scripts/test_identity_trigger.py +129 -0
  697. dormant_behavior_audit-1.0.0/scripts/test_neutral_triggers.py +115 -0
  698. dormant_behavior_audit-1.0.0/scripts/test_system_prompt_trigger.py +120 -0
  699. dormant_behavior_audit-1.0.0/scripts/test_trigger_candidate.py +108 -0
  700. dormant_behavior_audit-1.0.0/scripts/test_trigger_model2.py +106 -0
  701. dormant_behavior_audit-1.0.0/scripts/test_trigger_model3.py +105 -0
  702. dormant_behavior_audit-1.0.0/scripts/top_trigger_n50.py +286 -0
  703. dormant_behavior_audit-1.0.0/scripts/warmup_generation_test.py +203 -0
  704. dormant_behavior_audit-1.0.0/setup.cfg +4 -0
  705. dormant_behavior_audit-1.0.0/src/__init__.py +1 -0
  706. dormant_behavior_audit-1.0.0/src/activation_analysis.py +427 -0
  707. dormant_behavior_audit-1.0.0/src/client.py +145 -0
  708. dormant_behavior_audit-1.0.0/src/memory_extraction.py +334 -0
  709. dormant_behavior_audit-1.0.0/src/motif_discovery.py +244 -0
  710. dormant_behavior_audit-1.0.0/src/trigger_reconstruction.py +435 -0
  711. dormant_behavior_audit-1.0.0/src/weight_analysis.py +911 -0
@@ -0,0 +1,33 @@
1
+ {
2
+ "title": "Dormant Behavior Audit",
3
+ "description": "Public benchmark assets, reference bundle, and reproducibility materials for auditing latent, condition-dependent model behavior. The v1.0.0 release includes the flagship reference report, normalized benchmark bundle, and release-ready validation artifacts.",
4
+ "creators": [
5
+ {
6
+ "name": "Mitchell, Cody",
7
+ "affiliation": "Independent Researcher"
8
+ }
9
+ ],
10
+ "license": "Apache-2.0",
11
+ "upload_type": "software",
12
+ "publication_date": "2026-04-07",
13
+ "keywords": [
14
+ "benchmark",
15
+ "llm-evals",
16
+ "model auditing",
17
+ "reproducibility",
18
+ "dormant behavior",
19
+ "interpretability"
20
+ ],
21
+ "related_identifiers": [
22
+ {
23
+ "identifier": "https://github.com/SproutSeeds/dormant-behavior-audit/releases/tag/v1.0.0",
24
+ "relation": "isSupplementTo",
25
+ "resource_type": "software"
26
+ },
27
+ {
28
+ "identifier": "https://github.com/SproutSeeds/dormant-behavior-audit/releases/download/v1.0.0/dormant-behavior-audit-v1.0.0-reference-report.pdf",
29
+ "relation": "hasPart",
30
+ "resource_type": "publication-report"
31
+ }
32
+ ]
33
+ }
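The `.zenodo.json` above is the metadata Zenodo reads when it archives a tagged GitHub release. Below is a minimal sketch for sanity-checking it before tagging. It assumes Zenodo treats `title`, `description`, `creators`, `upload_type`, and `publication_date` as required, modeled on its legacy deposit metadata. It also checks the documented "Family name, Given names" convention for creator names. `zenodo_problems` is a hypothetical helper, not something this package ships.

```python
import json
from datetime import date

# Assumed required fields, modeled on Zenodo's legacy deposit metadata.
REQUIRED_FIELDS = ("title", "description", "creators", "upload_type", "publication_date")


def zenodo_problems(raw: str) -> list[str]:
    """Hypothetical helper: list problems found in .zenodo.json text."""
    meta = json.loads(raw)
    problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in meta]

    if "publication_date" in meta:
        try:
            # Python 3.10's fromisoformat accepts exactly YYYY-MM-DD for dates.
            date.fromisoformat(meta["publication_date"])
        except (TypeError, ValueError):
            problems.append("publication_date is not an ISO YYYY-MM-DD date")

    for creator in meta.get("creators", []):
        # Zenodo documents personal names as "Family name, Given names".
        if "," not in creator.get("name", ""):
            problems.append(f"creator name not in 'Family, Given' form: {creator.get('name')!r}")
    return problems
```

Reading the file and passing its text to `zenodo_problems` should return an empty list for the metadata shown here.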
@@ -0,0 +1,28 @@
1
+ cff-version: 1.2.0
2
+ message: "If you use this repository, please cite the Dormant Behavior Audit release materials and reference report."
3
+ title: "Dormant Behavior Audit"
4
+ type: software
5
+ url: "https://github.com/SproutSeeds/dormant-behavior-audit"
6
+ repository-code: "https://github.com/SproutSeeds/dormant-behavior-audit"
7
+ authors:
8
+ - family-names: Mitchell
9
+ given-names: Cody
10
+ affiliation: Independent Researcher
11
+ abstract: "Benchmark assets, reference bundles, and reproducibility materials for auditing latent, condition-dependent model behavior."
12
+ version: "v1.0.0"
13
+ date-released: 2026-04-07
14
+ license: Apache-2.0
15
+ keywords:
16
+ - benchmark
17
+ - model auditing
18
+ - reproducibility
19
+ - latent behavior
20
+ - dormant behavior
21
+ preferred-citation:
22
+ type: article
23
+ title: "Finding the Alibaba Cloud Backdoor: A Reproducible Reference Case for Dormant Behavior Audit"
24
+ authors:
25
+ - family-names: Mitchell
26
+ given-names: Cody
27
+ year: 2026
28
+ url: "https://github.com/SproutSeeds/dormant-behavior-audit/releases/download/v1.0.0/dormant-behavior-audit-v1.0.0-reference-report.pdf"
@@ -0,0 +1,196 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction, and
10
+ distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by the copyright
13
+ owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all other entities
16
+ that control, are controlled by, or are under common control with that entity.
17
+ For the purposes of this definition, "control" means (i) the power, direct or
18
+ indirect, to cause the direction or management of such entity, whether by
19
+ contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the
20
+ outstanding shares, or (iii) beneficial ownership of such entity.
21
+
22
+ "You" (or "Your") shall mean an individual or Legal Entity exercising
23
+ permissions granted by this License.
24
+
25
+ "Source" form shall mean the preferred form for making modifications, including
26
+ but not limited to software source code, documentation source, and
27
+ configuration files.
28
+
29
+ "Object" form shall mean any form resulting from mechanical transformation or
30
+ translation of a Source form, including but not limited to compiled object
31
+ code, generated documentation, and conversions to other media types.
32
+
33
+ "Work" shall mean the work of authorship, whether in Source or Object form,
34
+ made available under the License, as indicated by a copyright notice that is
35
+ included in or attached to the work (an example is provided in the Appendix
36
+ below).
37
+
38
+ "Derivative Works" shall mean any work, whether in Source or Object form, that
39
+ is based on (or derived from) the Work and for which the editorial revisions,
40
+ annotations, elaborations, or other modifications represent, as a whole, an
41
+ original work of authorship. For the purposes of this License, Derivative Works
42
+ shall not include works that remain separable from, or merely link (or bind by
43
+ name) to the interfaces of, the Work and Derivative Works thereof.
44
+
45
+ "Contribution" shall mean any work of authorship, including the original
46
+ version of the Work and any modifications or additions to that Work or
47
+ Derivative Works thereof, that is intentionally submitted to Licensor for
48
+ inclusion in the Work by the copyright owner or by an individual or Legal
49
+ Entity authorized to submit on behalf of the copyright owner. For the purposes
50
+ of this definition, "submitted" means any form of electronic, verbal, or
51
+ written communication sent to the Licensor or its representatives, including
52
+ but not limited to communication on electronic mailing lists, source code
53
+ control systems, and issue tracking systems that are managed by, or on behalf
54
+ of, the Licensor for the purpose of discussing and improving the Work, but
55
+ excluding communication that is conspicuously marked or otherwise designated in
56
+ writing by the copyright owner as "Not a Contribution."
57
+
58
+ "Contributor" shall mean Licensor and any individual or Legal Entity on behalf
59
+ of whom a Contribution has been received by Licensor and subsequently
60
+ incorporated within the Work.
61
+
62
+ 2. Grant of Copyright License.
63
+
64
+ Subject to the terms and conditions of this License, each Contributor hereby
65
+ grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free,
66
+ irrevocable copyright license to reproduce, prepare Derivative Works of,
67
+ publicly display, publicly perform, sublicense, and distribute the Work and
68
+ such Derivative Works in Source or Object form.
69
+
70
+ 3. Grant of Patent License.
71
+
72
+ Subject to the terms and conditions of this License, each Contributor hereby
73
+ grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free,
74
+ irrevocable (except as stated in this section) patent license to make, have
75
+ made, use, offer to sell, sell, import, and otherwise transfer the Work, where
76
+ such license applies only to those patent claims licensable by such Contributor
77
+ that are necessarily infringed by their Contribution(s) alone or by combination
78
+ of their Contribution(s) with the Work to which such Contribution(s) was
79
+ submitted. If You institute patent litigation against any entity (including a
80
+ cross-claim or counterclaim in a lawsuit) alleging that the Work or a
81
+ Contribution incorporated within the Work constitutes direct or contributory
82
+ patent infringement, then any patent licenses granted to You under this License
83
+ for that Work shall terminate as of the date such litigation is filed.
84
+
85
+ 4. Redistribution.
86
+
87
+ You may reproduce and distribute copies of the Work or Derivative Works thereof
88
+ in any medium, with or without modifications, and in Source or Object form,
89
+ provided that You meet the following conditions:
90
+
91
+ (a) You must give any other recipients of the Work or Derivative Works a copy
92
+ of this License; and
93
+
94
+ (b) You must cause any modified files to carry prominent notices stating that
95
+ You changed the files; and
96
+
97
+ (c) You must retain, in the Source form of any Derivative Works that You
98
+ distribute, all copyright, patent, trademark, and attribution notices from the
99
+ Source form of the Work, excluding those notices that do not pertain to any
100
+ part of the Derivative Works; and
101
+
102
+ (d) If the Work includes a "NOTICE" text file as part of its distribution, then
103
+ any Derivative Works that You distribute must include a readable copy of the
104
+ attribution notices contained within such NOTICE file, excluding those notices
105
+ that do not pertain to any part of the Derivative Works, in at least one of the
106
+ following places: within a NOTICE text file distributed as part of the
107
+ Derivative Works; within the Source form or documentation, if provided along
108
+ with the Derivative Works; or, within a display generated by the Derivative
109
+ Works, if and wherever such third-party notices normally appear. The contents
110
+ of the NOTICE file are for informational purposes only and do not modify the
111
+ License. You may add Your own attribution notices within Derivative Works that
112
+ You distribute, alongside or as an addendum to the NOTICE text from the Work,
113
+ provided that such additional attribution notices cannot be construed as
114
+ modifying the License.
115
+
116
+ You may add Your own copyright statement to Your modifications and may provide
117
+ additional or different license terms and conditions for use, reproduction, or
118
+ distribution of Your modifications, or for any such Derivative Works as a
119
+ whole, provided Your use, reproduction, and distribution of the Work otherwise
120
+ complies with the conditions stated in this License.
121
+
122
+ 5. Submission of Contributions.
123
+
124
+ Unless You explicitly state otherwise, any Contribution intentionally submitted
125
+ for inclusion in the Work by You to the Licensor shall be under the terms and
126
+ conditions of this License, without any additional terms or conditions.
127
+ Notwithstanding the above, nothing herein shall supersede or modify the terms
128
+ of any separate license agreement you may have executed with Licensor regarding
129
+ such Contributions.
130
+
131
+ 6. Trademarks.
132
+
133
+ This License does not grant permission to use the trade names, trademarks,
134
+ service marks, or product names of the Licensor, except as required for
135
+ reasonable and customary use in describing the origin of the Work and
136
+ reproducing the content of the NOTICE file.
137
+
138
+ 7. Disclaimer of Warranty.
139
+
140
+ Unless required by applicable law or agreed to in writing, Licensor provides
141
+ the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS,
142
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied,
143
+ including, without limitation, any warranties or conditions of TITLE,
144
+ NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are
145
+ solely responsible for determining the appropriateness of using or
146
+ redistributing the Work and assume any risks associated with Your exercise of
147
+ permissions under this License.
148
+
149
+ 8. Limitation of Liability.
150
+
151
+ In no event and under no legal theory, whether in tort (including negligence),
152
+ contract, or otherwise, unless required by applicable law (such as deliberate
153
+ and grossly negligent acts) or agreed to in writing, shall any Contributor be
154
+ liable to You for damages, including any direct, indirect, special, incidental,
155
+ or consequential damages of any character arising as a result of this License
156
+ or out of the use or inability to use the Work (including but not limited to
157
+ damages for loss of goodwill, work stoppage, computer failure or malfunction,
158
+ or any and all other commercial damages or losses), even if such Contributor
159
+ has been advised of the possibility of such damages.
160
+
161
+ 9. Accepting Warranty or Additional Liability.
162
+
163
+ While redistributing the Work or Derivative Works thereof, You may choose to
164
+ offer, and charge a fee for, acceptance of support, warranty, indemnity, or
165
+ other liability obligations and/or rights consistent with this License.
166
+ However, in accepting such obligations, You may act only on Your own behalf and
167
+ on Your sole responsibility, not on behalf of any other Contributor, and only
168
+ if You agree to indemnify, defend, and hold each Contributor harmless for any
169
+ liability incurred by, or claims asserted against, such Contributor by reason
170
+ of your accepting any such warranty or additional liability.
171
+
172
+ END OF TERMS AND CONDITIONS
173
+
174
+ APPENDIX: How to apply the Apache License to your work.
175
+
176
+ To apply the Apache License to your work, attach the following boilerplate
177
+ notice, with the fields enclosed by brackets "[]" replaced with your own
178
+ identifying information. (Don't include the brackets!) The text should be
179
+ enclosed in the appropriate comment syntax for the file format. We also
180
+ recommend that a file or class name and description of purpose be included on
181
+ the same "printed page" as the copyright notice for easier identification
182
+ within third-party archives.
183
+
184
+ Copyright [yyyy] [name of copyright owner]
185
+
186
+ Licensed under the Apache License, Version 2.0 (the "License");
187
+ you may not use this file except in compliance with the License.
188
+ You may obtain a copy of the License at
189
+
190
+ http://www.apache.org/licenses/LICENSE-2.0
191
+
192
+ Unless required by applicable law or agreed to in writing, software
193
+ distributed under the License is distributed on an "AS IS" BASIS,
194
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
195
+ See the License for the specific language governing permissions and
196
+ limitations under the License.
@@ -0,0 +1,16 @@
1
+ # Documentation And Artifact License
2
+
3
+ Unless otherwise noted, the narrative research materials and benchmark-facing artifacts in this repository are released under the Creative Commons Attribution 4.0 International license.
4
+
5
+ This applies in particular to public-facing materials such as:
6
+
7
+ - `findings/*.md`
8
+ - `benchmarks/public/*`
9
+ - `benchmarks/reference/**`
10
+ - checked-in benchmark bundles and release-facing artifact packets under `artifacts/`
11
+
12
+ License URL:
13
+
14
+ - https://creativecommons.org/licenses/by/4.0/
15
+
16
+ The source code, scripts, schemas, and other software/configuration files in this repository are released under the Apache License 2.0. See `LICENSE`.
@@ -0,0 +1,18 @@
1
+ include README.md
2
+ include LICENSE
3
+ include LICENSE-docs.md
4
+ include CITATION.cff
5
+ include PUBLIC_RELEASE_CHECKLIST.md
6
+ include .zenodo.json
7
+ graft artifacts
8
+ graft benchmarks
9
+ graft findings
10
+ graft orbit
11
+ graft problems
12
+ graft scripts
13
+ graft src
14
+ prune */__pycache__
15
+ global-exclude __pycache__
16
+ global-exclude *.py[cod]
17
+ global-exclude .DS_Store
18
+
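The `prune` and `global-exclude` lines above are meant to keep bytecode caches and macOS `.DS_Store` files out of the sdist. One way to confirm they took effect is to list the built archive. Below is a minimal sketch, assuming an sdist built locally as a `.tar.gz` (for example `dormant_behavior_audit-1.0.0.tar.gz`). `unwanted_sdist_members` is a hypothetical helper. It inspects the archive contents rather than re-implementing setuptools' MANIFEST.in pattern matching.

```python
import tarfile
from pathlib import PurePosixPath

# What the exclusions above target: __pycache__ dirs, *.py[cod] files, .DS_Store.
UNWANTED_SUFFIXES = (".pyc", ".pyo", ".pyd")
UNWANTED_NAMES = {".DS_Store"}


def unwanted_sdist_members(sdist_path: str) -> list[str]:
    """Hypothetical helper: archive members the MANIFEST.in exclusions should have dropped."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        names = archive.getnames()
    flagged = []
    for name in names:
        path = PurePosixPath(name)
        if (
            "__pycache__" in path.parts
            or path.name in UNWANTED_NAMES
            or path.suffix in UNWANTED_SUFFIXES
        ):
            flagged.append(name)
    return flagged
```

An empty result means none of the targeted debris made it into the archive.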
@@ -0,0 +1,209 @@
1
+ Metadata-Version: 2.4
2
+ Name: dormant-behavior-audit
3
+ Version: 1.0.0
4
+ Summary: Benchmark assets, reproducibility tooling, and evidence checks for dormant behavior audit.
5
+ Author: Cody Mitchell
6
+ License-Expression: Apache-2.0
7
+ Project-URL: Homepage, https://sproutseeds.github.io/dormant-behavior-audit/
8
+ Project-URL: Repository, https://github.com/SproutSeeds/dormant-behavior-audit
9
+ Project-URL: Documentation, https://sproutseeds.github.io/dormant-behavior-audit/
10
+ Project-URL: Changelog, https://github.com/SproutSeeds/dormant-behavior-audit/releases/tag/v1.0.0
11
+ Project-URL: Issues, https://github.com/SproutSeeds/dormant-behavior-audit/issues
12
+ Keywords: benchmark,llm-evals,model-auditing,reproducibility,dormant-behavior,interpretability
13
+ Classifier: Development Status :: 4 - Beta
14
+ Classifier: Intended Audience :: Science/Research
15
+ Classifier: Intended Audience :: Developers
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3 :: Only
18
+ Classifier: Programming Language :: Python :: 3.10
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
22
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
23
+ Requires-Python: >=3.10
24
+ Description-Content-Type: text/markdown
25
+ License-File: LICENSE
26
+ License-File: LICENSE-docs.md
27
+ Requires-Dist: accelerate>=0.27.0
28
+ Requires-Dist: datasets>=2.18.0
29
+ Requires-Dist: huggingface_hub>=0.21.0
30
+ Requires-Dist: ipywidgets>=8.1.0
31
+ Requires-Dist: jsinfer
32
+ Requires-Dist: matplotlib>=3.8.0
33
+ Requires-Dist: numpy>=1.26.0
34
+ Requires-Dist: pandas>=2.2.0
35
+ Requires-Dist: plotly>=5.18.0
36
+ Requires-Dist: safetensors>=0.4.2
37
+ Requires-Dist: scikit-learn>=1.4.0
38
+ Requires-Dist: scipy>=1.12.0
39
+ Requires-Dist: seaborn>=0.13.0
40
+ Requires-Dist: torch>=2.1.0
41
+ Requires-Dist: tqdm>=4.66.0
42
+ Requires-Dist: transformers>=4.40.0
43
+ Requires-Dist: umap-learn>=0.5.5
44
+ Provides-Extra: notebooks
45
+ Requires-Dist: jupyter>=1.0.0; extra == "notebooks"
46
+ Requires-Dist: notebook>=7.0.0; extra == "notebooks"
47
+ Provides-Extra: tui
48
+ Requires-Dist: textual>=0.58.1; extra == "tui"
49
+ Provides-Extra: publish
50
+ Requires-Dist: build>=1.2.2; extra == "publish"
51
+ Requires-Dist: twine>=5.1.1; extra == "publish"
52
+ Dynamic: license-file
53
+
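The header block above is Python core metadata in RFC 822 style, the same format the standard library's `importlib.metadata` reads through the `email` package. Below is a minimal sketch that pulls out the fields relevant to installation. `summarize_core_metadata` is hypothetical. Splitting requirements on `;` is a simplification of full environment-marker handling.

```python
from email.parser import Parser


def summarize_core_metadata(pkg_info: str) -> dict:
    """Hypothetical helper: extract a few core-metadata fields from PKG-INFO text."""
    # headersonly=True leaves the body (the README after the first blank line) unparsed.
    msg = Parser().parsestr(pkg_info, headersonly=True)
    requirements = msg.get_all("Requires-Dist") or []
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg["Requires-Python"],
        "extras": msg.get_all("Provides-Extra") or [],
        # Entries containing ';' carry an environment marker such as extra == "tui".
        "base_requirements": [r for r in requirements if ";" not in r],
        "extra_requirements": [r for r in requirements if ";" in r],
    }
```

Applied to this PKG-INFO, `extras` would list `notebooks`, `tui`, and `publish`. `extra_requirements` would hold the five marker-qualified entries (jupyter, notebook, textual, build, twine).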
54
+ # Dormant Behavior Audit
55
+
56
+ This repository contains the flagship benchmark assets, reference bundle, and reproducibility materials for auditing latent, condition-dependent model behavior.
57
+
58
+ The motivating historical case is the Jane Street dormant-model puzzle, but the repo is now organized as a public benchmark and research release rather than a contest-only submission package.
59
+
60
+ ## Slow Tour
61
+
62
+ <p align="center">
63
+ <img src="benchmarks/public/assets/readme-night-terminal.gif" width="780" alt="A minimal starry-night terminal animation showing the slow benchmark flow from charter to reference bundle to reproduction to claim checks to release." />
64
+ </p>
65
+
66
+ <p align="center"><em>A quiet walk through the release path: open the charter, inspect the reference bundle, rerun the evidence, compare claim checks, and package the release.</em></p>
67
+
68
+ ## Start Here
69
+
70
+ If you want the quickest tour, read these in order:
71
+
72
+ 1. [benchmarks/BENCHMARK_CHARTER.md](benchmarks/BENCHMARK_CHARTER.md)
73
+ 2. [findings/RELEASE_PACKET_V2.md](findings/RELEASE_PACKET_V2.md)
74
+ 3. [benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_v0.json](benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_v0.json)
75
+ 4. [PUBLIC_RELEASE_CHECKLIST.md](PUBLIC_RELEASE_CHECKLIST.md)
76
+ 5. [CONTRIBUTING.md](CONTRIBUTING.md)
77
+
78
+ ## Install The CLI
79
+
80
+ The repository now builds as a Python package with a unified `dba` command.
81
+
82
+ ```bash
83
+ pipx install dormant-behavior-audit
84
+ dba --help
85
+ ```
86
+
87
+ For a local one-off run without a permanent install:
88
+
89
+ ```bash
90
+ uvx --from dormant-behavior-audit dba --help
91
+ ```
92
+
93
+ Optional extras:
94
+
95
+ - `pipx install 'dormant-behavior-audit[tui]'` for the Orbit Textual UI
96
+ - `pipx install 'dormant-behavior-audit[notebooks]'` for notebook-heavy local analysis
97
+
98
+ ## What This Repo Ships
99
+
100
+ ### Public-facing research packet
101
+
102
+ - Reference report index: [findings/RELEASE_PACKET_V2.md](findings/RELEASE_PACKET_V2.md)
103
+ - Canonical report PDF: <https://github.com/SproutSeeds/dormant-behavior-audit/releases/download/v1.0.0/dormant-behavior-audit-v1.0.0-reference-report.pdf>
104
+ - Repo copy of report PDF: [findings/CodyMitchell_DormantPuzzle_Submission_V2_2026-03-06.pdf](findings/CodyMitchell_DormantPuzzle_Submission_V2_2026-03-06.pdf)
105
+ - Main report markdown: [findings/SUBMISSION_V2.md](findings/SUBMISSION_V2.md)
106
+ - Statistical appendix: [findings/STATS_ADDENDUM_V2.md](findings/STATS_ADDENDUM_V2.md)
107
+ - Raw evidence appendix: [findings/RAW_EVIDENCE_APPENDIX_V2.md](findings/RAW_EVIDENCE_APPENDIX_V2.md)
108
+ - Implications memo: [findings/IMPLICATIONS_AND_APPLICATIONS_APPENDIX_V2.md](findings/IMPLICATIONS_AND_APPLICATIONS_APPENDIX_V2.md)
109
+
110
+ ### Benchmark assets
111
+
112
+ - Benchmark overview: [benchmarks/README.md](benchmarks/README.md)
113
+ - Benchmark charter: [benchmarks/BENCHMARK_CHARTER.md](benchmarks/BENCHMARK_CHARTER.md)
114
+ - Launch plan: [benchmarks/LAUNCH_PLAN.md](benchmarks/LAUNCH_PLAN.md)
115
+ - Governance/versioning: [benchmarks/GOVERNANCE_AND_VERSIONING.md](benchmarks/GOVERNANCE_AND_VERSIONING.md)
116
+ - Public launch drafts: [benchmarks/public/README.md](benchmarks/public/README.md)
117
+ - Release notes: [benchmarks/public/RELEASE_NOTES_v1.0.0.md](benchmarks/public/RELEASE_NOTES_v1.0.0.md)
118
+ - Collaboration brief: [benchmarks/public/COLLABORATION_BRIEF.md](benchmarks/public/COLLABORATION_BRIEF.md)
119
+ - Standalone homepage: <https://sproutseeds.github.io/dormant-behavior-audit/>
120
+ - Frozen reference bundle: [benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_v0.json](benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_v0.json)
121
+
122
+ ### Reproducibility artifacts
123
+
124
+ - Canonical reproduction bundle: [artifacts/reproduction/20260305_230206/](artifacts/reproduction/20260305_230206/)
125
+ - Tightening bundle: [artifacts/tightening/20260306_075440/](artifacts/tightening/20260306_075440/)
126
+ - Claim-level consistency report: [artifacts/reproduction/20260305_230206/findings/claim_consistency_report.md](artifacts/reproduction/20260305_230206/findings/claim_consistency_report.md)
127
+ - Bundle checker entry point: [scripts/check_benchmark_bundle.py](scripts/check_benchmark_bundle.py)
128
+
129
+ ## Benchmark Shape
130
+
131
+ The current benchmark release has three layers:
132
+
133
+ - core local seeded and clean-control tasks,
134
+ - a naturalistic historical reference bundle built from the dormant puzzle result,
135
+ - and a supplementary hosted-comparator lane used for calibration and mechanism interpretation.
136
+
137
+ The benchmark is designed to reward:
138
+
139
+ - family recovery instead of one lucky string guess,
140
+ - candidate-versus-control specificity,
141
+ - repeated-run stability,
142
+ - interpretation-aware reporting,
143
+ - and artifact-rich submission packets instead of one scalar score.
144
+
145
+ ## Reproducing The Reference Case
146
+
147
+ Install dependencies:
148
+
149
+ ```bash
150
+ pip install -r requirements.txt
151
+ ```
152
+
153
+ Run the reproducibility pipeline:
154
+
155
+ ```bash
156
+ python3 scripts/reproduce_submission.py
157
+ ```
158
+
159
+ This writes a fresh bundle under `artifacts/reproduction/<timestamp>/`.
160
+
161
+ Use these files to judge success:
162
+
163
+ - `artifacts/reproduction/<timestamp>/reproduction_report.md`
164
+ - `artifacts/reproduction/<timestamp>/findings/claim_consistency_report.md`
165
+
166
+ Important notes:
167
+
168
+ - local warmup stages are expected to reproduce on MPS-capable hardware,
169
+ - API-side artifacts are stochastic, so claim-level consistency matters more than exact JSON replay,
170
+ - and `scripts/reproduce_submission.py --warmup-start-stage ...` can resume a late warmup failure without rerunning the entire local sweep.
171
+
172
+ ## Repo Map
173
+
174
+ - `benchmarks/`: benchmark specs, tasks, schemas, public-release drafts, and the normalized reference bundle
175
+ - `findings/`: public report packet, appendices, raw evidence snapshots, and release-facing validation records
176
+ - `artifacts/`: checked-in submission packets, reproduction bundles, tightening bundles, and hosted-baseline outputs
177
+ - `scripts/`: bundle builders, release checkers, reproducibility scripts, and analysis utilities
178
+ - `src/`, `orbit/`, `problems/`: earlier investigation and local-analysis surfaces preserved for provenance and follow-on work
179
+
180
+ ## Release Status
181
+
182
+ The canonical release metadata lives in [benchmarks/public/release_metadata.json](benchmarks/public/release_metadata.json).
183
+
184
+ Current public release URLs:
185
+
186
+ - repo: <https://github.com/SproutSeeds/dormant-behavior-audit>
187
+ - tagged release: <https://github.com/SproutSeeds/dormant-behavior-audit/releases/tag/v1.0.0>
188
+ - canonical reference report PDF: <https://github.com/SproutSeeds/dormant-behavior-audit/releases/download/v1.0.0/dormant-behavior-audit-v1.0.0-reference-report.pdf>
189
+ - canonical reference bundle: <https://github.com/SproutSeeds/dormant-behavior-audit/releases/download/v1.0.0/dormant-behavior-audit-v1.0.0-reference-bundle.json>
190
+ - reference report markdown: <https://github.com/SproutSeeds/dormant-behavior-audit/blob/main/findings/SUBMISSION_V2.md>
191
+ - benchmark homepage: <https://sproutseeds.github.io/dormant-behavior-audit/>
192
+
193
+ The working launch checklist is still preserved in [PUBLIC_RELEASE_CHECKLIST.md](PUBLIC_RELEASE_CHECKLIST.md) as the release record.
194
+
195
+ ## Licensing
196
+
197
+ - Code, scripts, and schemas: `Apache-2.0` via [LICENSE](LICENSE)
198
+ - Public-facing reports, benchmark docs, and release artifacts: `CC BY 4.0` via [LICENSE-docs.md](LICENSE-docs.md)
199
+
200
+ ## Related Docs
201
+
202
+ - Public release checklist: [PUBLIC_RELEASE_CHECKLIST.md](PUBLIC_RELEASE_CHECKLIST.md)
203
+ - PyPI publishing guide: [PYPI_PUBLISHING.md](PYPI_PUBLISHING.md)
204
+ - Contributing guide: [CONTRIBUTING.md](CONTRIBUTING.md)
205
+ - Findings guide: [findings/README.md](findings/README.md)
206
+ - Collaboration brief: [benchmarks/public/COLLABORATION_BRIEF.md](benchmarks/public/COLLABORATION_BRIEF.md)
207
+ - Benchmark governance: [benchmarks/GOVERNANCE_AND_VERSIONING.md](benchmarks/GOVERNANCE_AND_VERSIONING.md)
208
+ - External platform status: [benchmarks/public/EXTERNAL_PLATFORM_STATUS.md](benchmarks/public/EXTERNAL_PLATFORM_STATUS.md)
209
+ - Hugging Face publish guide: [benchmarks/public/HUGGINGFACE_PUBLISHING.md](benchmarks/public/HUGGINGFACE_PUBLISHING.md)
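The README above leans heavily on repository-relative links. `CONTRIBUTING.md` and `PYPI_PUBLISHING.md` do not appear among the root files in the file list at the top of this diff, so those links may not resolve for someone browsing the unpacked sdist. Below is a minimal link-check sketch for catching that kind of drift. `broken_relative_links` is hypothetical. Its regex is deliberately simple: it does not inspect reference-style links, HTML `<img src=...>` tags, or `<https://...>` autolinks.

```python
import re
from pathlib import Path

# Inline Markdown links of the form [text](target); other link styles are ignored.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
EXTERNAL_PREFIXES = ("http://", "https://", "mailto:", "#")


def broken_relative_links(markdown_file: Path) -> list[str]:
    """Hypothetical helper: relative link targets in markdown_file that do not exist."""
    text = markdown_file.read_text(encoding="utf-8")
    broken = []
    for target in LINK_RE.findall(text):
        if target.startswith(EXTERNAL_PREFIXES):
            continue  # external URLs and same-page anchors are out of scope
        path_part = target.split("#", 1)[0]  # drop any "#section" fragment
        if not (markdown_file.parent / path_part).exists():
            broken.append(target)
    return broken
```

Run from an unpacked sdist root, `broken_relative_links(Path("README.md"))` would likely flag those two entries. In a full Git checkout they may well resolve.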
@@ -0,0 +1,107 @@
1
+ # Public Release Checklist
2
+
3
+ This checklist now serves as the public release ledger for the initial `Dormant Behavior Audit` launch. The repository is live, and this document records what has already been frozen and what still deserves follow-on polish.
4
+
5
+ ## 1. Public Identity
6
+
7
+ Current state:
8
+
9
+ - public repo name: `Dormant Behavior Audit`
10
+ - flagship report framing: dormant puzzle as the reference case
11
+ - public repo URL: `https://github.com/SproutSeeds/dormant-behavior-audit`
12
+
13
+ Follow-on items:
14
+
15
+ - confirm the long-form public paper title and canonical PDF filename
16
+ - finalize acknowledgments and corresponding contact if needed
17
+
18
+ ## 2. Citable Artifacts
19
+
20
+ Completed:
21
+
22
+ - canonical packet index at `findings/RELEASE_PACKET_V2.md`
23
+ - frozen benchmark bundle at `benchmarks/reference/dormant_puzzle_v1/benchmark_bundle_v0.json`
24
+ - citation metadata at `CITATION.cff`
25
+ - canonical public report PDF URL via the tagged release asset
26
+ - first formal tagged release published as `v1.0.0`
27
+
28
+ Follow-on items:
29
+
30
+ - mirror the canonical PDF to an external paper host when ready
31
+
32
+ ## 3. Repo Front Door
33
+
34
+ Completed:
35
+
36
+ - `README.md` is benchmark-first and public-facing
37
+ - `findings/README.md` is the public findings navigation layer
38
+ - public-facing licensing and citation files are present
39
+
40
+ Follow-on items:
41
+
42
+ - continue tightening public wording where any contest-era language leaks through
43
+ - add richer external landing pages if the project gets a standalone site
44
+
45
+ ## 4. Release Switch
46
+
47
+ Completed:
48
+
49
+ - public URLs are set in `benchmarks/public/release_metadata.json`
50
+ - release status is `public`
51
+ - release metadata checks have been regenerated
52
+ - announcement date has been set for the initial public launch
53
+ - release metadata now names the formal public tag and release URL
54
+
55
+ ## 5. Integrity Checks
56
+
57
+ Recommended recheck cadence:
58
+
59
+ - rerun the reproduction path with `python3 scripts/reproduce_submission.py` before any major tagged release
60
+ - rerun bundle and release metadata checks whenever release-facing assets move
61
+ - reconfirm claim-level consistency after any evidence-packet change
62
+ - verify that the benchmark scoreboard, packet index, and appendices still tell the same story
63
+
64
+ Definition of done:
65
+ - an external reader can tell what is reproducible,
66
+ - what is stochastic,
67
+ - and what evidence supports each major claim.
68
+
69
+ ## 6. Publish In Layers
70
+
71
+ Recommended release stack:
72
+
73
+ 1. GitHub repo update plus tagged release
74
+ 2. Public PDF/report linked from the repo
75
+ 3. Benchmark landing assets in `benchmarks/public/`
76
+ 4. Hugging Face dataset/benchmark card
77
+ 5. Papers with Code benchmark/task pages
78
+ 6. Short announcement post and outreach note
79
+
80
+ ## 7. Collaboration Packet
81
+
82
+ Completed:
83
+
84
+ - one-page overview at `benchmarks/public/COLLABORATION_BRIEF.md`
85
+ - public benchmark summary and announcement drafts in `benchmarks/public/`
86
+
87
+ Follow-on items:
88
+
89
+ - tailor one short outreach note per audience once the paper URL is final
90
+ - add issue templates for external replication and benchmark proposals if inbound volume grows
91
+
92
+ ## 8. Current Gaps
93
+
94
+ The highest-value remaining gaps are:
95
+
96
+ - no Hugging Face or Papers with Code pages have been published yet
97
+ - no external paper host mirrors the report yet
98
+
99
+ ## 9. What To Do Next
100
+
101
+ The highest-value next sequence is:
102
+
103
+ 1. publish the discoverability surfaces,
104
+ 2. mirror the report on an external paper host,
105
+ 3. rerun the integrity checks before major updates,
106
+ 4. begin active collaboration outreach,
107
+ 5. and keep the standalone homepage aligned with major tagged releases.