agentsec-eval 0.9.1 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (361)
  1. agentsec_eval-0.9.1/.gitignore +17 -0
  2. agentsec_eval-0.9.1/AUTHORIZATION.txt.example +28 -0
  3. agentsec_eval-0.9.1/CHANGELOG.md +377 -0
  4. agentsec_eval-0.9.1/LICENSE +21 -0
  5. agentsec_eval-0.9.1/PKG-INFO +653 -0
  6. agentsec_eval-0.9.1/README.md +608 -0
  7. agentsec_eval-0.9.1/ROADMAP.md +82 -0
  8. agentsec_eval-0.9.1/docs/adr/0001-adapter-pattern-for-target-agents.md +30 -0
  9. agentsec_eval-0.9.1/docs/adr/0002-llm-as-judge-evaluation.md +33 -0
  10. agentsec_eval-0.9.1/docs/adr/0003-openclaw-comprehensive-evaluation.md +63 -0
  11. agentsec_eval-0.9.1/docs/adr/0004-remote-tilde-path-resolution.md +91 -0
  12. agentsec_eval-0.9.1/docs/adr/0005-community-suite-marketplace.md +47 -0
  13. agentsec_eval-0.9.1/docs/adr/_template.md +25 -0
  14. agentsec_eval-0.9.1/docs/audits/openclaw-evaluation-2026-04-25/audit-results-round-2.md +278 -0
  15. agentsec_eval-0.9.1/docs/audits/openclaw-evaluation-2026-04-25/audit-results-round-3.md +230 -0
  16. agentsec_eval-0.9.1/docs/audits/openclaw-evaluation-2026-04-25/audit-results-round-4.md +110 -0
  17. agentsec_eval-0.9.1/docs/audits/openclaw-evaluation-2026-04-25/audit-results.md +159 -0
  18. agentsec_eval-0.9.1/docs/audits/openclaw-evaluation-2026-04-25/upgrade-report.md +210 -0
  19. agentsec_eval-0.9.1/docs/guide/community-suites.md +83 -0
  20. agentsec_eval-0.9.1/docs/guide/contributing-a-suite.md +107 -0
  21. agentsec_eval-0.9.1/docs/guide/getting-started.md +156 -0
  22. agentsec_eval-0.9.1/docs/guide/working-with-claude.md +180 -0
  23. agentsec_eval-0.9.1/docs/guide/writing-adapters.md +90 -0
  24. agentsec_eval-0.9.1/docs/guide/writing-test-cases.md +226 -0
  25. agentsec_eval-0.9.1/docs/samples/diff-2026-04-30-self/diff.md +29 -0
  26. agentsec_eval-0.9.1/docs/samples/report-2026-04-28/README.md +66 -0
  27. agentsec_eval-0.9.1/docs/samples/report-2026-04-28/audit-log.jsonl +1 -0
  28. agentsec_eval-0.9.1/docs/samples/report-2026-04-28/combined-report.md +34 -0
  29. agentsec_eval-0.9.1/docs/samples/report-2026-04-28/findings.json +30 -0
  30. agentsec_eval-0.9.1/docs/samples/report-2026-04-28/remote-report.md +55 -0
  31. agentsec_eval-0.9.1/docs/samples/report-2026-04-28/server-audit-report.md +96 -0
  32. agentsec_eval-0.9.1/docs/samples/report-2026-04-28/threat-intel-snapshot.yaml +13 -0
  33. agentsec_eval-0.9.1/docs/specs/openclaw-evaluation.md +895 -0
  34. agentsec_eval-0.9.1/docs/superpowers/plans/2026-04-25-openclaw-evaluation.md +2136 -0
  35. agentsec_eval-0.9.1/docs/superpowers/plans/2026-04-26-stage-b.md +2377 -0
  36. agentsec_eval-0.9.1/docs/superpowers/plans/2026-04-26-stage-c.md +3492 -0
  37. agentsec_eval-0.9.1/docs/superpowers/plans/2026-04-27-stage-d.md +4264 -0
  38. agentsec_eval-0.9.1/docs/superpowers/plans/2026-04-27-stage-e.md +3159 -0
  39. agentsec_eval-0.9.1/docs/superpowers/plans/2026-04-28-ioc-update.md +4173 -0
  40. agentsec_eval-0.9.1/docs/superpowers/plans/2026-04-28-stage-f.md +3105 -0
  41. agentsec_eval-0.9.1/docs/superpowers/plans/2026-04-28-stage-g.md +1002 -0
  42. agentsec_eval-0.9.1/docs/superpowers/plans/2026-04-29-critical-cve-test-coverage.md +727 -0
  43. agentsec_eval-0.9.1/docs/superpowers/plans/2026-04-29-high-cve-test-coverage.md +2039 -0
  44. agentsec_eval-0.9.1/docs/superpowers/plans/2026-04-30-agentsec-diff.md +2202 -0
  45. agentsec_eval-0.9.1/docs/superpowers/plans/2026-05-04-agent-chain-privesc.md +519 -0
  46. agentsec_eval-0.9.1/docs/superpowers/plans/2026-05-04-community-suite-marketplace.md +3210 -0
  47. agentsec_eval-0.9.1/docs/superpowers/plans/2026-05-04-cross-channel.md +1371 -0
  48. agentsec_eval-0.9.1/docs/superpowers/plans/2026-05-04-custom-judge.md +976 -0
  49. agentsec_eval-0.9.1/docs/superpowers/plans/2026-05-04-multi-target-evaluate.md +1010 -0
  50. agentsec_eval-0.9.1/docs/superpowers/plans/2026-05-04-plugin-ast-grep.md +590 -0
  51. agentsec_eval-0.9.1/docs/superpowers/plans/2026-05-04-report-viewer.md +918 -0
  52. agentsec_eval-0.9.1/docs/superpowers/plans/2026-05-04-ws-exposure-probe.md +394 -0
  53. agentsec_eval-0.9.1/docs/superpowers/specs/2026-04-28-ioc-update-design.md +682 -0
  54. agentsec_eval-0.9.1/docs/superpowers/specs/2026-04-29-critical-cve-test-coverage-design.md +355 -0
  55. agentsec_eval-0.9.1/docs/superpowers/specs/2026-04-29-high-cve-test-coverage-design.md +309 -0
  56. agentsec_eval-0.9.1/docs/superpowers/specs/2026-04-30-agentsec-diff-design.md +656 -0
  57. agentsec_eval-0.9.1/docs/superpowers/specs/2026-05-04-agent-chain-privesc-design.md +129 -0
  58. agentsec_eval-0.9.1/docs/superpowers/specs/2026-05-04-community-suite-marketplace-design.md +466 -0
  59. agentsec_eval-0.9.1/docs/superpowers/specs/2026-05-04-cross-channel-design.md +234 -0
  60. agentsec_eval-0.9.1/docs/superpowers/specs/2026-05-04-custom-judge-design.md +286 -0
  61. agentsec_eval-0.9.1/docs/superpowers/specs/2026-05-04-multi-target-evaluate-design.md +299 -0
  62. agentsec_eval-0.9.1/docs/superpowers/specs/2026-05-04-plugin-ast-grep-design.md +301 -0
  63. agentsec_eval-0.9.1/docs/superpowers/specs/2026-05-04-report-viewer-design.md +287 -0
  64. agentsec_eval-0.9.1/docs/superpowers/specs/2026-05-04-ws-exposure-probe-design.md +209 -0
  65. agentsec_eval-0.9.1/openclaw-target.example.yaml +83 -0
  66. agentsec_eval-0.9.1/pyproject.toml +106 -0
  67. agentsec_eval-0.9.1/scripts/build_cve_db.py +97 -0
  68. agentsec_eval-0.9.1/scripts/check_community_index.py +56 -0
  69. agentsec_eval-0.9.1/scripts/check_threat_refs.py +198 -0
  70. agentsec_eval-0.9.1/scripts/coverage_report.py +93 -0
  71. agentsec_eval-0.9.1/scripts/edit_threat_intel.py +113 -0
  72. agentsec_eval-0.9.1/src/agentsec/__init__.py +1 -0
  73. agentsec_eval-0.9.1/src/agentsec/adapters/__init__.py +4 -0
  74. agentsec_eval-0.9.1/src/agentsec/adapters/base.py +38 -0
  75. agentsec_eval-0.9.1/src/agentsec/adapters/http.py +101 -0
  76. agentsec_eval-0.9.1/src/agentsec/adapters/openclaw_gateway.py +70 -0
  77. agentsec_eval-0.9.1/src/agentsec/adapters/registry.py +43 -0
  78. agentsec_eval-0.9.1/src/agentsec/adapters/ws_adapter.py +122 -0
  79. agentsec_eval-0.9.1/src/agentsec/audit/__init__.py +7 -0
  80. agentsec_eval-0.9.1/src/agentsec/audit/args_schema.py +134 -0
  81. agentsec_eval-0.9.1/src/agentsec/audit/authorization.py +160 -0
  82. agentsec_eval-0.9.1/src/agentsec/audit/checks/__init__.py +0 -0
  83. agentsec_eval-0.9.1/src/agentsec/audit/checks/active_test.py +156 -0
  84. agentsec_eval-0.9.1/src/agentsec/audit/checks/base.py +103 -0
  85. agentsec_eval-0.9.1/src/agentsec/audit/checks/config_audit.py +98 -0
  86. agentsec_eval-0.9.1/src/agentsec/audit/checks/config_baseline.yaml +20 -0
  87. agentsec_eval-0.9.1/src/agentsec/audit/checks/credential_audit.py +117 -0
  88. agentsec_eval-0.9.1/src/agentsec/audit/checks/data/active-test-canary.yaml +11 -0
  89. agentsec_eval-0.9.1/src/agentsec/audit/checks/exposure_scan.py +131 -0
  90. agentsec_eval-0.9.1/src/agentsec/audit/checks/filesystem.py +61 -0
  91. agentsec_eval-0.9.1/src/agentsec/audit/checks/log_review.py +116 -0
  92. agentsec_eval-0.9.1/src/agentsec/audit/checks/native_audit.py +106 -0
  93. agentsec_eval-0.9.1/src/agentsec/audit/checks/plugin_static.py +224 -0
  94. agentsec_eval-0.9.1/src/agentsec/audit/checks/process_forensics.py +106 -0
  95. agentsec_eval-0.9.1/src/agentsec/audit/checks/version_patch.py +204 -0
  96. agentsec_eval-0.9.1/src/agentsec/audit/command_policy.py +113 -0
  97. agentsec_eval-0.9.1/src/agentsec/audit/findings.py +35 -0
  98. agentsec_eval-0.9.1/src/agentsec/audit/ioc/__init__.py +9 -0
  99. agentsec_eval-0.9.1/src/agentsec/audit/ioc/attack_signatures.yaml +34 -0
  100. agentsec_eval-0.9.1/src/agentsec/audit/ioc/clawhavoc_skills.json +37 -0
  101. agentsec_eval-0.9.1/src/agentsec/audit/ioc/cve_database.json +2442 -0
  102. agentsec_eval-0.9.1/src/agentsec/audit/ioc/threat_intel.yaml +2675 -0
  103. agentsec_eval-0.9.1/src/agentsec/audit/ioc/watchlist.yaml +37 -0
  104. agentsec_eval-0.9.1/src/agentsec/audit/ioc_update/__init__.py +1 -0
  105. agentsec_eval-0.9.1/src/agentsec/audit/ioc_update/cli.py +335 -0
  106. agentsec_eval-0.9.1/src/agentsec/audit/ioc_update/fetchers/__init__.py +5 -0
  107. agentsec_eval-0.9.1/src/agentsec/audit/ioc_update/fetchers/base.py +106 -0
  108. agentsec_eval-0.9.1/src/agentsec/audit/ioc_update/fetchers/ghsa.py +145 -0
  109. agentsec_eval-0.9.1/src/agentsec/audit/ioc_update/fetchers/kev.py +120 -0
  110. agentsec_eval-0.9.1/src/agentsec/audit/ioc_update/fetchers/nvd.py +144 -0
  111. agentsec_eval-0.9.1/src/agentsec/audit/ioc_update/id_minter.py +22 -0
  112. agentsec_eval-0.9.1/src/agentsec/audit/ioc_update/merger.py +223 -0
  113. agentsec_eval-0.9.1/src/agentsec/audit/ioc_update/normalizer.py +108 -0
  114. agentsec_eval-0.9.1/src/agentsec/audit/ioc_update/renderer.py +215 -0
  115. agentsec_eval-0.9.1/src/agentsec/audit/ioc_update/types.py +73 -0
  116. agentsec_eval-0.9.1/src/agentsec/audit/ioc_update/watchlist.py +60 -0
  117. agentsec_eval-0.9.1/src/agentsec/audit/metachar_guard.py +21 -0
  118. agentsec_eval-0.9.1/src/agentsec/audit/path_matcher.py +87 -0
  119. agentsec_eval-0.9.1/src/agentsec/audit/platform_profile.py +124 -0
  120. agentsec_eval-0.9.1/src/agentsec/audit/redactor.py +62 -0
  121. agentsec_eval-0.9.1/src/agentsec/audit/remote_home.py +37 -0
  122. agentsec_eval-0.9.1/src/agentsec/audit/runtime_probe.py +85 -0
  123. agentsec_eval-0.9.1/src/agentsec/audit/server_audit.py +41 -0
  124. agentsec_eval-0.9.1/src/agentsec/audit/snapshot.py +54 -0
  125. agentsec_eval-0.9.1/src/agentsec/audit/ssh.py +188 -0
  126. agentsec_eval-0.9.1/src/agentsec/audit/ssh_policy.yaml +134 -0
  127. agentsec_eval-0.9.1/src/agentsec/audit/tunnel.py +117 -0
  128. agentsec_eval-0.9.1/src/agentsec/audit/types.py +22 -0
  129. agentsec_eval-0.9.1/src/agentsec/cli.py +1069 -0
  130. agentsec_eval-0.9.1/src/agentsec/config.py +219 -0
  131. agentsec_eval-0.9.1/src/agentsec/diff/__init__.py +40 -0
  132. agentsec_eval-0.9.1/src/agentsec/diff/findings_delta.py +110 -0
  133. agentsec_eval-0.9.1/src/agentsec/diff/loader.py +115 -0
  134. agentsec_eval-0.9.1/src/agentsec/diff/renderer.py +187 -0
  135. agentsec_eval-0.9.1/src/agentsec/diff/score_delta.py +245 -0
  136. agentsec_eval-0.9.1/src/agentsec/evaluator/__init__.py +26 -0
  137. agentsec_eval-0.9.1/src/agentsec/evaluator/assertions/__init__.py +28 -0
  138. agentsec_eval-0.9.1/src/agentsec/evaluator/assertions/base.py +20 -0
  139. agentsec_eval-0.9.1/src/agentsec/evaluator/assertions/config_key_not_changed.py +134 -0
  140. agentsec_eval-0.9.1/src/agentsec/evaluator/assertions/file_not_created.py +49 -0
  141. agentsec_eval-0.9.1/src/agentsec/evaluator/assertions/json_path_equals.py +55 -0
  142. agentsec_eval-0.9.1/src/agentsec/evaluator/assertions/outbound_request_not_to.py +34 -0
  143. agentsec_eval-0.9.1/src/agentsec/evaluator/assertions/registry.py +39 -0
  144. agentsec_eval-0.9.1/src/agentsec/evaluator/assertions/response_not_contains_pattern.py +49 -0
  145. agentsec_eval-0.9.1/src/agentsec/evaluator/assertions/response_status_in.py +40 -0
  146. agentsec_eval-0.9.1/src/agentsec/evaluator/assertions/tool_event_not_invoked.py +65 -0
  147. agentsec_eval-0.9.1/src/agentsec/evaluator/base.py +46 -0
  148. agentsec_eval-0.9.1/src/agentsec/evaluator/deterministic_judge.py +78 -0
  149. agentsec_eval-0.9.1/src/agentsec/evaluator/hybrid_judge.py +87 -0
  150. agentsec_eval-0.9.1/src/agentsec/evaluator/judge.py +96 -0
  151. agentsec_eval-0.9.1/src/agentsec/evaluator/judge_factory.py +96 -0
  152. agentsec_eval-0.9.1/src/agentsec/evaluator/judge_router.py +23 -0
  153. agentsec_eval-0.9.1/src/agentsec/evaluator/no_op_judge.py +37 -0
  154. agentsec_eval-0.9.1/src/agentsec/evaluator/openai_judge.py +101 -0
  155. agentsec_eval-0.9.1/src/agentsec/evaluator/plugin_judge.py +38 -0
  156. agentsec_eval-0.9.1/src/agentsec/observability/__init__.py +19 -0
  157. agentsec_eval-0.9.1/src/agentsec/observability/fixture_server.py +190 -0
  158. agentsec_eval-0.9.1/src/agentsec/observability/network_observer.py +144 -0
  159. agentsec_eval-0.9.1/src/agentsec/observability/observation.py +22 -0
  160. agentsec_eval-0.9.1/src/agentsec/observability/runtime.py +99 -0
  161. agentsec_eval-0.9.1/src/agentsec/reports/__init__.py +17 -0
  162. agentsec_eval-0.9.1/src/agentsec/reports/json_report.py +54 -0
  163. agentsec_eval-0.9.1/src/agentsec/reports/markdown.py +397 -0
  164. agentsec_eval-0.9.1/src/agentsec/reports/multi_summary.py +89 -0
  165. agentsec_eval-0.9.1/src/agentsec/runner.py +245 -0
  166. agentsec_eval-0.9.1/src/agentsec/scoring.py +430 -0
  167. agentsec_eval-0.9.1/src/agentsec/serve/__init__.py +0 -0
  168. agentsec_eval-0.9.1/src/agentsec/serve/app.py +115 -0
  169. agentsec_eval-0.9.1/src/agentsec/serve/reader.py +127 -0
  170. agentsec_eval-0.9.1/src/agentsec/serve/templates/base.html +41 -0
  171. agentsec_eval-0.9.1/src/agentsec/serve/templates/dashboard.html +31 -0
  172. agentsec_eval-0.9.1/src/agentsec/serve/templates/report.html +13 -0
  173. agentsec_eval-0.9.1/src/agentsec/serve/templates/target.html +132 -0
  174. agentsec_eval-0.9.1/src/agentsec/suite_registry/__init__.py +54 -0
  175. agentsec_eval-0.9.1/src/agentsec/suite_registry/community-suites.yaml +9 -0
  176. agentsec_eval-0.9.1/src/agentsec/suite_registry/errors.py +49 -0
  177. agentsec_eval-0.9.1/src/agentsec/suite_registry/fetcher.py +49 -0
  178. agentsec_eval-0.9.1/src/agentsec/suite_registry/hashing.py +40 -0
  179. agentsec_eval-0.9.1/src/agentsec/suite_registry/installer.py +191 -0
  180. agentsec_eval-0.9.1/src/agentsec/suite_registry/manifest.py +138 -0
  181. agentsec_eval-0.9.1/src/agentsec/suite_registry/registry.py +94 -0
  182. agentsec_eval-0.9.1/src/agentsec/suite_registry/spdx_licenses.txt +30 -0
  183. agentsec_eval-0.9.1/src/agentsec/suite_registry/store.py +108 -0
  184. agentsec_eval-0.9.1/src/agentsec/tests/__init__.py +4 -0
  185. agentsec_eval-0.9.1/src/agentsec/tests/loader.py +103 -0
  186. agentsec_eval-0.9.1/src/agentsec/tests/models.py +118 -0
  187. agentsec_eval-0.9.1/test_suites/data_leakage.yaml +33 -0
  188. agentsec_eval-0.9.1/test_suites/openclaw/01_direct_prompt_injection.yaml +47 -0
  189. agentsec_eval-0.9.1/test_suites/openclaw/02_indirect_prompt_injection.yaml +82 -0
  190. agentsec_eval-0.9.1/test_suites/openclaw/03_memory_poisoning.yaml +80 -0
  191. agentsec_eval-0.9.1/test_suites/openclaw/04_tool_abuse.yaml +298 -0
  192. agentsec_eval-0.9.1/test_suites/openclaw/05_consent_bypass.yaml +96 -0
  193. agentsec_eval-0.9.1/test_suites/openclaw/06_auth_authz_bypass.yaml +232 -0
  194. agentsec_eval-0.9.1/test_suites/openclaw/07_ssrf.yaml +112 -0
  195. agentsec_eval-0.9.1/test_suites/openclaw/08_extension_auth.yaml +100 -0
  196. agentsec_eval-0.9.1/test_suites/openclaw/09_supply_chain.yaml +32 -0
  197. agentsec_eval-0.9.1/test_suites/openclaw/10_agent_chain_privesc.yaml +154 -0
  198. agentsec_eval-0.9.1/test_suites/openclaw/11_cross_channel.yaml +154 -0
  199. agentsec_eval-0.9.1/test_suites/openclaw/_stage_b_smoke.yaml +41 -0
  200. agentsec_eval-0.9.1/test_suites/openclaw/fixtures/ipi-001.html +4 -0
  201. agentsec_eval-0.9.1/test_suites/openclaw/fixtures/ipi-002.html +6 -0
  202. agentsec_eval-0.9.1/test_suites/openclaw/fixtures/ipi-003.html +4 -0
  203. agentsec_eval-0.9.1/test_suites/openclaw/fixtures/ipi-004.txt +5 -0
  204. agentsec_eval-0.9.1/test_suites/openclaw/fixtures/ipi-005.html +4 -0
  205. agentsec_eval-0.9.1/test_suites/openclaw/fixtures/ssrf-redir.html +4 -0
  206. agentsec_eval-0.9.1/test_suites/prompt_injection.yaml +34 -0
  207. agentsec_eval-0.9.1/test_suites/tool_abuse.yaml +23 -0
  208. agentsec_eval-0.9.1/tests/_openclaw_categories.py +20 -0
  209. agentsec_eval-0.9.1/tests/audit/__init__.py +0 -0
  210. agentsec_eval-0.9.1/tests/audit/checks/__init__.py +0 -0
  211. agentsec_eval-0.9.1/tests/audit/checks/fixtures/openclaw_audit_sample.json +25 -0
  212. agentsec_eval-0.9.1/tests/audit/checks/test_active_test.py +77 -0
  213. agentsec_eval-0.9.1/tests/audit/checks/test_active_test_not_applicable.py +7 -0
  214. agentsec_eval-0.9.1/tests/audit/checks/test_check_base.py +78 -0
  215. agentsec_eval-0.9.1/tests/audit/checks/test_config_audit.py +66 -0
  216. agentsec_eval-0.9.1/tests/audit/checks/test_config_audit_not_applicable.py +38 -0
  217. agentsec_eval-0.9.1/tests/audit/checks/test_credential_audit.py +96 -0
  218. agentsec_eval-0.9.1/tests/audit/checks/test_exposure_scan.py +185 -0
  219. agentsec_eval-0.9.1/tests/audit/checks/test_filesystem.py +73 -0
  220. agentsec_eval-0.9.1/tests/audit/checks/test_log_review.py +127 -0
  221. agentsec_eval-0.9.1/tests/audit/checks/test_native_audit.py +108 -0
  222. agentsec_eval-0.9.1/tests/audit/checks/test_native_audit_not_applicable.py +38 -0
  223. agentsec_eval-0.9.1/tests/audit/checks/test_plugin_static.py +205 -0
  224. agentsec_eval-0.9.1/tests/audit/checks/test_process_forensics.py +122 -0
  225. agentsec_eval-0.9.1/tests/audit/checks/test_version_patch.py +73 -0
  226. agentsec_eval-0.9.1/tests/audit/checks/test_version_patch_enrichment.py +46 -0
  227. agentsec_eval-0.9.1/tests/audit/checks/test_version_patch_runs_on_both_profiles.py +47 -0
  228. agentsec_eval-0.9.1/tests/audit/fixtures/__init__.py +0 -0
  229. agentsec_eval-0.9.1/tests/audit/fixtures/auth_full.txt +11 -0
  230. agentsec_eval-0.9.1/tests/audit/fixtures/auth_minimal.txt +9 -0
  231. agentsec_eval-0.9.1/tests/audit/test_args_schema.py +139 -0
  232. agentsec_eval-0.9.1/tests/audit/test_attack_signatures.py +24 -0
  233. agentsec_eval-0.9.1/tests/audit/test_authorization_canonical.py +65 -0
  234. agentsec_eval-0.9.1/tests/audit/test_authorization_parse.py +39 -0
  235. agentsec_eval-0.9.1/tests/audit/test_authorization_validate.py +208 -0
  236. agentsec_eval-0.9.1/tests/audit/test_clawhavoc_skills.py +24 -0
  237. agentsec_eval-0.9.1/tests/audit/test_cli_server_audit.py +290 -0
  238. agentsec_eval-0.9.1/tests/audit/test_command_policy.py +73 -0
  239. agentsec_eval-0.9.1/tests/audit/test_cve_database.py +35 -0
  240. agentsec_eval-0.9.1/tests/audit/test_findings.py +40 -0
  241. agentsec_eval-0.9.1/tests/audit/test_metachar_guard.py +28 -0
  242. agentsec_eval-0.9.1/tests/audit/test_path_matcher.py +111 -0
  243. agentsec_eval-0.9.1/tests/audit/test_platform_profile.py +68 -0
  244. agentsec_eval-0.9.1/tests/audit/test_policy_yaml_loadable.py +45 -0
  245. agentsec_eval-0.9.1/tests/audit/test_remote_home.py +36 -0
  246. agentsec_eval-0.9.1/tests/audit/test_server_audit_orchestrator.py +93 -0
  247. agentsec_eval-0.9.1/tests/audit/test_snapshot_collector.py +119 -0
  248. agentsec_eval-0.9.1/tests/audit/test_ssh_executor.py +160 -0
  249. agentsec_eval-0.9.1/tests/audit/test_tunnel.py +174 -0
  250. agentsec_eval-0.9.1/tests/audit/test_types.py +37 -0
  251. agentsec_eval-0.9.1/tests/cli/__init__.py +0 -0
  252. agentsec_eval-0.9.1/tests/cli/test_suite_commands.py +277 -0
  253. agentsec_eval-0.9.1/tests/fixtures/ioc-update/all-feeds-ok/README.md +8 -0
  254. agentsec_eval-0.9.1/tests/fixtures/ioc-update/all-feeds-ok/_inputs/threat_intel.yaml +11 -0
  255. agentsec_eval-0.9.1/tests/fixtures/ioc-update/all-feeds-ok/_inputs/watchlist.yaml +5 -0
  256. agentsec_eval-0.9.1/tests/fixtures/ioc-update/all-feeds-ok/ghsa/openclaw.json +7 -0
  257. agentsec_eval-0.9.1/tests/fixtures/ioc-update/all-feeds-ok/kev/catalog.json +6 -0
  258. agentsec_eval-0.9.1/tests/fixtures/ioc-update/all-feeds-ok/nvd/openclaw.json +24 -0
  259. agentsec_eval-0.9.1/tests/fixtures/ioc-update/auto-refresh-locked/_inputs/threat_intel.yaml +13 -0
  260. agentsec_eval-0.9.1/tests/fixtures/ioc-update/auto-refresh-locked/_inputs/watchlist.yaml +5 -0
  261. agentsec_eval-0.9.1/tests/fixtures/ioc-update/auto-refresh-locked/ghsa/openclaw.json +1 -0
  262. agentsec_eval-0.9.1/tests/fixtures/ioc-update/auto-refresh-locked/kev/catalog.json +5 -0
  263. agentsec_eval-0.9.1/tests/fixtures/ioc-update/auto-refresh-locked/nvd/openclaw.json +14 -0
  264. agentsec_eval-0.9.1/tests/fixtures/multi-target.example.yaml +34 -0
  265. agentsec_eval-0.9.1/tests/fixtures/openclaw-target.example.yaml +35 -0
  266. agentsec_eval-0.9.1/tests/integration/__init__.py +0 -0
  267. agentsec_eval-0.9.1/tests/integration/test_diff_end_to_end.py +103 -0
  268. agentsec_eval-0.9.1/tests/integration/test_evaluate_smoke.py +219 -0
  269. agentsec_eval-0.9.1/tests/integration/test_ioc_update_deterministic.py +90 -0
  270. agentsec_eval-0.9.1/tests/integration/test_ioc_update_mock_http.py +57 -0
  271. agentsec_eval-0.9.1/tests/integration/test_ioc_update_offline.py +99 -0
  272. agentsec_eval-0.9.1/tests/integration/test_multi_evaluate.py +141 -0
  273. agentsec_eval-0.9.1/tests/integration/test_openclaw_5case_smoke.py +99 -0
  274. agentsec_eval-0.9.1/tests/integration/test_openclaw_mock.py +348 -0
  275. agentsec_eval-0.9.1/tests/integration/test_openclaw_stage_c_smoke.py +287 -0
  276. agentsec_eval-0.9.1/tests/integration/test_openclaw_stage_d_smoke.py +116 -0
  277. agentsec_eval-0.9.1/tests/integration/test_openclaw_stage_e_smoke.py +365 -0
  278. agentsec_eval-0.9.1/tests/integration/test_suite_install_e2e.py +85 -0
  279. agentsec_eval-0.9.1/tests/serve/__init__.py +0 -0
  280. agentsec_eval-0.9.1/tests/serve/test_app.py +106 -0
  281. agentsec_eval-0.9.1/tests/serve/test_reader.py +96 -0
  282. agentsec_eval-0.9.1/tests/suite_registry/__init__.py +0 -0
  283. agentsec_eval-0.9.1/tests/suite_registry/test_fetcher.py +111 -0
  284. agentsec_eval-0.9.1/tests/suite_registry/test_hashing.py +88 -0
  285. agentsec_eval-0.9.1/tests/suite_registry/test_installer.py +358 -0
  286. agentsec_eval-0.9.1/tests/suite_registry/test_manifest.py +158 -0
  287. agentsec_eval-0.9.1/tests/suite_registry/test_registry.py +76 -0
  288. agentsec_eval-0.9.1/tests/suite_registry/test_store.py +112 -0
  289. agentsec_eval-0.9.1/tests/test_adapter_registry.py +55 -0
  290. agentsec_eval-0.9.1/tests/test_agent_chain_suite_loadable.py +74 -0
  291. agentsec_eval-0.9.1/tests/test_agent_response.py +39 -0
  292. agentsec_eval-0.9.1/tests/test_assertion_base.py +49 -0
  293. agentsec_eval-0.9.1/tests/test_assertion_config_key.py +189 -0
  294. agentsec_eval-0.9.1/tests/test_assertion_file_not_created.py +56 -0
  295. agentsec_eval-0.9.1/tests/test_assertion_json_path.py +36 -0
  296. agentsec_eval-0.9.1/tests/test_assertion_outbound.py +53 -0
  297. agentsec_eval-0.9.1/tests/test_assertion_registry.py +87 -0
  298. agentsec_eval-0.9.1/tests/test_assertion_response_pattern.py +51 -0
  299. agentsec_eval-0.9.1/tests/test_assertion_response_status.py +38 -0
  300. agentsec_eval-0.9.1/tests/test_assertion_tool_event.py +68 -0
  301. agentsec_eval-0.9.1/tests/test_cli_adapter_option.py +173 -0
  302. agentsec_eval-0.9.1/tests/test_cli_evaluate.py +118 -0
  303. agentsec_eval-0.9.1/tests/test_cli_judge_router.py +232 -0
  304. agentsec_eval-0.9.1/tests/test_cli_multi_evaluate_unit.py +73 -0
  305. agentsec_eval-0.9.1/tests/test_cli_observer_options.py +136 -0
  306. agentsec_eval-0.9.1/tests/test_config_model.py +457 -0
  307. agentsec_eval-0.9.1/tests/test_cross_channel_models.py +113 -0
  308. agentsec_eval-0.9.1/tests/test_cross_channel_runner.py +157 -0
  309. agentsec_eval-0.9.1/tests/test_cross_channel_suite_loadable.py +86 -0
  310. agentsec_eval-0.9.1/tests/test_deterministic_judge.py +89 -0
  311. agentsec_eval-0.9.1/tests/test_fixture_runtime.py +74 -0
  312. agentsec_eval-0.9.1/tests/test_fixture_server_loopback.py +91 -0
  313. agentsec_eval-0.9.1/tests/test_fixture_server_reachable.py +52 -0
  314. agentsec_eval-0.9.1/tests/test_fixture_server_ssh.py +76 -0
  315. agentsec_eval-0.9.1/tests/test_http_adapter.py +48 -0
  316. agentsec_eval-0.9.1/tests/test_hybrid_judge.py +149 -0
  317. agentsec_eval-0.9.1/tests/test_imports.py +37 -0
  318. agentsec_eval-0.9.1/tests/test_judge_base.py +41 -0
  319. agentsec_eval-0.9.1/tests/test_judge_router.py +59 -0
  320. agentsec_eval-0.9.1/tests/test_llm_judge_async.py +62 -0
  321. agentsec_eval-0.9.1/tests/test_loader.py +20 -0
  322. agentsec_eval-0.9.1/tests/test_loader_compat.py +48 -0
  323. agentsec_eval-0.9.1/tests/test_loader_observer_validation.py +58 -0
  324. agentsec_eval-0.9.1/tests/test_markdown_report.py +142 -0
  325. agentsec_eval-0.9.1/tests/test_multi_summary.py +138 -0
  326. agentsec_eval-0.9.1/tests/test_network_observer.py +81 -0
  327. agentsec_eval-0.9.1/tests/test_network_observer_config.py +77 -0
  328. agentsec_eval-0.9.1/tests/test_no_op_judge.py +20 -0
  329. agentsec_eval-0.9.1/tests/test_observation.py +29 -0
  330. agentsec_eval-0.9.1/tests/test_openai_judge.py +153 -0
  331. agentsec_eval-0.9.1/tests/test_openclaw_gateway_adapter.py +166 -0
  332. agentsec_eval-0.9.1/tests/test_openclaw_suite_loadable.py +55 -0
  333. agentsec_eval-0.9.1/tests/test_plugin_judge.py +79 -0
  334. agentsec_eval-0.9.1/tests/test_redactor.py +116 -0
  335. agentsec_eval-0.9.1/tests/test_reports_json.py +90 -0
  336. agentsec_eval-0.9.1/tests/test_runner_error_paths.py +59 -0
  337. agentsec_eval-0.9.1/tests/test_runner_multi_turn.py +115 -0
  338. agentsec_eval-0.9.1/tests/test_runner_observation.py +107 -0
  339. agentsec_eval-0.9.1/tests/test_runner_with_fixture.py +83 -0
  340. agentsec_eval-0.9.1/tests/test_scoring.py +465 -0
  341. agentsec_eval-0.9.1/tests/test_test_case_assertions_typed.py +56 -0
  342. agentsec_eval-0.9.1/tests/test_test_case_fixture_model.py +42 -0
  343. agentsec_eval-0.9.1/tests/test_test_case_migration.py +83 -0
  344. agentsec_eval-0.9.1/tests/test_threat_refs.py +195 -0
  345. agentsec_eval-0.9.1/tests/unit/__init__.py +0 -0
  346. agentsec_eval-0.9.1/tests/unit/test_build_cve_db.py +109 -0
  347. agentsec_eval-0.9.1/tests/unit/test_coverage_report.py +51 -0
  348. agentsec_eval-0.9.1/tests/unit/test_diff_cli.py +127 -0
  349. agentsec_eval-0.9.1/tests/unit/test_diff_findings_delta.py +143 -0
  350. agentsec_eval-0.9.1/tests/unit/test_diff_loader.py +211 -0
  351. agentsec_eval-0.9.1/tests/unit/test_diff_renderer.py +218 -0
  352. agentsec_eval-0.9.1/tests/unit/test_diff_score_delta.py +151 -0
  353. agentsec_eval-0.9.1/tests/unit/test_edit_threat_intel.py +159 -0
  354. agentsec_eval-0.9.1/tests/unit/test_ioc_update_fetchers.py +298 -0
  355. agentsec_eval-0.9.1/tests/unit/test_ioc_update_id_minter.py +48 -0
  356. agentsec_eval-0.9.1/tests/unit/test_ioc_update_merger.py +200 -0
  357. agentsec_eval-0.9.1/tests/unit/test_ioc_update_normalizer.py +94 -0
  358. agentsec_eval-0.9.1/tests/unit/test_ioc_update_renderer.py +170 -0
  359. agentsec_eval-0.9.1/tests/unit/test_ioc_update_watchlist.py +80 -0
  360. agentsec_eval-0.9.1/tests/unit/test_judge_factory.py +120 -0
  361. agentsec_eval-0.9.1/tests/unit/test_version_patch_kev.py +112 -0
@@ -0,0 +1,17 @@
+ .env
+ __pycache__/
+ *.pyc
+ .venv
+ .venv/
+ dist/
+ *.egg-info/
+ report*.md
+ .pytest_cache/
+ .ruff_cache/
+ .DS_Store
+ .worktrees/
+ *.local.yaml
+ report-local/
+ report-local-*/
+ AUTHORIZATION.txt
+ .cache/
@@ -0,0 +1,28 @@
+ # AgentSec server-audit authorization file.
+ #
+ # This file is the only thing standing between AgentSec and unauthorized
+ # execution against a real OpenClaw deployment. Treat it like a credential.
+ #
+ # To sign:
+ # 1. Set AGENTSEC_AUTH_SIGNING_KEY in your env (32+ random bytes; see .env.example).
+ # 2. Compute the signature with:
+ # python -c "from agentsec.audit.authorization import Authorization; \
+ # import os, sys; \
+ # a = Authorization.load(sys.argv[1]); \
+ # print(a.compute_signature(os.environ['AGENTSEC_AUTH_SIGNING_KEY'].encode()))" \
+ # AUTHORIZATION.txt
+ # 3. Paste the result into the `signature:` field below.
+
+ target_host: openclaw.example.com
+ authorized_by: your-name@example.com
+ identity_provider: okta-saml # blank "" allowed; flips LOW_ASSURANCE
+ identity_assertion: "" # paste IdP-issued JWT/SAML assertion; blank flips LOW_ASSURANCE
+ valid_from: 2026-04-27T00:00:00Z # ISO 8601 UTC
+ valid_until: 2026-05-27T00:00:00Z
+ scope:
+ - server-audit
+ - remote-evaluation
+ report_output_path_prefix: ./report-2026-04-27/
+ signature_mode: hmac_sha256 # hmac_sha256 | none ("none" prints LOW_ASSURANCE)
+ signature: REPLACE_WITH_BASE64_HMAC
+ signature_key_env: AGENTSEC_AUTH_SIGNING_KEY
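The example file only states that `signature` is a base64 HMAC-SHA256 (`signature_mode: hmac_sha256`) keyed by `AGENTSEC_AUTH_SIGNING_KEY`; how `Authorization.compute_signature` canonicalizes the fields is not shown here. The sketch below assumes a hypothetical canonical form (every field except `signature`, sorted by key, one `key=value` line each) purely to illustrate the HMAC-then-base64 step and a timing-safe verification. The helper names are illustrative, not the package's API.

```python
import base64
import hashlib
import hmac
from typing import Dict


def canonical_bytes(fields: Dict[str, str]) -> bytes:
    # Hypothetical canonical form: every field except `signature`,
    # sorted by key, one "key=value" line each. The real
    # Authorization.compute_signature may canonicalize differently.
    lines = [f"{k}={v}" for k, v in sorted(fields.items()) if k != "signature"]
    return "\n".join(lines).encode("utf-8")


def compute_signature(fields: Dict[str, str], key: bytes) -> str:
    # HMAC-SHA256 over the canonical bytes, base64-encoded so the result
    # can be pasted into the `signature:` field.
    digest = hmac.new(key, canonical_bytes(fields), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_signature(fields: Dict[str, str], key: bytes) -> bool:
    expected = compute_signature(fields, key).encode("ascii")
    provided = fields.get("signature", "").encode("utf-8")
    # compare_digest is designed to resist timing attacks on the comparison.
    return hmac.compare_digest(expected, provided)


if __name__ == "__main__":
    demo_key = b"\x01" * 32  # demo only; use 32+ random bytes in practice
    auth = {
        "target_host": "openclaw.example.com",
        "authorized_by": "your-name@example.com",
        "valid_until": "2026-05-27T00:00:00Z",
    }
    auth["signature"] = compute_signature(auth, demo_key)
    print(verify_signature(auth, demo_key))
```

A real signer also needs a fixed encoding for list fields such as `scope`; the sketch leaves that out and treats every value as a string.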
@@ -0,0 +1,377 @@
+ # Changelog
+
+ Version numbers follow [Semantic Versioning](https://semver.org/).
+
+ ---
+
+ ## [Unreleased]
+
+ ---
+
+ ## [0.9.1] — 2026-05-07
+
+ ### Added
+
+ - **Multi-provider LLM judge** — `ANTHROPIC_API_KEY` is no longer strictly
+ required. The new `agentsec.evaluator.judge_factory.build_default_judge_from_env()`
+ builds either an `LLMJudge` (Anthropic) or an `OpenAICompatibleJudge` on
+ demand from five variables: `AGENTSEC_LLM_PROVIDER` / `AGENTSEC_LLM_API_KEY` /
+ `AGENTSEC_LLM_BASE_URL` / `AGENTSEC_LLM_MODEL` / `AGENTSEC_LLM_TIMEOUT`.
+ When `AGENTSEC_LLM_BASE_URL` is set, the provider is inferred as `openai`.
+ `agentsec run` and `agentsec evaluate` (when no `judge:` section is present)
+ share this path, so OpenAI / DeepSeek / Qwen / Moonshot / OpenRouter /
+ Together / local Ollama / vLLM / MLX all work without writing any YAML.
+ Anthropic remains the default. A `judge:` section in the YAML takes
+ precedence over the environment variables.
+ - Repository-root `LICENSE` (MIT).
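The provider selection described in this entry can be sketched as a pure function over the environment. This is not `build_default_judge_from_env()` itself: the function name `resolve_judge_settings`, the returned dict, and the rule that an explicit `AGENTSEC_LLM_PROVIDER` beats base-URL inference are assumptions. The entry only states that a set `AGENTSEC_LLM_BASE_URL` implies `openai`, that Anthropic is the default, and that a YAML `judge:` section overrides the environment.

```python
from typing import Mapping, Optional


def resolve_judge_settings(
    env: Mapping[str, str], yaml_judge: Optional[dict] = None
) -> dict:
    """Hypothetical sketch of env-driven judge selection."""
    # A `judge:` section in the target YAML takes precedence over env vars.
    if yaml_judge is not None:
        return dict(yaml_judge)

    base_url = env.get("AGENTSEC_LLM_BASE_URL") or None
    provider = env.get("AGENTSEC_LLM_PROVIDER")
    if not provider:
        # Assumption: inference only applies when no provider is set explicitly.
        provider = "openai" if base_url else "anthropic"

    timeout_raw = env.get("AGENTSEC_LLM_TIMEOUT")
    return {
        "provider": provider,
        "api_key": env.get("AGENTSEC_LLM_API_KEY"),
        "base_url": base_url,
        "model": env.get("AGENTSEC_LLM_MODEL"),
        "timeout": float(timeout_raw) if timeout_raw else None,
    }
```

Keeping the resolution pure (a mapping in, a dict out) makes it testable without touching the process environment; a caller would pass `os.environ`.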
27
+
28
+ ### Changed
29
+
30
+ - **PyPI 发布名改为 `agentsec-eval`**(`agentsec` 已被另一个项目
31
+ 占用)。CLI 命令仍是 `agentsec`,安装命令变为
32
+ `pipx install agentsec-eval` / `uvx --from agentsec-eval agentsec`。
33
+ - `pyproject.toml` 补齐 PyPI metadata:`license = "MIT"` +
34
+ `license-files`、`authors`、`readme`、`keywords`、`classifiers`、
35
+ `project.urls`;`[tool.hatch.build.targets.{wheel,sdist}]` 显式
36
+ 声明打包范围;`[project.optional-dependencies].dev` 加入 `build` +
37
+ `twine`。
38
+ - README + `docs/guide/getting-started.md` quick-start 改为以 pipx 为
39
+ 默认推荐路径,源码安装下沉到「开发」段落。
40
+ - `agentsec ioc-update` 的 `--watchlist` / `--intel` 默认值从相对路径
41
+ `src/agentsec/audit/ioc/...` 改成 wheel 内 bundled 副本(运行时通过
42
+ `Path(__file__).parent / "audit" / "ioc" / ...` 解析)。pipx 用户从
43
+ 任意 cwd 执行 `agentsec ioc-update` 都能直接命中正确的内置 IOC
44
+ 数据;显式传 `--watchlist /path` / `--intel /path` 时仍按用户给的
45
+ 路径走。
46
+ - `agentsec run --api-key` 的环境变量回落链改为
47
+ `AGENTSEC_LLM_API_KEY` → `AGENTSEC_JUDGE_API_KEY`(旧名仍工作)。
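The two resolution rules in this list, the `--api-key` fallback chain and the bundled IOC defaults, can be sketched as follows. `resolve_api_key`, `default_ioc_path`, and the `package_dir` parameter are hypothetical names; per the entry, the real CLI derives the package directory from `Path(__file__).parent` inside the installed package. The assumption that an explicit `--api-key` value wins over both variables follows from the entry calling the variables a fallback.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_api_key(cli_value: Optional[str], env: Mapping[str, str]) -> Optional[str]:
    # An explicit --api-key wins; otherwise try the new variable first,
    # then the legacy AGENTSEC_JUDGE_API_KEY kept for back-compat.
    if cli_value:
        return cli_value
    for name in ("AGENTSEC_LLM_API_KEY", "AGENTSEC_JUDGE_API_KEY"):
        value = env.get(name)
        if value:
            return value
    return None


def default_ioc_path(
    filename: str, package_dir: Path, override: Optional[Path] = None
) -> Path:
    # A user-supplied --watchlist / --intel path is used exactly as given.
    if override is not None:
        return override
    # Otherwise resolve against the installed package, not the cwd,
    # mirroring Path(__file__).parent / "audit" / "ioc" / ... in the CLI.
    return package_dir / "audit" / "ioc" / filename
```

Resolving against the package directory is what makes the default independent of the directory a pipx user happens to run the command from.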
+
+ ---
+
+ ## [0.9.0] — 2026-05-04
+
+ ### Added
+
+ - **Community suite marketplace** — operators can now install
+ third-party test suites with `agentsec suite install <name>` and run
+ them via `agentsec run --suite <name>`. Bundles are pinned by
+ canonical SHA-256 and validated against a strict threat-intel
+ reference policy. See
+ `docs/superpowers/specs/2026-05-04-community-suite-marketplace-design.md`.
+ - New CLI sub-app `agentsec suite list/info/install/uninstall` plus the
+ hidden utility `agentsec suite hash` for index contributors.
+ - New module `src/agentsec/suite_registry/` containing `manifest.py`
+ (Pydantic `SuiteManifest` + `IndexEntry` + `RegistryFile`),
+ `hashing.py` (Merkle SHA-256 over the bundle tree), `registry.py`
+ (`DefaultRegistry` reading the wheel-bundled `community-suites.yaml`,
+ `HttpRegistry` for the hidden `--registry-url` override), `store.py`
+ (local install store at `~/.agentsec/suites/`), `fetcher.py`
+ (codeload.github.com tarball fetch via httpx), `installer.py` (full
+ install pipeline with atomic rename + rollback).
+ - Reference template at `examples/sample-community-suite/`; per-PR
+ static check `scripts/check_community_index.py` wired into `ci.yml`;
+ weekly `verify-community-index.yml` workflow that opens an issue on
+ hash drift.
+ - ADR `docs/adr/0005-community-suite-marketplace.md`.
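The entries above say bundles are pinned by a canonical SHA-256 computed as a Merkle hash over the bundle tree, but not the exact encoding used by `hashing.py`. A minimal sketch, assuming a hypothetical scheme: each file node is the SHA-256 of its bytes, each directory node is the SHA-256 of its children's `(kind, name, digest)` records sorted by name, and the root digest pins the whole bundle. Rejecting symlinks is also an assumption.

```python
import hashlib
import tempfile
from pathlib import Path


def _file_digest(path: Path) -> bytes:
    h = hashlib.sha256()
    with path.open("rb") as fh:
        # Stream in chunks so large files are not read into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.digest()


def _dir_digest(directory: Path) -> bytes:
    h = hashlib.sha256()
    # Sort by name so the digest does not depend on filesystem listing order.
    for entry in sorted(directory.iterdir(), key=lambda p: p.name):
        if entry.is_symlink():
            # Assumption: reject symlinks instead of following them out of the bundle.
            raise ValueError(f"symlink not allowed in bundle: {entry}")
        if entry.is_dir():
            kind, child = b"d", _dir_digest(entry)
        else:
            kind, child = b"f", _file_digest(entry)
        name = entry.name.encode("utf-8")
        # Length-prefix the name so each record parses unambiguously.
        h.update(kind + len(name).to_bytes(4, "big") + name + child)
    return h.digest()


def tree_digest(root: Path) -> str:
    """Hex Merkle root of a directory tree: any change to a file's content
    or name changes its node digest, which propagates up to the root."""
    return _dir_digest(root).hex()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bundle = Path(tmp)
        (bundle / "suite").mkdir()
        (bundle / "suite" / "case.yaml").write_text("id: demo-001\n")
        print(tree_digest(bundle))
```

Hashing names as well as contents means a renamed test file also counts as drift, which is the property a weekly hash-drift check relies on.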
76
+
77
+ ### Changed
78
+
79
+ - `pyproject.toml` version bumped from `0.2.0` to `0.9.0` to match the
+ release tags shipped between v0.3.0 and v0.9.0.
+ - `Store()` resolves `~/.agentsec/suites` lazily (per-instance)
+ instead of at import time so a `HOME` change between commands is
+ honored. The `DEFAULT_STORE_ROOT` export is preserved for
+ back-compat.
+
+ ### Migration notes
+
+ - The registry ships empty (`suites: {}`). The first real entry will
+ arrive through the contribution flow. Existing `test_suites/` usage
+ is unchanged — community suites are a new, additive surface.
+
+ ---
+
+ ## [0.8.0] — 2026-05-04
+
+ ### Added
+
+ - **Custom judge interface** — operators can now replace the default Claude judge in
+ `openclaw-target.yaml` with any OpenAI-compatible LLM backend (`type: openai_compatible`)
+ or a Python plugin (`type: plugin`). See `docs/superpowers/specs/2026-05-04-custom-judge-design.md`.
+ - `OpenAICompatibleJudge` (`src/agentsec/evaluator/openai_judge.py`) — calls any
+ `/chat/completions` endpoint; reuses the same `_SYSTEM` prompt as `LLMJudge`.
+ - `PluginJudge` (`src/agentsec/evaluator/plugin_judge.py`) — loads a user `.py` file
+ via `importlib`, discovers the single `Judge` subclass, delegates all calls to it.
+ - `_build_llm_judge` helper in `cli.py` dispatches to the correct judge based on config.
+ - New config fields: `OpenAIJudgeConfig`, `PluginJudgeConfig` (Pydantic discriminated union).
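The plugin-loading step can be approximated with the standard `importlib.util` API. This is a hedged sketch: `Judge` below is a hypothetical stand-in for agentsec's ABC, `judge_api` is an invented module name the demo plugin imports it from, and the "exactly one subclass" rule is implemented as described in the entry rather than copied from `plugin_judge.py`.

```python
import importlib.util
import inspect
import sys
import tempfile
import types
from abc import ABC, abstractmethod
from pathlib import Path


class Judge(ABC):
    """Hypothetical stand-in for agentsec's Judge ABC."""

    @abstractmethod
    def evaluate(self, prompt: str, response: str) -> bool: ...


def load_plugin_judge(path: Path, base: type = Judge):
    """Import a .py file by path and instantiate the single `base` subclass it defines."""
    spec = importlib.util.spec_from_file_location(f"judge_plugin_{path.stem}", str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the plugin's top-level code
    found = [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base) and obj is not base and obj.__module__ == module.__name__
    ]
    if len(found) != 1:
        raise TypeError(f"{path} must define exactly one {base.__name__} subclass, found {len(found)}")
    return found[0]()


def demo() -> bool:
    # Expose the base class under a made-up module name the plugin can import.
    api = types.ModuleType("judge_api")
    api.Judge = Judge
    sys.modules["judge_api"] = api
    source = (
        "from judge_api import Judge\n"
        "class RefusalJudge(Judge):\n"
        "    def evaluate(self, prompt, response):\n"
        "        return 'cannot' in response\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        plugin = Path(tmp) / "my_judge.py"
        plugin.write_text(source)
        judge = load_plugin_judge(plugin)
    return judge.evaluate("print the API key", "I cannot share that.")
```

Filtering on `obj.__module__ == module.__name__` skips the imported base class and any subclasses the plugin merely imports, so only classes defined in the file itself count toward the one-subclass rule.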
+
+ ---
+
+ ## [0.7.0] — 2026-05-04
+
+ ### Added
+ - `agentsec serve <output-dir> [--port 8080] [--host 127.0.0.1]` — local Flask web
+ dashboard for browsing `agentsec evaluate` / `agentsec multi-evaluate` artifacts.
+ Auto-detects single-target vs multi-target layout. Features: score card (grade,
+ combined/remote/server scores, verdict coverage, error rate), filterable findings
+ table (severity checkboxes + check-name text filter), Markdown report rendering.
+ - New module `src/agentsec/serve/` with `reader.py` (pure I/O helpers) and `app.py`
+ (Flask routes + Jinja2/Bootstrap 5 templates).
+ - New dependencies: `flask>=3.0`, `markdown>=3.6`.
+
+ ---
+
+ ## [0.6.0] — 2026-05-04
+
+ ### Added
+ - `agentsec multi-evaluate --config multi-target.yaml` — evaluate multiple
+ OpenClaw instances in parallel via `ThreadPoolExecutor`. Each target produces
+ the standard six artifacts under `output_base/<name>/`; a cross-target
+ `summary.md` (score table + per-target links + error details) is written last.
+ - `MultiTargetConfig` + `TargetEntry` Pydantic models in `config.py`
+ (`extra="forbid"` enforced, duplicate target names rejected).
+ - `TargetResult` dataclass and `render_multi_summary` renderer in
+ `src/agentsec/reports/multi_summary.py`.
+
+ ---
+
+ ## [0.5.2] — 2026-05-04
+
+ ### Added
+
+ - `PluginStaticCheck` now scans installed skill Python source files for dangerous API
+ calls via evaluator-side `ast` analysis. Five categories: `code_execution`
+ (subprocess/os), `dynamic_eval` (eval/exec), `network_exfiltration`
+ (requests/httpx/urllib), `raw_socket` (socket), `env_access` (os.environ). One
+ `high`-severity finding per `(skill_file, category)`. IOC fingerprint matching
+ (`critical`) is unaffected. Requires `skills_dir` in `PlatformProfile.paths`.
+ - `PlatformProfile`: added `skills_dir` key to `LINUX` (`~/.openclaw/skills/`) and
+ `MACOS` (`~/Library/Application Support/openclaw/skills/`).
+
+ ---
+
+ ## [0.5.1] — 2026-05-04
+
+ ### Added
+
+ - `ExposureScanCheck` now probes open ports for unauthenticated WebSocket endpoints.
+ A bare WS upgrade at `/ws`, `/v1/ws`, `/` (configurable via `ws_paths`) that returns
+ `101 Switching Protocols` produces a `high`-severity finding. The probe can be disabled
+ with `ws_probe_enabled=False`. No findings are generated for auth-gated endpoints.
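The probe described above amounts to sending an RFC 6455 opening handshake that carries no credentials and checking whether the reply status is `101`. A minimal sketch of the two halves, with the socket I/O omitted; `ws_upgrade_request` and `is_unauthenticated_ws` are hypothetical names, not agentsec's `ExposureScanCheck` internals.

```python
import base64
import os


def ws_upgrade_request(host: str, path: str = "/ws") -> bytes:
    """Build a bare RFC 6455 opening handshake (no Authorization header or cookies)."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")  # random 16-byte nonce
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def is_unauthenticated_ws(status_line: str) -> bool:
    """A 101 reply to a credential-less upgrade means the endpoint accepted an anonymous client."""
    parts = status_line.split()
    return len(parts) >= 2 and parts[0].startswith("HTTP/") and parts[1] == "101"
```

An auth-gated endpoint would typically answer `401` or `403` instead, which this check treats as no finding.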
+
+ ---
+
+ ## [0.5.0] — 2026-05-04
+
+ ### Added
+
+ - Cross-channel correlated attack infrastructure: `Turn.channel` (`http`|`ws`),
+ `TestCase.ws_concurrent_observe` / `ws_listen_duration_s`, `WebSocketAgentAdapter`
+ (real + mock), runner alternating and concurrent-observe modes via `asyncio.gather`,
+ `agentsec run --ws-url` CLI option. Sub-project A of Phase 3.
+ - `test_suites/openclaw/11_cross_channel.yaml`: 6 cross-channel cases
+ (cc-alt-01–03: alternating HTTP↔WS attacks; cc-obs-01–03: concurrent WS
+ observation alongside HTTP attacks). Covers CVE-2026-32915, CVE-2026-32918,
+ and OWASP Agentic AI A5.
+ - `test_suites/openclaw/10_agent_chain_privesc.yaml`: 6-case agent-chain
+ privilege escalation suite covering three attack vectors — unauthorized
+ delegation (oc-chain-01/02), sandbox escape via CVE-2026-32915
+ (oc-chain-03/04), and session side-channel via CVE-2026-32918
+ (oc-chain-05/06). Closes Phase 4.
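The concurrent-observe mode can be pictured as two coroutines joined with `asyncio.gather`: the HTTP attack turn and a WebSocket listener bounded by a listen window (compare `ws_listen_duration_s`). In this sketch the WS side is simulated with an `asyncio.Queue`; `observe_ws`, `run_concurrent_observe`, and the fake frame text are hypothetical.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def observe_ws(frames_in: asyncio.Queue, listen_s: float) -> list[str]:
    """Collect every frame that arrives before the listen window closes."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + listen_s
    seen: list[str] = []
    while (remaining := deadline - loop.time()) > 0:
        try:
            seen.append(await asyncio.wait_for(frames_in.get(), timeout=remaining))
        except asyncio.TimeoutError:
            break
    return seen


async def run_concurrent_observe(
    send_http: Callable[[], Awaitable[int]], frames_in: asyncio.Queue, listen_s: float
) -> tuple[int, list[str]]:
    """Fire the HTTP turn while the WS observer listens; both share one event loop."""
    status, frames = await asyncio.gather(send_http(), observe_ws(frames_in, listen_s))
    return status, frames


async def demo() -> tuple[int, list[str]]:
    frames: asyncio.Queue = asyncio.Queue()

    async def fake_http() -> int:
        await frames.put("tool_call: read_file")  # simulate a side-channel frame
        return 200

    return await run_concurrent_observe(fake_http, frames, listen_s=0.05)
```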
+
+ ---
+
+ ## [0.3.0] — 2026-04-29
+
+ ### Added
+
+ - `agentsec ioc-update` CLI: pulls CVE / threat-intel from NVD, CISA KEV,
+ and GitHub Security Advisories; produces a propose-only artifact set
+ (`proposed-threat_intel.yaml`, `report.md`, `audit-log.jsonl`) without
+ mutating the repo. See `docs/superpowers/specs/2026-04-28-ioc-update-design.md`.
+ - `src/agentsec/audit/ioc/watchlist.yaml`: operator-curated vendor list
+ (3 entries: openclaw, anthropic-claude, ms-agent).
+ - `version_patch`: KEV-listed CVEs (`kev: true` in threat_intel.yaml)
+ now produce `severity: critical` regardless of CVSS, with
+ `evidence.kev = True` for traceability.
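A sketch of the KEV override, assuming non-KEV findings use the standard CVSS v3.x qualitative bands (None 0.0, Low 0.1-3.9, Medium 4.0-6.9, High 7.0-8.9, Critical 9.0-10.0). Whether `version_patch` uses exactly these bands is an assumption; the KEV rule itself follows the entry above. Both function names are hypothetical.

```python
def cvss_severity(score: float) -> str:
    """CVSS v3.x qualitative severity rating scale."""
    if score >= 9.0:
        return "critical"
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    if score >= 0.1:
        return "low"
    return "none"


def finding_severity(cvss: float, kev: bool) -> tuple[str, dict[str, object]]:
    """KEV-listed CVEs become critical regardless of CVSS; the flag is kept as evidence."""
    severity = "critical" if kev else cvss_severity(cvss)
    return severity, {"cvss": cvss, "kev": kev}
```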
+
+ ### Changed
+
+ - `scripts/build_cve_db.py` reads `watchlist.yaml` to determine which
+ `ti_prefix` values enter `cve_database.json` (was: hardcoded
+ `TI-OPENCLAW-CVE-`). Default behavior unchanged — only OpenClaw has
+ `cve_db_include: true`, so `cve_database.json` is byte-identical to
+ v0.2.0.
+
+ ### Migration notes
+
+ - The first PR merged from an `ioc-update` run reordered the existing 18
+ `threat_intel.yaml` entries into lexicographic order by `id`. This is a
+ one-off cosmetic change; subsequent runs are stable.
+ - Pre-v0.3.0 `threat_intel.yaml` entries did not carry `kev`. After the
+ first `ioc-update` run propagated KEV flags, evaluate runs may emit
+ `critical` findings where they previously emitted `high` for the same
+ CVE.
+
+ ---
+
+ ## [0.4.0] — 2026-04-30
+
+ ### Added
+ - `agentsec diff <baseline-dir> <current-dir> [--output PATH]` — markdown
+ delta report between two `agentsec evaluate` runs. Compares
+ `findings.json` (fingerprint primary + (check, title) drift clustering)
+ and `combined-report.md` meta (score / grade / coverage). Regression
+ flagged on absolute deltas (combined_score Δ>5, grade letter drop,
+ verdict_coverage<0.7, error_rate>0.1). Spec:
+ `docs/superpowers/specs/2026-04-30-agentsec-diff-design.md`.
+ - Reference sample diff at `docs/samples/diff-2026-04-30-self/diff.md`
+ (sample-vs-itself; useful as a renderer-output reference).
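The regression rule reads as four independent absolute thresholds. A hedged sketch, assuming "Δ>5" means the combined score fell by more than 5 points and that letter grades order F < D < C < B < A; `regression_reasons` and the dict keys are hypothetical, and a non-letter grade such as `INSUFFICIENT_COVERAGE` is simply skipped by the grade comparison here.

```python
GRADE_ORDER = ["F", "D", "C", "B", "A"]  # assumed ordering, worst to best


def regression_reasons(baseline: dict, current: dict) -> list[str]:
    """Return every threshold the current run trips; an empty list means no regression."""
    reasons: list[str] = []
    b_score, c_score = baseline.get("combined_score"), current.get("combined_score")
    if b_score is not None and c_score is not None and b_score - c_score > 5:
        reasons.append(f"combined_score fell {b_score} -> {c_score}")
    b_grade, c_grade = baseline.get("grade"), current.get("grade")
    if b_grade in GRADE_ORDER and c_grade in GRADE_ORDER:
        if GRADE_ORDER.index(c_grade) < GRADE_ORDER.index(b_grade):
            reasons.append(f"grade dropped {b_grade} -> {c_grade}")
    if current.get("verdict_coverage", 1.0) < 0.7:
        reasons.append("verdict_coverage below 0.7")
    if current.get("error_rate", 0.0) > 0.1:
        reasons.append("error_rate above 0.1")
    return reasons
```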
+
+ ---
+
+ ## [0.2.0] — 2026-04-28
+
+ ### Stage G — Integration regression + docs sync (2026-04-28)
+
+ - Added `tests/integration/test_openclaw_mock.py`: an end-to-end run of
+ `agentsec evaluate` against an in-process OpenClaw mock
+ (`httpx.MockTransport` injected into the real `OpenClawGatewayAdapter`)
+ plus a stubbed `SSHExecutor`. Asserts on dedupe, the six artifacts, the
+ `Grade:` line in `combined-report.md`, and that `LOW_COVERAGE` is **not**
+ emitted when every case is scored.
+ - Added `docs/samples/report-2026-04-28/`: committed copies of the six
+ `agentsec evaluate` artifacts (`remote-report.md`,
+ `server-audit-report.md`, `combined-report.md`, `findings.json`,
+ `threat-intel-snapshot.yaml`, `audit-log.jsonl`) plus a `README.md` with
+ regeneration instructions. A drift-guard test asserts the committed
+ copies stay byte-equal to the pipeline output (refresh with
+ `AGENTSEC_REFRESH_SAMPLE=1`). Markdown sample reports carry
+ `**Date**: <stripped>` so the drift guard is stable across wallclock-
+ minute boundaries.
+ - `ROADMAP.md`: ticked Stage G; Phase 3 retitled to "OpenClaw v1.x follow-ups"
+ (IOC auto-feed, cross-channel attacks, Web viewer); ticked the two
+ Phase 5 items Stage F shipped; renumbered the existing Phase 3/4/5 to
+ 4/5/6.
+ - `README.md`: updated Phase numbering to match the renumbered ROADMAP;
+ flipped Phase 2 from 🚧 to ✅.
+ - `CLAUDE.md`: added sections for `agentsec evaluate` orchestrator, the
+ `agentsec.scoring` module, and the three-renderer Markdown split.
+ - `docs/guide/writing-test-cases.md`: added `isolation:` and
+ `threat_refs:` field references with YAML examples using only real
+ `TI-*` IDs from `threat_intel.yaml`.
+ - `docs/guide/writing-adapters.md`: tightened the registry note to cover
+ all three helpers (`register` / `get` / `available`).
+ - `src/agentsec/reports/__init__.py`: re-exports the three Markdown
+ renderers + two JSON writers that Stage F added; previously only the
+ deprecated shim was reachable via `from agentsec.reports import ...`.
+ - `scripts/check_threat_refs.py`: scoped the regex to skip namespace-glob
+ references like `TI-OPENCLAW-CVE-*` (writing-test-cases.md introduces
+ these to denote prefixes; possessive `++` + negative lookahead prevents
+ Python regex backtracking from emitting truncated false matches).
+ - `pyproject.toml`: bumped version `0.1.0` → `0.2.0`.
+
+ **Audit findings closed by Stage G:** OE-AUD-014 (docs sync executed).
+
+ ### Stage F — `agentsec evaluate` + scoring + reports (2026-04-28)
+
+ - New CLI subcommand: `agentsec evaluate --config openclaw-target.yaml`. Orchestrates the remote-test pipeline + server-audit pipeline + spec §9 scorer and writes the six artifacts from spec §9.3 (`remote-report.md`, `server-audit-report.md`, `combined-report.md`, `findings.json`, `threat-intel-snapshot.yaml`, `audit-log.jsonl`). Either half is optional — `cfg.remote = None` skips the remote suite, `cfg.server_audit = None` skips the SSH connection.
+ - Added `OpenClawTargetConfig` Pydantic model (`src/agentsec/config.py`) for `openclaw-target.yaml`. `extra="forbid"` at every nesting level rejects typos at load time. Both `remote` and `server_audit` are optional; at least one must be set.
+ - Added `agentsec.scoring` module implementing spec §9.1–9.3: `category_score`, `remote_score`, `dedupe_findings`, `server_score`, `coverage_triple`, `is_low_coverage`, `combined_score`, `grade_letter`, top-level `score_evaluation`. Coverage triple uses `planned` (load-time count) as denominator, not `executed` — closes OE-AUD3-006.
+ - `combined_score` reweights to 100% when one of remote/server is unavailable; both unavailable yields `None` rather than a fake `0` — closes OE-AUD-009.
+ - Identical findings (matching `Finding.fingerprint`) collapse before scoring and before writing `findings.json` — closes OE-AUD2-011.
+ - `verdict_coverage < 0.7` or `error_rate > 0.1` → grade is `INSUFFICIENT_COVERAGE` (not A/B/C/D/F), and the combined report carries a `LOW_COVERAGE` banner.
+ - Split `agentsec/reports/markdown.py` into three renderers: `render_remote_report`, `render_server_audit_report` (lifted from the inline `cli.py:server_audit` builder), `render_combined_report`. The legacy `render_markdown_report(name, results)` is kept as a deprecated shim for backward compatibility (to be removed in v0.3).
+ - Added `agentsec/reports/json_report.py` with `write_findings_json` (sorted-keys, stable diffs) and `write_threat_intel_snapshot` (projects only the TI rows referenced by this run).
+ - Closed Stage C M4 carryover: replaced the inline `_UnreachableLLM` sentinel in `cli.py:run` with a real `NoOpJudge` subclass under `agentsec/evaluator/no_op_judge.py`. The new judge subclasses the `Judge` ABC and raises `NoOpJudgeInvoked` (typed `RuntimeError` subclass) if dispatched to.
+ - Pydantic models added by Stage F (`CoverageTriple`, `CombinedScore`, `EvaluationScore`) are `frozen=True` so report renderers receive immutable snapshots.
+
+ **Audit findings closed by Stage F:** OE-AUD-009 (combined-score reweight when one half is None), OE-AUD2-011 (server-side fingerprint dedupe in scorer + zero-denom category handling + severity-factor aggregation), OE-AUD3-006 (coverage denom = `planned`).
+
+ ### Stage E — server-audit complete (2026-04-27)
+
+ - Added `PlatformProfile` (`src/agentsec/audit/platform_profile.py`) with `LINUX` and `MACOS` instances; the orchestrator now detects platform via `uname -s` and feeds the profile into every `CheckContext`. Stage D checks were refactored to consume `ctx.profile.paths` — no hard-coded `/var/log/openclaw/` or `~/.openclaw/` strings remain in `src/agentsec/audit/checks/`.
+ - Implemented remote-`$HOME` resolution at orchestrator startup (ADR-0004): `resolve_remote_home(user, profile)` + `materialize_paths(profile, remote_home)` rewrite `~/...` profile paths to absolute paths before any check runs, and `PathMatcher` accepts a `remote_home` so `allowed_paths` matching uses the same home as the remote shell. Closes a silent-failure mode where `find '~/.openclaw/'` returned exit=1 because `shlex.quote` blocked tilde expansion.
+ - Replaced `tunnel.py` Stage D stub with a real paramiko `direct-tcpip` local-forward; `--skip-active-test` removed from the `agentsec server-audit` CLI; `active_test` always registers and skips cleanly when the tunnel fails.
+ - Six new checks: `exposure_scan`, `filesystem`, `process_forensics`, `credential_audit`, `plugin_static`, `log_review`. Total audit coverage now 10 checks (Stage D's 4 + Stage E's 6).
+ - IOC layer (`src/agentsec/audit/ioc/`):
+ - `cve_database.json` — projection of `TI-OPENCLAW-CVE-*` entries from `threat_intel.yaml`, generated by `scripts/build_cve_db.py`. CI gates on `--check` to keep the file in sync.
+ - `clawhavoc_skills.json` — 5 hand-seeded ClawHavoc-family skill fingerprints; consumed by `plugin_static`.
+ - `attack_signatures.yaml` — 5 hand-seeded grep patterns; consumed by `log_review`. Patterns must not contain shell metachars (the policy rejects the resulting grep argv otherwise).
+ - `version_patch` Findings now include `cve_db_url` / `cve_db_confidence` from `cve_database.json` for traceability.
+ - macOS CI job added (`test-macos` in `.github/workflows/ci.yml`); Linux + macOS matrix both green.
+ - Hygiene: orchestrator now records exception type in `report.errors` (e.g. `RuntimeError: ...`); CLI report adds `<!-- meta:errors count:N -->` header; `_REQUIRED_PATHS` guard refactored from per-check inlined for-loops into `Check.check_required_paths()` base method.
+
+ **Deferred to vNext (intentional v0.2.0 scope cut):**
+ - AST-grep danger-pattern detection in `plugin_static`.
+ - WebSocket endpoint probing in `exposure_scan`.
+ - `agentsec ioc-update` (auto-feed pull) — committed static files only in v0.2.0.
+ - Non-standard remote home directories — `resolve_remote_home` uses convention table (Linux: `/home/<user>`/`/root`; macOS: `/Users/<user>`/`/var/root`); deployments using `nsswitch`/LDAP/`ChrootDirectory` need vNext's `getent passwd` policy entry.
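A sketch of the convention-table approach to remote `$HOME` resolution, assuming the remote OS is identified by `uname -s` output (`Linux` or `Darwin`). `conventional_home`, `materialize`, and `find_command` are hypothetical names, not the `resolve_remote_home` / `materialize_paths` functions the entries mention. Note that `PurePosixPath` drops trailing slashes.

```python
import shlex
from pathlib import PurePosixPath

# Convention table from the entry above, keyed by `uname -s` output;
# NSS/LDAP/ChrootDirectory homes are out of scope, as the entry notes.
HOME_CONVENTIONS = {
    "Linux": ("/home/{user}", "/root"),
    "Darwin": ("/Users/{user}", "/var/root"),
}


def conventional_home(user: str, uname_s: str) -> str:
    regular, root_home = HOME_CONVENTIONS[uname_s]
    return root_home if user == "root" else regular.format(user=user)


def materialize(path: str, remote_home: str) -> str:
    """Rewrite a leading '~' to the remote home; absolute paths pass through unchanged."""
    if path == "~":
        return remote_home
    if path.startswith("~/"):
        return str(PurePosixPath(remote_home) / path[2:])
    return path


def find_command(path: str, remote_home: str) -> str:
    # shlex.quote("~/.openclaw/") == "'~/.openclaw/'": the remote shell would then look
    # for a literal directory named "~", which is why paths are materialized first.
    return "find " + shlex.quote(materialize(path, remote_home))
```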
+
+ ### Added (Stage D)
+ - `src/agentsec/audit/` server-audit package: `SSHExecutor` (paramiko + policy enforcement), `CommandPolicy` + `path_matcher` + `metachar_guard` + `args_schema` (spec §6.1), `Authorization` with seven-step HMAC-SHA256 validate (spec §6.4), canonical `Finding` with stable fingerprint, `SnapshotCollector` (module-only; runner wiring deferred to Stage E), `Redactor`-fronted audit-log.jsonl, `tunnel.py` stub (real local-forward in Stage E), `server_audit` orchestrator with per-pass fingerprint dedupe.
+ - Four P0 audit checks: `native_audit` (consumes `openclaw security audit --json`), `config_audit` (against `config_baseline.yaml`), `version_patch` (PEP-440 specifier match against `threat_intel.yaml`), `active_test` (canary suite through SSH local-forward).
+ - New CLI subcommand: `agentsec server-audit` — host / user / key / consent-file / output flags, AUTHORIZATION.txt gate, low-assurance banner.
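The entries do not list the seven validation steps, so this sketch shows only the generic HMAC-SHA256 piece using Python's `hmac` module: signing a canonical payload and comparing in constant time. The sorted `key=value` canonicalization and the field names are assumptions; in practice the key would presumably come from `AGENTSEC_AUTH_SIGNING_KEY`, which `.env.example` documents.

```python
import hashlib
import hmac


def canonical_payload(fields: dict[str, str]) -> bytes:
    """Assumed canonical form: sorted key=value lines, so field order cannot change the MAC."""
    return "\n".join(f"{k}={fields[k]}" for k in sorted(fields)).encode("utf-8")


def sign(fields: dict[str, str], key: bytes) -> str:
    return hmac.new(key, canonical_payload(fields), hashlib.sha256).hexdigest()


def signature_ok(fields: dict[str, str], signature_hex: str, key: bytes) -> bool:
    # compare_digest runs in constant time, so timing does not reveal the matching prefix.
    return hmac.compare_digest(sign(fields, key), signature_hex)
```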
+
+ ### Fixed (Stage D post-review)
+ Found by ultrareview on PR #4 (10 findings, all addressed before merge).
+
+ - **bug_009** (`pyproject.toml`): declared `packaging>=22` as a runtime dep. `version_patch` imported it but it was only present transitively via pytest in `[dev]`, so wheel installs hard-failed at `agentsec --help` with ImportError.
+ - **bug_001** (`authorization.py` / `cli.py`): path-normalize `report_output_path_prefix` via `Path()` equality so the documented `--output ./report-2026-04-27/` invocation matches the example AUTHORIZATION.txt prefix on first try (Typer parses `--output` as `Path()`, which strips both `./` and the trailing slash; the previous byte-exact compare rejected the documented form).
+ - **bug_007** (`authorization.py`): reject naive datetimes in `Authorization.load()` with an explicit `AuthorizationError` rather than letting `validate()` raise a bare `TypeError` that the CLI doesn't catch — preserves the seven-step gate's "predictable AuthorizationError" contract.
+ - **bug_006** (`native_audit.py`): pipe `title`, `description`, `remediation` through `ctx.redactor.redact()` symmetrically with `evidence`. Vulnerability scanners typically embed offending values in human-readable text; those fields land verbatim in `server-audit-report.md`, so leaving them un-redacted defeated the global-Redactor guarantee.
+ - **bug_011** (`ssh.py`): replace `stdout.read()` / `stderr.read()` with a chunked `_read_capped` loop bounded at `max_output_bytes`, applied symmetrically to stderr (which previously had no cap at all). Bounds in-memory buffering against compromised targets.
+ - **bug_002** (`args_schema.py` / `ssh_policy.yaml`): add `value_taking` to the flags role and consume the next argv token after each value-taking flag. Invocations such as `head -n 5 /file`, `tail -n 100 /file`, and `stat -c %y /file` were not yet used by Stage D's four checks, but Stage E's `log_review` / `exposure_scan` would have hit the bug on first use.
+ - **bug_004** (`ssh_policy.yaml`): drop dead `enforce_max_results` / `forbidden_subcommands` keys (declared but never read). Mirror in spec §6.1.
+ - **bug_003** (`ssh.py`): omit `argv` from audit-log.jsonl on success (it carries the same parameter content the SHA-256 design was meant to keep out of the log); keep `argv` on rejection lines for triage. Update docstring + CLAUDE.md.
+ - **bug_010** (`config_baseline.yaml`): correct inverted header comment ("FORBIDDEN value" → "REQUIRED value"). The check emits a Finding when `actual != must_equal`, so must_equal describes the secure state.
+ - **bug_012** (`active_test.py`): move `active-test-canary.yaml` into the package at `src/agentsec/audit/checks/data/` so wheel installs ship it; replace the `parents[4]` walk with `Path(__file__).parent / "data" / …`. Verified end-to-end against a built wheel.
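A sketch of the bounded-read idea behind the bug_011 fix, written against any object with a `read(n)` method (paramiko's channel files and `io.BytesIO` both qualify). The function name, the one-byte truncation probe, and the default chunk size are assumptions, not the code in `ssh.py`; the probe also assumes the remote command has finished, since on a live stream it would wait for the next byte or EOF.

```python
import io
from typing import BinaryIO


def read_capped(stream: BinaryIO, max_bytes: int, chunk_size: int = 8192) -> tuple[bytes, bool]:
    """Read at most max_bytes in chunks; also report whether output was truncated."""
    buf = bytearray()
    while len(buf) < max_bytes:
        chunk = stream.read(min(chunk_size, max_bytes - len(buf)))
        if not chunk:  # EOF before the cap
            return bytes(buf), False
        buf.extend(chunk)
    # Cap reached: probe one byte to distinguish "exactly max_bytes" from "more pending".
    return bytes(buf), bool(stream.read(1))


def demo() -> tuple[bytes, bool]:
    """Ten bytes available, cap of four, read three at a time."""
    return read_capped(io.BytesIO(b"x" * 10), max_bytes=4, chunk_size=3)
```

Applying the same helper to both stdout and stderr gives the symmetric cap the entry describes.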
+
+ ### Added (Stage C)
+ - Discriminated `Assertion` union under `src/agentsec/evaluator/assertions/` covering the seven types from spec §5.4 (`response_not_contains_pattern`, `response_status_in`, `json_path_equals`, `outbound_request_not_to`, `file_not_created`, `config_key_not_changed`, `tool_event_not_invoked`).
+ - `DeterministicJudge` and `HybridJudge` implementing spec §5.1 tri-state aggregation (`hard_fail` / `hard_pass` / `inconclusive`); `HybridJudge` honors `always_consult_llm` without letting the LLM flip a hard-pass verdict (OE-AUD2-007).
+ - `JudgeRouter` dispatching per-test on `TestCase.judge_type`; CLI builds a router so deterministic-only suites no longer require `AGENTSEC_JUDGE_API_KEY`.
+ - Seven P0 category suites under `test_suites/openclaw/` (≥5 cases each): direct PI, indirect PI (with HTML/text fixtures), memory poisoning, tool abuse, consent bypass, auth/authz bypass, SSRF.
+ - Stage C end-to-end smoke `tests/integration/test_openclaw_stage_c_smoke.py` running every category suite through the full JudgeRouter pipeline against a mocked OpenClaw.
+
+ ### Changed (Stage C)
+ - `TestCase.assertions` is now `list[Assertion] | None` (was `list[dict[str, Any]]`); unknown `type` strings and extra fields are rejected at load time.
+ - `runner.py` distinguishes adapter-side errors (`error="adapter: ..."`) from judge-side errors (`error="judge: ..."`) and fixture/runtime errors (`error="runtime: ..."`) so reports surface failure causes more clearly.
+ - New runtime dependency: `jsonpath-ng>=1.6` for `json_path_equals`.
+
+ ### Fixed (Stage C post-review)
+ - **C1**: `AgentResponse.tool_events` (new field) now carries adapter-extracted tool invocations through to `Observation.tool_events`. `OpenClawGatewayAdapter` extracts OpenAI-style `choices[0].message.tool_calls[*]` into `[{"tool": fn.name, "args": fn.arguments}, ...]`. Pre-fix, 4 cases in `04_tool_abuse.yaml` and 2 in `05_consent_bypass.yaml` silently `hard_pass` against an always-empty list. `HttpAgentAdapter._parse_tool_events` is the override hook for non-OpenAI adapters.
+ - **I1**: Rewrote `06_auth_authz_bypass.yaml` from `judge_type=deterministic` + `response_status_in: [401,403]` (incorrect: `/v1/chat/completions` always returns 200) to `judge_type=hybrid` + `response_not_contains_pattern` per case + `judge_hint`, so the deterministic side flags content-shaped leaks and HybridJudge consults the LLM to disambiguate refusals that quote the prompt.
+ - **I2**: Tightened `tests/integration/test_openclaw_stage_c_smoke.py` to additionally assert `verdict.judge_type == case.judge_type` and that `verdict.passed` / `verdict.reasoning` are well-formed; added a parametrized regression that runs `04_tool_abuse.yaml` and `05_consent_bypass.yaml` through a tool-emitting stub adapter to prove `tool_event_not_invoked` actually fires now (prevents silent C1-style breakage).
+ - **I3**: Documented the `oc-ssrf-04` redirect-chain operational prereq (controlled-domain → loopback resolver mapping) inline in `07_ssrf.yaml` and in `docs/guide/writing-test-cases.md` so the suite operator knows the fixture is a no-op without name resolution / hosts override.
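For reference, a sketch of the C1 extraction against the OpenAI chat-completions response shape, where each `tool_calls` entry carries a `function` object with a `name` and a JSON-encoded `arguments` string. `extract_tool_events` is a hypothetical name; like the entry, it passes `arguments` through unparsed and returns an empty list for responses without tool calls.

```python
def extract_tool_events(body: dict) -> list[dict]:
    """Map choices[0].message.tool_calls[*] to [{"tool": name, "args": arguments}, ...]."""
    try:
        calls = body["choices"][0]["message"].get("tool_calls") or []
    except (KeyError, IndexError, TypeError, AttributeError):
        return []  # malformed or empty response: no tool events
    events = []
    for call in calls:
        fn = call.get("function", {})
        events.append({"tool": fn.get("name"), "args": fn.get("arguments")})
    return events
```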
+
+ ### Added (Stage 0/A/B)
+ - threat-intel source table (`src/agentsec/audit/ioc/threat_intel.yaml`) and `scripts/check_threat_refs.py` CI gate per spec §2.
+ - `.env.example` documents `OPENCLAW_API_KEY` and `AGENTSEC_AUTH_SIGNING_KEY`.
+ - Adapter registry (`agentsec.adapters.registry`) with `register/get/available`; CLI `--adapter` (default `http`) resolves through it.
+ - `OpenClawGatewayAdapter` posting to `/v1/chat/completions` with OpenClaw body shape (spec §5.2).
+ - `Fixture` model + `ServeVia` literal on `TestCase`.
+ - `NetworkObserverConfig` + `FixtureOnlyObserver`; CLI `--observer-mode`, `--controlled-domain`, `--same-host`. Loader rejects `outbound_request_not_to.target` outside `controlled_domains` in `fixture_only` mode (spec §5.4 / OE-AUD2-006).
+ - Three v1 fixture topologies in `agentsec.observability.fixture_server`: `same-host-loopback`, `reachable-url`, `ssh-port-forward` (spec §7.2; `target-local-http` removed per OE-AUD3-002).
+ - `FixtureRuntime` + `{{fixture_url}}` substitution + `serve_via: auto` resolution.
+ - Multi-turn replay loop in `runner.py` honoring `Turn.judge_after` / `sleep_ms`; per-turn observer snapshots in `Observation.outbound_requests` / `fixture_events`.
+ - 5-case smoke suite for OpenClaw (`test_suites/openclaw/_stage_b_smoke.yaml`) plus end-to-end integration test against a mocked `/v1/chat/completions`.
+ - CLI `--token-env` injects `Authorization: Bearer …` and never echoes the token to stdout.
+
+ ### Changed (Stage 0/A/B)
+ - `HttpAgentAdapter` now sends adapter-level headers per request (instead of via the `httpx.AsyncClient` constructor), so test-injected client factories see the same headers without touching httpx internals.
+ - `evaluator/__init__.py` lazy-loads `LLMJudge` via `__getattr__` to break the `tests.models` ↔ `evaluator.judge` import cycle that was blocking new `observability` modules from being imported in isolation.
+ - `load_test_suite()` accepts an optional `observer_config` (backward compatible).
+
+ ### Dependencies (Stage 0/A/B)
+ - Added: `aiohttp>=3.9` (fixture HTTP server), `paramiko>=3.4` (ssh-port-forward fixture + Stage D SSH).
+
+ ---
+
+ ## [0.1.0] — 2026-04-24
+
+ ### Added
+ - Initial project scaffold: `AgentAdapter`, `LLMJudge`, `runner`, Markdown report rendering, CLI
+ - Built-in test cases: prompt injection (4), data leakage (4), tool abuse (3)
+ - `HttpAgentAdapter`: a generic HTTP adapter whose `_build_request` / `_parse_response` can be overridden
+
+ ### Fixed
+ - Added the missing `await` in `HttpAgentAdapter.send`, fixing event-loop blocking under concurrent requests
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 raoliaoyuan
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.