@clear-capabilities/agentic-security-scanner 0.74.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (331)
  1. package/CHANGELOG.md +1580 -0
  2. package/bin/.agentic-security/findings.json +1577 -0
  3. package/bin/.agentic-security/last-scan.json +1577 -0
  4. package/bin/.agentic-security/last-scan.json.sig +1 -0
  5. package/bin/.agentic-security/scan-history.json +465 -0
  6. package/bin/.agentic-security/streak.json +25 -0
  7. package/bin/agentic-security-audit.js +198 -0
  8. package/bin/agentic-security-consistency.js +80 -0
  9. package/bin/agentic-security-diff.js +136 -0
  10. package/bin/agentic-security-lsp.js +12 -0
  11. package/bin/agentic-security-mcp.js +40 -0
  12. package/bin/agentic-security-rule.js +153 -0
  13. package/bin/agentic-security.js +1683 -0
  14. package/dist/117.index.js +207 -0
  15. package/dist/178.index.js +250 -0
  16. package/dist/218.index.js +793 -0
  17. package/dist/227.index.js +192 -0
  18. package/dist/301.index.js +167 -0
  19. package/dist/384.index.js +18 -0
  20. package/dist/476.index.js +126 -0
  21. package/dist/513.index.js +373 -0
  22. package/dist/520.index.js +13 -0
  23. package/dist/601.index.js +1038 -0
  24. package/dist/634.index.js +1892 -0
  25. package/dist/637.index.js +216 -0
  26. package/dist/660.index.js +131 -0
  27. package/dist/675.index.js +451 -0
  28. package/dist/826.index.js +188 -0
  29. package/dist/830.index.js +133 -0
  30. package/dist/agentic-security.mjs +272 -0
  31. package/dist/agentic-security.mjs.sha256 +1 -0
  32. package/dist/calibration-seed.json +27 -0
  33. package/package.json +77 -0
  34. package/src/.agentic-security/findings.json +80844 -0
  35. package/src/.agentic-security/last-scan.json +80844 -0
  36. package/src/.agentic-security/last-scan.json.sig +1 -0
  37. package/src/.agentic-security/scan-history.json +8408 -0
  38. package/src/.agentic-security/streak.json +26 -0
  39. package/src/badge.js +188 -0
  40. package/src/compare.js +203 -0
  41. package/src/dataflow/.agentic-security/findings.json +3487 -0
  42. package/src/dataflow/.agentic-security/last-scan.json +3487 -0
  43. package/src/dataflow/.agentic-security/last-scan.json.sig +1 -0
  44. package/src/dataflow/.agentic-security/scan-history.json +735 -0
  45. package/src/dataflow/.agentic-security/streak.json +24 -0
  46. package/src/dataflow/CLAUDE.md +38 -0
  47. package/src/dataflow/access-paths.js +172 -0
  48. package/src/dataflow/async-sequencing.js +177 -0
  49. package/src/dataflow/backward.js +201 -0
  50. package/src/dataflow/catalog-expanded.js +485 -0
  51. package/src/dataflow/catalog.js +659 -0
  52. package/src/dataflow/cross-repo.js +219 -0
  53. package/src/dataflow/engine.js +588 -0
  54. package/src/dataflow/exception-flow.js +116 -0
  55. package/src/dataflow/exploit-prover.js +187 -0
  56. package/src/dataflow/higher-order.js +221 -0
  57. package/src/dataflow/ifds.js +347 -0
  58. package/src/dataflow/implicit-flow.js +129 -0
  59. package/src/dataflow/incremental.js +229 -0
  60. package/src/dataflow/index.js +181 -0
  61. package/src/dataflow/numeric-domain.js +192 -0
  62. package/src/dataflow/path-feasibility.js +114 -0
  63. package/src/dataflow/points-to.js +337 -0
  64. package/src/dataflow/polyglot.js +190 -0
  65. package/src/dataflow/proven-clean.js +159 -0
  66. package/src/dataflow/receiver-context.js +76 -0
  67. package/src/dataflow/sanitizer-proof.js +154 -0
  68. package/src/dataflow/soft-taint.js +140 -0
  69. package/src/dataflow/string-domain.js +234 -0
  70. package/src/dataflow/stub-aware-filter.js +100 -0
  71. package/src/dataflow/summaries.js +132 -0
  72. package/src/dataflow/symbolic-exec.js +238 -0
  73. package/src/dataflow/tabulation.js +135 -0
  74. package/src/engine.js +7763 -0
  75. package/src/history-scan.js +229 -0
  76. package/src/index.js +3 -0
  77. package/src/integrations/.agentic-security/findings.json +1504 -0
  78. package/src/integrations/.agentic-security/last-scan.json +1504 -0
  79. package/src/integrations/.agentic-security/scan-history.json +40 -0
  80. package/src/integrations/.agentic-security/streak.json +21 -0
  81. package/src/integrations/index.js +321 -0
  82. package/src/integrations/tickets.js +200 -0
  83. package/src/ir/.agentic-security/findings.json +3036 -0
  84. package/src/ir/.agentic-security/last-scan.json +3036 -0
  85. package/src/ir/.agentic-security/last-scan.json.sig +1 -0
  86. package/src/ir/.agentic-security/scan-history.json +364 -0
  87. package/src/ir/.agentic-security/streak.json +23 -0
  88. package/src/ir/CLAUDE.md +172 -0
  89. package/src/ir/callgraph.js +73 -0
  90. package/src/ir/class-hierarchy.js +195 -0
  91. package/src/ir/index.js +152 -0
  92. package/src/ir/parser-cs.js +260 -0
  93. package/src/ir/parser-java.js +286 -0
  94. package/src/ir/parser-js.js +413 -0
  95. package/src/ir/parser-kt.js +258 -0
  96. package/src/ir/parser-py-cst.js +136 -0
  97. package/src/ir/parser-py.helper.py +501 -0
  98. package/src/ir/parser-py.js +312 -0
  99. package/src/ir/ssa.js +315 -0
  100. package/src/ir/type-stubs.js +288 -0
  101. package/src/leaderboard.js +152 -0
  102. package/src/llm-validator/.agentic-security/findings.json +1891 -0
  103. package/src/llm-validator/.agentic-security/last-scan.json +1891 -0
  104. package/src/llm-validator/.agentic-security/last-scan.json.sig +1 -0
  105. package/src/llm-validator/.agentic-security/scan-history.json +168 -0
  106. package/src/llm-validator/.agentic-security/streak.json +20 -0
  107. package/src/llm-validator/consistency.js +141 -0
  108. package/src/llm-validator/index.js +437 -0
  109. package/src/lsp/.agentic-security/findings.json +28 -0
  110. package/src/lsp/.agentic-security/last-scan.json +28 -0
  111. package/src/lsp/.agentic-security/scan-history.json +79 -0
  112. package/src/lsp/.agentic-security/streak.json +22 -0
  113. package/src/lsp/server.js +275 -0
  114. package/src/mcp/.agentic-security/findings.json +8358 -0
  115. package/src/mcp/.agentic-security/last-scan.json +8358 -0
  116. package/src/mcp/.agentic-security/last-scan.json.sig +1 -0
  117. package/src/mcp/.agentic-security/scan-history.json +1125 -0
  118. package/src/mcp/.agentic-security/streak.json +22 -0
  119. package/src/mcp/CLAUDE.md +54 -0
  120. package/src/mcp/audit.js +136 -0
  121. package/src/mcp/redact.js +75 -0
  122. package/src/mcp/server.js +158 -0
  123. package/src/mcp/stdio.js +83 -0
  124. package/src/mcp/tools.js +940 -0
  125. package/src/mcp/validate.js +49 -0
  126. package/src/personality.js +164 -0
  127. package/src/poc-video.js +239 -0
  128. package/src/posture/.agentic-security/findings.json +51239 -0
  129. package/src/posture/.agentic-security/last-scan.json +51239 -0
  130. package/src/posture/.agentic-security/last-scan.json.sig +1 -0
  131. package/src/posture/.agentic-security/scan-history.json +5557 -0
  132. package/src/posture/.agentic-security/streak.json +24 -0
  133. package/src/posture/CLAUDE.md +42 -0
  134. package/src/posture/adversarial-self-test.js +114 -0
  135. package/src/posture/adversary-agent.js +204 -0
  136. package/src/posture/agents-memory.js +135 -0
  137. package/src/posture/ai-code-fingerprint.js +171 -0
  138. package/src/posture/aibom.js +284 -0
  139. package/src/posture/api-inventory.js +96 -0
  140. package/src/posture/attack-playbooks.js +305 -0
  141. package/src/posture/auditor-agent.js +115 -0
  142. package/src/posture/auth-posture-import.js +135 -0
  143. package/src/posture/baseline-compare.js +114 -0
  144. package/src/posture/blast-radius.js +836 -0
  145. package/src/posture/bounty-prediction.js +141 -0
  146. package/src/posture/business-logic.js +239 -0
  147. package/src/posture/calibration-drift.js +93 -0
  148. package/src/posture/calibration-seed.json +27 -0
  149. package/src/posture/calibration.js +204 -0
  150. package/src/posture/clustering.js +75 -0
  151. package/src/posture/concurrency-checker.js +265 -0
  152. package/src/posture/confidence.js +65 -0
  153. package/src/posture/container-runtime.js +149 -0
  154. package/src/posture/counterfactual.js +109 -0
  155. package/src/posture/cross-lang-graphql.js +165 -0
  156. package/src/posture/cross-lang-grpc.js +166 -0
  157. package/src/posture/cross-lang-meta.js +101 -0
  158. package/src/posture/cross-lang-openapi.js +187 -0
  159. package/src/posture/cross-lang-orm.js +153 -0
  160. package/src/posture/cross-lang-queues.js +210 -0
  161. package/src/posture/crown-jewels.js +110 -0
  162. package/src/posture/custom-rules.js +361 -0
  163. package/src/posture/cve-alert-daemon.js +433 -0
  164. package/src/posture/cve-lookup.js +129 -0
  165. package/src/posture/dead-code.js +430 -0
  166. package/src/posture/defender-agent.js +158 -0
  167. package/src/posture/deploy-platform.js +204 -0
  168. package/src/posture/detector-fuzz.js +61 -0
  169. package/src/posture/deterministic.js +99 -0
  170. package/src/posture/drift.js +165 -0
  171. package/src/posture/epss.js +156 -0
  172. package/src/posture/exploitability-probability.js +212 -0
  173. package/src/posture/exploitability.js +121 -0
  174. package/src/posture/feature-flags.js +110 -0
  175. package/src/posture/finding-defaults.js +132 -0
  176. package/src/posture/fix-history.js +411 -0
  177. package/src/posture/fix-plan.js +121 -0
  178. package/src/posture/fix-verify-loop.js +157 -0
  179. package/src/posture/fix-verify.js +130 -0
  180. package/src/posture/flow-narration.js +105 -0
  181. package/src/posture/grader-calibration.js +156 -0
  182. package/src/posture/harness-discovery.js +113 -0
  183. package/src/posture/holdout-eval.js +144 -0
  184. package/src/posture/iac-reachability.js +163 -0
  185. package/src/posture/iam-policy.js +128 -0
  186. package/src/posture/integrity.js +97 -0
  187. package/src/posture/learning.js +166 -0
  188. package/src/posture/license-policy.js +109 -0
  189. package/src/posture/llm-redteam-prompts.js +418 -0
  190. package/src/posture/llm-redteam.js +303 -0
  191. package/src/posture/material-change.js +163 -0
  192. package/src/posture/mitigation-composite.js +55 -0
  193. package/src/posture/mttr.js +91 -0
  194. package/src/posture/network-policy-import.js +126 -0
  195. package/src/posture/path-predicates.js +99 -0
  196. package/src/posture/persona-prioritization.js +153 -0
  197. package/src/posture/poc-cwe-map.js +51 -0
  198. package/src/posture/poc-generator.js +500 -0
  199. package/src/posture/policy-gate.js +174 -0
  200. package/src/posture/pre-incident-archaeology.js +110 -0
  201. package/src/posture/profile.js +93 -0
  202. package/src/posture/reachability-filter.js +42 -0
  203. package/src/posture/regression-test-gen.js +200 -0
  204. package/src/posture/reverse-blast-radius.js +110 -0
  205. package/src/posture/router.js +109 -0
  206. package/src/posture/rule-overrides.js +198 -0
  207. package/src/posture/rule-pack-signing.js +209 -0
  208. package/src/posture/rule-packs.js +143 -0
  209. package/src/posture/rule-synthesis.js +108 -0
  210. package/src/posture/ruleset-version.js +71 -0
  211. package/src/posture/sbom.js +129 -0
  212. package/src/posture/schema-aware-bridge.js +207 -0
  213. package/src/posture/security-trend.js +87 -0
  214. package/src/posture/semantic-clone.js +114 -0
  215. package/src/posture/specification-mining.js +170 -0
  216. package/src/posture/stable-id.js +75 -0
  217. package/src/posture/stack-playbook.js +229 -0
  218. package/src/posture/streak.js +249 -0
  219. package/src/posture/suppressions.js +135 -0
  220. package/src/posture/telemetry-ingest.js +112 -0
  221. package/src/posture/threat-model.js +145 -0
  222. package/src/posture/three-agent-pipeline.js +74 -0
  223. package/src/posture/triage.js +146 -0
  224. package/src/posture/trust-boundary-diagram.js +115 -0
  225. package/src/posture/type-narrowing.js +129 -0
  226. package/src/posture/validator-metrics.js +179 -0
  227. package/src/posture/verifier-ephemeral.js +118 -0
  228. package/src/posture/verifier-target.js +147 -0
  229. package/src/posture/verifier.js +257 -0
  230. package/src/posture/version.js +75 -0
  231. package/src/posture/waf-ingest.js +200 -0
  232. package/src/posture/why-fired.js +141 -0
  233. package/src/pr-comment.js +172 -0
  234. package/src/pr-delta.js +198 -0
  235. package/src/report/.agentic-security/findings.json +79 -0
  236. package/src/report/.agentic-security/last-scan.json +79 -0
  237. package/src/report/.agentic-security/last-scan.json.sig +1 -0
  238. package/src/report/.agentic-security/scan-history.json +332 -0
  239. package/src/report/.agentic-security/streak.json +23 -0
  240. package/src/report/index.js +1136 -0
  241. package/src/report/mascot.js +42 -0
  242. package/src/runScan.js +141 -0
  243. package/src/sast/.agentic-security/findings.json +5051 -0
  244. package/src/sast/.agentic-security/last-scan.json +5051 -0
  245. package/src/sast/.agentic-security/last-scan.json.sig +1 -0
  246. package/src/sast/.agentic-security/scan-history.json +788 -0
  247. package/src/sast/.agentic-security/streak.json +23 -0
  248. package/src/sast/CLAUDE.md +39 -0
  249. package/src/sast/_comment-strip.js +46 -0
  250. package/src/sast/agent-tool-escalation.js +131 -0
  251. package/src/sast/auth-provider.js +171 -0
  252. package/src/sast/authz.js +236 -0
  253. package/src/sast/bench-shape/.agentic-security/findings.json +28 -0
  254. package/src/sast/bench-shape/.agentic-security/last-scan.json +28 -0
  255. package/src/sast/bench-shape/.agentic-security/scan-history.json +24 -0
  256. package/src/sast/bench-shape/.agentic-security/streak.json +22 -0
  257. package/src/sast/bench-shape/index.js +62 -0
  258. package/src/sast/claude-hook-injection.js +199 -0
  259. package/src/sast/claude-md-prompt-injection.js +170 -0
  260. package/src/sast/claude-settings.js +165 -0
  261. package/src/sast/client-side.js +149 -0
  262. package/src/sast/cpp-bench-extras.js +122 -0
  263. package/src/sast/cpp-dataflow.js +430 -0
  264. package/src/sast/cpp.js +248 -0
  265. package/src/sast/csharp.js +152 -0
  266. package/src/sast/csrf.js +82 -0
  267. package/src/sast/dart-flutter.js +173 -0
  268. package/src/sast/db-rls.js +147 -0
  269. package/src/sast/db-taint.js +215 -0
  270. package/src/sast/defi-deep.js +242 -0
  271. package/src/sast/deserialization-gadgets.js +113 -0
  272. package/src/sast/django-hardening.js +230 -0
  273. package/src/sast/env-hygiene.js +125 -0
  274. package/src/sast/fastapi-hardening.js +145 -0
  275. package/src/sast/go-extended.js +84 -0
  276. package/src/sast/host-header.js +106 -0
  277. package/src/sast/index.js +17 -0
  278. package/src/sast/java-ast-folding.js +561 -0
  279. package/src/sast/java-bench-extras.js +708 -0
  280. package/src/sast/java-collection-passthrough.js +178 -0
  281. package/src/sast/java-constant-fold.js +244 -0
  282. package/src/sast/java-deserialization.js +125 -0
  283. package/src/sast/jndi.js +104 -0
  284. package/src/sast/juliet-shape.js +324 -0
  285. package/src/sast/jwt-exp.js +104 -0
  286. package/src/sast/kotlin.js +82 -0
  287. package/src/sast/laravel-hardening.js +198 -0
  288. package/src/sast/ldap-injection.js +100 -0
  289. package/src/sast/llm-owasp.js +465 -0
  290. package/src/sast/llm-stored-prompt.js +103 -0
  291. package/src/sast/llm-trading-agent.js +161 -0
  292. package/src/sast/llm.js +308 -0
  293. package/src/sast/logic.js +140 -0
  294. package/src/sast/mass-assignment.js +101 -0
  295. package/src/sast/mcp-audit.js +242 -0
  296. package/src/sast/mobile-manifest.js +195 -0
  297. package/src/sast/model-load.js +164 -0
  298. package/src/sast/mutation-xss.js +87 -0
  299. package/src/sast/nosql-injection.js +82 -0
  300. package/src/sast/open-redirect.js +119 -0
  301. package/src/sast/php.js +91 -0
  302. package/src/sast/pipeline.js +122 -0
  303. package/src/sast/primary-cwe-java.js +155 -0
  304. package/src/sast/prompt-firewall.js +151 -0
  305. package/src/sast/prompt-template.js +157 -0
  306. package/src/sast/prototype-pollution.js +112 -0
  307. package/src/sast/python-sinks.js +195 -0
  308. package/src/sast/quarkus-hardening.js +102 -0
  309. package/src/sast/rag-poisoning.js +118 -0
  310. package/src/sast/rate-limit.js +128 -0
  311. package/src/sast/response-splitting.js +138 -0
  312. package/src/sast/ruby.js +108 -0
  313. package/src/sast/rust.js +105 -0
  314. package/src/sast/solidity.js +167 -0
  315. package/src/sast/springboot-hardening.js +186 -0
  316. package/src/sast/ssrf-cloud-metadata.js +80 -0
  317. package/src/sast/ssti.js +116 -0
  318. package/src/sast/swift.js +162 -0
  319. package/src/sast/toctou.js +95 -0
  320. package/src/sast/webhook.js +101 -0
  321. package/src/sast/xpath-injection.js +51 -0
  322. package/src/sast/xxe.js +140 -0
  323. package/src/sast/zip-slip.js +200 -0
  324. package/src/sca/base-images.json +45 -0
  325. package/src/sca/container.js +107 -0
  326. package/src/sca/dep-confusion.js +134 -0
  327. package/src/sca/index.js +6 -0
  328. package/src/sca/popular-packages.json +41 -0
  329. package/src/sca/sarif-ingest.js +187 -0
  330. package/src/sca/vuln-function-hints.json +89 -0
  331. package/src/secrets/index.js +4 -0
package/CHANGELOG.md ADDED
@@ -0,0 +1,1580 @@
1
+ # Changelog
2
+
3
+ ## 0.74.0 — viral surface: PoC video gen + security-tutor skill + personality voices + compare runner
4
+
5
+ Four shareability lifts.
6
+
7
+ ### Auto-recorded PoC scripts — `scanner/src/poc-video.js`
8
+ For findings with `_exploitInput` (v0.71 symbolic prover), generate a
9
+ self-contained script the operator runs against their own staging URL:
10
+ - **playwright**: TypeScript test that drives the exploit live + records video. Default for UI-driven exploits.
11
+ - **curl**: bash script with verbose tracing + payload-acceptance assertion. Default for backend exploits.
12
+ - **http**: RFC 7230-style raw request pastable into Postman/Insomnia.
13
+
14
+ The generator does NOT execute anything; it produces share-grade evidence that the operator runs against their OWN environment.
15
+
16
+ ### Educational mode skill — `skills/security-tutor/SKILL.md`
17
+ Auto-activates when the user asks "why is X dangerous", references a finding-id and asks for context, or has mechanically accepted ≥3 fixes in a row. Walks the finding Socratically: identify source/sink/sanitizer, ask user to propose the payload BEFORE showing the fix, verify understanding with follow-up traps. CWE-specific Socratic patterns table covers 8 families.
18
+
19
+ ### Security personality voices — `scanner/src/personality.js`
20
+ Three tone modes wrapping any rendered report: **sage** (calm, default), **cassandra** (alarmist), **vince** (drill-sergeant). Same findings, dramatically different shareability. The `AGENTIC_SECURITY_PERSONALITY` environment variable selects the mode. Only the framing changes; the technical content stays identical.
21
+
22
+ ### Compare runner framework — `scanner/src/compare.js`
23
+ Bring-your-own-tool side-by-side comparison. User supplies the other tool's invocation + field map; we render a Markdown card with overlap / unique / severity-disagreement sections. Framework is generic — no competitor-specific adapters shipped.
24
+
25
+ ### Test totals
26
+ **847 scanner tests pass / 0 fail** (up from 832).
27
+
28
+ ## 0.73.0 — technical depth: IFDS summary edges + type-stub filter + cross-repo federation
29
+
30
+ Three technical-depth lifts. v0.71 shipped IFDS scaffolding with bottom
31
+ summaries; v0.70 added type-stubs but didn't thread them into the
32
+ engine; v0.68 added cross-lang within a single repo but not cross-repo.
33
+ v0.73 closes all three loops.
34
+
35
+ ### IFDS full summary edges — `scanner/src/dataflow/ifds.js`
36
+
37
+ The v0.71 IFDS solver used bottom summaries (every callee was assumed
38
+ clean → no interprocedural facts flowed). v0.73 adds:
39
+ - `summaries: Map<qid|entryFact, Set<exitFact>>` records per-function
40
+ summary edges
41
+ - `pendingReturns: Map<qid|entryFact, [{fn,returnNode,callerEntry}]>`
42
+ registers callers waiting on more summary facts
43
+ - `_entryFactForCall(callNode, currentFact, callee)` derives callee's
44
+ entry fact from a call site
45
+ - `_mapReturnFact(callNode, exitFact, callerCurrent)` translates exit
46
+ facts back into caller namespace
47
+ - Summary reuse: second call to same (callee, entry fact) is O(1)
48
+
49
+ Summary reuse is what keeps IFDS polynomial in practice: the solver
50
+ does not re-solve the callee at every call site.
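A minimal sketch of the summary-reuse idea, assuming a cache keyed by `qid|entryFact` as described above. The class name, the `solveCallee` callback, and the fact strings are hypothetical illustrations, not the package's actual API, and the sketch omits the `pendingReturns` bookkeeping needed for recursion.

```javascript
// Hypothetical sketch of summary-edge caching; names are illustrative only.
class SummaryCache {
  constructor() {
    this.summaries = new Map(); // "qid|entryFact" -> Set of exit facts
    this.solveCount = 0;        // how many times a callee body was analyzed
  }

  // solveCallee(qid, entryFact) is an assumed callback that returns an
  // iterable of exit facts for the callee under that entry fact.
  exitFactsFor(qid, entryFact, solveCallee) {
    const key = `${qid}|${entryFact}`;
    let exits = this.summaries.get(key);
    if (exits === undefined) {
      // First request for this (callee, entry fact): analyze once and record.
      this.solveCount += 1;
      exits = new Set(solveCallee(qid, entryFact));
      this.summaries.set(key, exits);
    }
    return exits; // later requests are a single Map lookup
  }
}

const cache = new SummaryCache();
// Toy callee that returns its argument: tainted entry yields tainted return.
const solveIdentity = (qid, entryFact) =>
  entryFact === "taint:arg0" ? ["taint:ret"] : [];

const first = cache.exitFactsFor("util.identity", "taint:arg0", solveIdentity);
const second = cache.exitFactsFor("util.identity", "taint:arg0", solveIdentity);
```

In this toy run the callee body is analyzed once; the second request returns the recorded set directly.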
51
+
52
+ ### Type-stub-aware filter — `scanner/src/dataflow/stub-aware-filter.js`
53
+
54
+ Post-pass after the taint engine. Consults the project's TS/.pyi/JAR
55
+ type stubs (loaded by v0.70's `ir/type-stubs.js`) and demotes findings
56
+ whose source type cannot carry the vulnerability metacharacters:
57
+
58
+ | Family | CWE | Safe types (demoted) |
59
+ |--------|-----|----------------------|
60
+ | XSS | CWE-79 | number, boolean, Date, RegExp, bigint |
61
+ | SQLi | CWE-89 | number, boolean, Date, bigint |
62
+ | Cmd | CWE-78 | number, boolean, bigint |
63
+ | Path | CWE-22 | number, boolean |
64
+ | SSRF | CWE-918 | number, boolean |
65
+
66
+ Severity drops one tier (critical → high → medium → low → info); never
67
+ drops the finding. Operator sees `_stubTypeDemoted: true` + reason.
68
+
69
+ Gate: `AGENTIC_SECURITY_TYPE_STUBS=1` (same flag as the v0.70 stub
70
+ loader).
71
+
72
+ ### Cross-repo federation — `scanner/src/dataflow/cross-repo.js`
73
+
74
+ The intra-repo `cross-lang-openapi.js` posture module shipped in v0.66
75
+ ties a single repo's client call to its server route. v0.73 ships the
76
+ inter-repo lift: `buildFederatedGraph(specs)` walks a SET of OpenAPI
77
+ specs from different repos, finds shared `(method, path)` endpoints
78
+ with overlapping field schemas, and emits federated edges. Each edge
79
+ becomes a `CROSS-REPO` finding (`CWE-829`, `family: cross-repo-taint`)
80
+ showing both repos + the shared fields in the trace.
81
+
82
+ Use case: scan the auth-service repo + the billing-service repo
83
+ together; the scanner detects that `/users/{id}` is published by auth
84
+ and consumed by billing, with shared fields `email + bio`. A taint in
85
+ auth's response surfaces in billing's input — both teams now own the
86
+ sanitization contract.
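The endpoint-matching step might look roughly like this. The input shape (`repo`, `endpoints`, `fields`) and the function name are assumptions made for illustration; the real `buildFederatedGraph` reads OpenAPI documents, which this sketch does not parse.

```javascript
// Hypothetical sketch: pair up repos that share a (method, path) endpoint
// and report the fields both sides declare.
function sketchFederatedEdges(specs) {
  const byEndpoint = new Map(); // "METHOD path" -> [{ repo, fields }]
  for (const spec of specs) {
    for (const ep of spec.endpoints) {
      const key = `${ep.method.toUpperCase()} ${ep.path}`;
      if (!byEndpoint.has(key)) byEndpoint.set(key, []);
      byEndpoint.get(key).push({ repo: spec.repo, fields: new Set(ep.fields) });
    }
  }

  const edges = [];
  for (const [endpoint, users] of byEndpoint) {
    for (let i = 0; i < users.length; i++) {
      for (let j = i + 1; j < users.length; j++) {
        if (users[i].repo === users[j].repo) continue; // intra-repo: not federated
        const sharedFields = [...users[i].fields]
          .filter((f) => users[j].fields.has(f))
          .sort();
        if (sharedFields.length > 0) {
          edges.push({ endpoint, repos: [users[i].repo, users[j].repo], sharedFields });
        }
      }
    }
  }
  return edges;
}

const edges = sketchFederatedEdges([
  { repo: "auth-service", endpoints: [{ method: "get", path: "/users/{id}", fields: ["id", "email", "bio"] }] },
  { repo: "billing-service", endpoints: [{ method: "GET", path: "/users/{id}", fields: ["email", "bio", "plan"] }] },
]);
```

For the two toy specs above, the sketch yields one edge on `GET /users/{id}` with shared fields `bio` and `email`, matching the use case described in the notes.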
87
+
88
+ ### Test totals
89
+ **832 scanner tests pass / 0 fail** (up from 811).
90
+
91
+ ## 0.72.1 — CI template + README adopts the v0.72 viral features
92
+
93
+ Patch release. Two adoption follow-ups for v0.72's viral features.
94
+
95
+ ### CI template defaults to advisor-tone PR comment
96
+
97
+ `.github/workflows/scan.yml` — new `pr-comment-mode` input (default
98
+ `"advisor"`, alternative `"findings-table"`):
99
+
100
+ - **advisor** (new default): runs `pr-delta --base origin/<base_ref>` to
101
+ compute the security DELTA between PR and base, then pipes the JSON
102
+ into `pr-comment` to render the security-advisor's note. The comment
103
+ shows only what THIS PR introduced/resolved, with CWE narrative + fix
104
+ snippet + blocking-merge footer.
105
+ - **findings-table** (legacy): the prior critical/high count table.
106
+ Available behind the input flag for adopters who prefer it.
107
+
108
+ Downstream consumers automatically get the new comment style on next CI
109
+ run. Opt back into the legacy table by passing `pr-comment-mode: findings-table`
110
+ to the reusable workflow.
111
+
112
+ ### README adopts the status badge + leaderboard pitch
113
+
114
+ `README.md`:
115
+ - Stale `version-0.64.0` badge bumped to `version-0.72.1`.
116
+ - New badge row entry: `[![agentic-security](...)]()`.
117
+ - New "Status badge for your README" section with paste-ready Markdown,
118
+ three example states (passing / high / critical), and self-host
119
+ instructions for users who don't want to depend on `agentic-security.dev`.
120
+ - New "Public leaderboard (preview)" section pointing at the v0.72
121
+ `leaderboard-row` backend.
122
+
123
+ ### Test totals
124
+ **811 scanner tests pass / 0 fail** (unchanged from 0.72.0).
125
+
126
+ ## 0.72.0 — viral features: shadowscan delta + advisor-tone PR comment + live badge + leaderboard backend
127
+
128
+ Four viral-lever features built to compound: every PR generates a
129
+ screenshotable advisor's note (not a wall of findings), every repo can
130
+ wear a live security badge (pull-marketing), and every scan's data shape
131
+ is ready for a public leaderboard.
132
+
133
+ ### #5 Shadowscan / security-DELTA on PR — `scanner/src/pr-delta.js`
134
+
135
+ `computePrDelta(root, { baseRef, headRef })` scans both refs in-memory
136
+ (no checkout, via `git show <ref>:<path>`), diffs by `stableId`, and
137
+ emits:
138
+ - `introduced` — findings in head not in base
139
+ - `resolved` — findings in base not in head
140
+ - `persistent` — same stableId both sides
141
+ - `shifted` — same stableId but severity or CWE changed
142
+ - `summary.net` — per-severity head − base delta
143
+
144
+ New CLI:
145
+ ```
146
+ agentic-security pr-delta --base origin/main [--head HEAD] [--json]
147
+ [--fail-on-introduced]
148
+ ```
149
+
150
+ ### #1 Advisor-tone PR comment — `scanner/src/pr-comment.js`
151
+
152
+ `renderPrComment(delta, { repoName, prNumber, prTitle })` produces a
153
+ single Markdown comment that reads like a person, not a table. Three
154
+ auto-detected modes:
155
+ - **clean** (no delta) → "Safe to merge."
156
+ - **resolves-only** → "This PR resolves N finding(s)... Nice cleanup."
157
+ - **needs-work** → narrative + per-finding paragraph with CWE 'why'
158
+ text + remediation snippet + blocking-merge footer for critical/high.
159
+
160
+ CWE narrative table covers 19 families with one-sentence "why does this
161
+ matter" explanations. The mode is what gets **screenshotted** — security
162
+ tool output that reads like an advisor, not a SARIF dump.
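Mode detection could be sketched like this. The function name and return shape are hypothetical, and the sketch assumes only the `introduced` and `resolved` buckets drive the mode; how `shifted` findings influence it is not stated in the notes.

```javascript
// Hypothetical sketch of the three auto-detected comment modes.
function sketchCommentMode(delta) {
  const introduced = delta.introduced || [];
  const resolved = delta.resolved || [];

  if (introduced.length > 0) {
    // The blocking-merge footer applies to critical/high introductions.
    const blocking = introduced.some(
      (f) => f.severity === "critical" || f.severity === "high"
    );
    return { mode: "needs-work", blocking };
  }
  if (resolved.length > 0) return { mode: "resolves-only", blocking: false };
  return { mode: "clean", blocking: false };
}

const cleanMode = sketchCommentMode({ introduced: [], resolved: [] });
const cleanupMode = sketchCommentMode({ introduced: [], resolved: [{ severity: "low" }] });
const workMode = sketchCommentMode({ introduced: [{ severity: "high" }], resolved: [] });
```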
163
+
164
+ New CLI:
165
+ ```
166
+ agentic-security pr-comment [--in delta.json | --base <ref>]
167
+ [--repo <slug>] [--pr <n>] [--title <text>]
168
+ # Reads JSON delta from --in, --base (recomputes), or stdin.
169
+ ```
170
+
171
+ ### #2 Live SVG badge — `scanner/src/badge.js`
172
+
173
+ `renderBadge({ format, style, scanRoot, scan })` emits a shields.io-style
174
+ SVG (or JSON for frontend renderers) summarizing the latest scan:
175
+ `agentic-security: crit 0 · high 2 · med 5 · 4h ago`. Color driven by
176
+ highest non-zero severity. Two styles: `flat` (default) + `for-the-badge`.
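As a rough illustration of the message text and the color rule, the sketch below builds the badge message and picks a color from the highest non-zero severity. The function names, the `counts` shape, the color names, and the hour/day age formatting are assumptions; only the message layout and the "highest non-zero severity" rule come from the notes.

```javascript
// Hypothetical sketch: badge message and color from severity counts.
function sketchBadgeMessage(counts, scannedAtMs, nowMs) {
  const hours = Math.floor((nowMs - scannedAtMs) / 3600000);
  // Assumed formatting: hours under one day, whole days after that.
  const age = hours < 24 ? `${hours}h ago` : `${Math.floor(hours / 24)}d ago`;
  return `crit ${counts.critical} · high ${counts.high} · med ${counts.medium} · ${age}`;
}

function sketchBadgeColor(counts) {
  // Color names are placeholders; the real palette is not documented here.
  if (counts.critical > 0) return "red";
  if (counts.high > 0) return "orange";
  if (counts.medium > 0) return "yellow";
  return "brightgreen";
}

const counts = { critical: 0, high: 2, medium: 5 };
const message = sketchBadgeMessage(counts, 0, 4 * 3600000 + 5000);
const color = sketchBadgeColor(counts);
```

With these inputs the message is `crit 0 · high 2 · med 5 · 4h ago` and the color follows the `high` tier.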
177
+
178
+ New CLI:
179
+ ```
180
+ agentic-security badge [--format svg|json] [--style flat|for-the-badge]
181
+ ```
182
+
183
+ Reads from `.agentic-security/last-scan.json`. The badge is intended as
184
+ a README ornament that doubles as pull-marketing — every adopting repo
185
+ becomes a billboard.
186
+
187
+ ### Leaderboard backend — `scanner/src/leaderboard.js`
188
+
189
+ `leaderboardRowFor({ scanRoot, repo })` builds one row of the future
190
+ public leaderboard data: posture grade A-F, severity counts, top CWE,
191
+ last-scan age, delta trend (`improving`/`flat`/`regressing` from
192
+ `scan-history.jsonl` if present), and the badge URL/Markdown snippet
193
+ ready to paste. `rankRows(rows)` sorts by critical → high → grade.
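The ranking order can be sketched as a three-key comparator. The row shape and the direction of each key (fewer criticals first, then fewer highs, then the better letter grade) are assumptions; the notes give only the key order.

```javascript
// Hypothetical sketch of the ranking: critical, then high, then grade.
function sketchRankRows(rows) {
  return [...rows].sort(
    (a, b) =>
      a.critical - b.critical ||
      a.high - b.high ||
      a.grade.charCodeAt(0) - b.grade.charCodeAt(0) // "A" sorts before "F"
  );
}

const ranked = sketchRankRows([
  { repo: "org/gamma", critical: 1, high: 0, grade: "D" },
  { repo: "org/alpha", critical: 0, high: 2, grade: "B" },
  { repo: "org/beta", critical: 0, high: 2, grade: "A" },
  { repo: "org/delta", critical: 0, high: 0, grade: "A" },
]);
```

Copying the array before sorting leaves the caller's rows in their original order.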
194
+
195
+ Public hosting of `agentic-security.dev/leaderboard` is deferred — this
196
+ release ships the data side so the future site is a thin frontend.
197
+
198
+ New CLI:
199
+ ```
200
+ agentic-security leaderboard-row --repo owner/name [--root <dir>]
201
+ ```
202
+
203
+ ### Test totals
204
+ **811 scanner tests pass / 0 fail** (up from 792).
205
+
206
+ ### Migration
207
+ All four features are additive opt-in CLI subcommands. CI templates can
208
+ adopt `pr-delta | pr-comment` to replace findings-dump comments without
209
+ breaking the existing scan-and-comment flow. README badge adoption is
210
+ manual (paste a Markdown snippet).
211
+
212
+ ## 0.71.1 — dependency hygiene + CodeQL ignore-list for scanner/
213
+
214
+ Patch release. No behavior change.
215
+
216
+ ### Dependency bumps
217
+ - `@types/node`: `^20.0.0` → `^24.0.0` (scanner + vscode). Node 20 reached
218
+ EOL in 2026-04; tracking the current LTS.
219
+ - `scanner/package.json` `engines.node`: `>=20.0.0` → `>=22.0.0`.
220
+ - `vscode/package.json` `@types/vscode` + `engines.vscode`: `^1.85.0` →
221
+ `^1.95.0` (the engine pair stays consistent so VSCE doesn't warn).
222
+
223
+ Other deps already current and unchanged: `@babel/*` 7.x, `@vercel/ncc`
224
+ 0.38.x, `js-yaml` 4.x, `safe-regex` 2.x, `fast-glob` 3.x, `esbuild` 0.25.x,
225
+ `@vscode/vsce` 3.x. GitHub Actions in workflows already on v5/v8.
226
+
227
+ ### CodeQL ignore-list
228
+
229
+ The scanner directory contains the taint engine itself — full of SAST
230
+ patterns, hardcoded fixture credentials, eval() shapes, raw SQL strings.
231
+ Any other SAST tool (including GitHub CodeQL) flags these as vulnerabilities,
232
+ producing noise that drowns out real findings.
233
+
234
+ Two new files:
235
+ - `.github/codeql/codeql-config.yml` — 15-entry `paths-ignore` covering
236
+ `scanner/**`, `bench/**`, `vscode/dist/**`, all test fixtures, the
237
+ `.bench-cache/**` tree, and generated bundles.
238
+ - `.github/workflows/codeql.yml` — advanced-setup CodeQL workflow on
239
+ push/PR + weekly cron, references the config above. Uses
240
+ `security-extended` query suite.
241
+
242
+ **To activate**: switch the repo from default to advanced code-scanning
243
+ setup at Settings → Code security → Code scanning → Set up → Advanced.
244
+ The workflow will then run and honor the paths-ignore list.
245
+
246
+ ### Test totals
247
+ **792 scanner tests pass / 0 fail** (unchanged from 0.71.0).
248
+
249
+ ## 0.71.0 — taint engine frontier release (final 2 of 10 — IFDS + symbolic exploit proofs)
250
+
251
+ Third and final release in the v0.69 → v0.71 taint-engine arc. v0.71
252
+ ships the two heaviest items: IFDS tabulation as an alternative
253
+ context-sensitive analyzer, and a symbolic-execution post-pass that
254
+ generates concrete attacker payloads + proves infeasibility.
255
+
256
+ ### #3 IFDS / IDE tabulation — `scanner/src/dataflow/ifds.js`
257
+
258
+ Implementation of Reps-Horwitz-Sagiv "Precise interprocedural dataflow
259
+ analysis via graph reachability" (POPL 1995). Runs as an ALTERNATIVE
260
+ analyzer that augments the existing k=2 worklist when
261
+ `AGENTIC_SECURITY_IFDS=1` — its findings are merged with the worklist
262
+ output, deduped by `(file, line, sinkId)`.
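The merge step can be sketched as a keyed dedupe. The function name is hypothetical, and the sketch assumes the worklist finding wins when both analyzers report the same `(file, line, sinkId)`; the notes do not say which copy is kept.

```javascript
// Hypothetical sketch: merge two finding lists, deduped by (file, line, sinkId).
function sketchMergeFindings(worklistFindings, ifdsFindings) {
  const seen = new Set();
  const merged = [];
  for (const f of [...worklistFindings, ...ifdsFindings]) {
    // JSON-encode the tuple so separators inside file paths cannot collide.
    const key = JSON.stringify([f.file, f.line, f.sinkId]);
    if (seen.has(key)) continue;
    seen.add(key);
    merged.push(f);
  }
  return merged;
}

const merged = sketchMergeFindings(
  [{ file: "app.js", line: 10, sinkId: "sql.query", from: "worklist" }],
  [
    { file: "app.js", line: 10, sinkId: "sql.query", from: "ifds" },
    { file: "app.js", line: 42, sinkId: "child_process.exec", from: "ifds" },
  ]
);
```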
263
+
264
+ Components:
265
+ - `IFDSSolver` class: path-edge worklist over the exploded supergraph
266
+ - `_flowAssign`: distributive transfer function (copy / kill / source-gen)
267
+ - `_detectSinkAtCall`: catalog-driven sink matching at each call node
268
+ - Budget: `AGENTIC_SECURITY_IFDS_BUDGET_FACTS=10000` (default) caps the
269
+ edge count; the solver returns partial findings + `_ifdsStats.capped: true`
270
+
271
+ What v1 supports: intraprocedural flow + the IFDS framework scaffolding.
272
+ Full call-graph summary edges are stubbed (the path-edge worklist
273
+ demonstrates the framework; production-quality summary caching arrives
274
+ in v0.72). The merge-with-worklist design means the existing engine
275
+ keeps producing findings; IFDS adds context-sensitive flows the k=2
276
+ cache joined out.
277
+
278
+ ### #9 Symbolic exploit prover — `scanner/src/dataflow/exploit-prover.js`
279
+
280
+ Post-pass that runs after `runTaintEngine`. For each finding:
281
+
282
+ **Step 1 — Infeasibility check** via SMT-lite (homegrown, ~150 LOC).
283
+ Walks the finding's `trace + chain` for sanitizer-output regexes that
284
+ exclude the family's required metacharacters. If the path passes
285
+ through e.g. `htmlspecialchars` for an XSS finding, the metachars
286
+ `<`, `>`, `"`, `'` are excluded → `_provenUnreachable: true`, severity
287
+ demoted to LOW.
288
+
289
+ **Step 2 — Exploit input synthesis.** For feasible findings, attaches
290
+ `f._exploitInput` with the family's canonical payload. 16 families
291
+ covered including SQLi (`1' OR '1'='1`), XSS (`<script>alert(1)</script>`),
292
+ cmd-inj, path-traversal, SSRF, deserialization, XXE, SSTI, LDAP/XPath
293
+ injection, open redirect, response splitting, ReDoS, CSRF, prototype
294
+ pollution, and prompt injection.
295
+
296
+ **Optional Z3 backend.** When `AGENTIC_SECURITY_SYMEXEC_Z3=1` AND the
297
+ customer has installed `z3-solver`, the prover uses real SMT for the
298
+ infeasibility check. Default install never bundles Z3 — the SMT-lite
299
+ fallback handles every query we issue today. Activation:
300
+ `AGENTIC_SECURITY_SYMEXEC=1` (lite); add `AGENTIC_SECURITY_SYMEXEC_Z3=1`
301
+ for the Z3 path.
302
+
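A rough sketch of the two steps, under stated assumptions: the lookup tables, the function name `proveFinding`, and the idea that `trace` and `chain` hold callee names are all hypothetical. Only `_provenUnreachable`, `_exploitInput`, the two payload strings, and the `htmlspecialchars` example come from the text above. The sketch treats a path as infeasible only when one sanitizer excludes every metacharacter the family needs, which is a conservative reading.

```javascript
// Hypothetical lookup tables; the changelog does not show the module's data.
const REQUIRED_METACHARS = { xss: ['<', '>', '"', "'"] };
// Assumed: characters each sanitizer encodes or removes from its output.
const SANITIZER_EXCLUDES = { htmlspecialchars: ['<', '>', '"', "'", '&'] };
const CANONICAL_PAYLOAD = {
  sqli: "1' OR '1'='1",
  xss: '<script>alert(1)</script>'
};

// Assumes trace and chain are arrays of callee names.
function proveFinding(f) {
  const required = REQUIRED_METACHARS[f.family] || [];
  const path = [...(f.trace || []), ...(f.chain || [])];
  // Step 1: infeasible when some sanitizer on the path excludes every
  // metacharacter the family needs.
  const infeasible = required.length > 0 && path.some((callee) => {
    const excluded = SANITIZER_EXCLUDES[callee] || [];
    return required.every((ch) => excluded.includes(ch));
  });
  if (infeasible) return { ...f, _provenUnreachable: true, severity: 'low' };
  // Step 2: feasible, so attach the family's canonical payload.
  return { ...f, _exploitInput: CANONICAL_PAYLOAD[f.family] };
}

const sanitized = proveFinding({
  family: 'xss',
  severity: 'high',
  trace: ['$_GET', 'htmlspecialchars'],
  chain: ['echo']
});
const raw = proveFinding({
  family: 'sqli',
  severity: 'high',
  trace: ['req.query.id'],
  chain: ['db.query']
});
console.log(sanitized._provenUnreachable, raw._exploitInput);
```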
303
+ ### Test totals
304
+ **792 scanner tests pass / 0 fail** (up from 773 in v0.70).
305
+ Dataflow: 215 tests (up from 196).
306
+
307
+ ### Migration
308
+ Both items are opt-in via env flag. No existing behavior changes. The
309
+ v0.69 and v0.70 items remain opt-in as well; enabling the full stack is
310
+ expected to raise the engine's precision ceiling. Full default-on cutover
311
+ follows once two consecutive nightly CVE-replay runs show F1 delta ≥ +1pp
312
+ without precision drop >1pp.
313
+
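The cutover rule above can be written as a small predicate. This is a sketch only: the function names and the shape of a run record (`f1Base`, `f1New`, `precisionBase`, `precisionNew`, all fractions in [0, 1]) are assumptions, not the project's CI code.

```javascript
const EPS = 1e-9; // tolerance so values sitting exactly on 1pp are not lost to float error

// One CVE-replay run passes when F1 gains at least 1pp and precision
// loses no more than 1pp relative to the baseline.
function passesGate(run) {
  const f1Delta = run.f1New - run.f1Base;
  const precisionDrop = run.precisionBase - run.precisionNew;
  return f1Delta >= 0.01 - EPS && precisionDrop <= 0.01 + EPS;
}

// Default-on requires the two most recent runs to both pass.
function readyForDefaultOn(runs) {
  return runs.length >= 2 && runs.slice(-2).every(passesGate);
}

const good = { f1Base: 0.6, f1New: 0.62, precisionBase: 0.8, precisionNew: 0.8 };
const alsoGood = { f1Base: 0.6, f1New: 0.615, precisionBase: 0.8, precisionNew: 0.795 };
const regressed = { f1Base: 0.6, f1New: 0.63, precisionBase: 0.8, precisionNew: 0.78 };
console.log(readyForDefaultOn([good, alsoGood]), readyForDefaultOn([good, regressed]));
```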
314
+ ### 10-item taint-engine arc complete
315
+
316
+ v0.69 → v0.71 has shipped all 10 items:
317
+
318
+ | # | Item | Module | Release |
319
+ |---|------|--------|---------|
320
+ | 1 | Backward slicing | `dataflow/backward.js` | v0.69 |
321
+ | 2 | Steensgaard alias | `dataflow/points-to.js` | v0.70 |
322
+ | 3 | IFDS tabulation | `dataflow/ifds.js` | v0.71 |
323
+ | 4 | String regex lattice | `dataflow/string-domain.js` | v0.69 |
324
+ | 5 | Incremental cache | `dataflow/incremental.js` | v0.69 |
325
+ | 6 | Probabilistic taint | `dataflow/soft-taint.js` | v0.70 |
326
+ | 7 | Type-stubs | `ir/type-stubs.js` | v0.70 |
327
+ | 8 | Capture-set | `dataflow/higher-order.js` | v0.69 |
328
+ | 9 | Symbolic exploit proof | `dataflow/exploit-prover.js` | v0.71 |
329
+ |10 | DB-aware taint | `sast/db-taint.js` | v0.70 |
330
+
331
+ ## 0.70.0 — taint engine foundations release (4 more of 10 leap items)
332
+
333
+ Second of three releases (v0.69 / v0.70 / v0.71). v0.70 adds the
334
+ "needs new theory" capabilities — aliasing, type inference, soft taint,
335
+ and DB round-trip flow. These are the foundations that lift the
336
+ intra-procedural lattice; v0.71 will swap in IFDS + symbolic exec on
337
+ top.
338
+
339
+ ### #2 Steensgaard points-to / alias analysis — `scanner/src/dataflow/points-to.js`
340
+ Unification-based, near-linear alias analysis. Walks every assign/call
341
+ across the function set, unifying classes for direct copies + field
342
+ store/load operations. Interprocedural step at resolved call sites
343
+ unifies caller args with callee params. The engine consumes the graph
344
+ via `_addPathAliasAware`: when a tainted target is added to state, all
345
+ aliases of the root variable are tainted too. Closes the
346
+ `let a = obj; a.x = tainted; sink(obj.x)` FN class.
347
+ Opt-in via `AGENTIC_SECURITY_POINTS_TO=1`.
348
+
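To illustrate how unification closes the `let a = obj; a.x = tainted; sink(obj.x)` case, here is a minimal union-find sketch. It is a simplification under stated assumptions: `AliasClasses`, `taintField`, and `isTainted` are hypothetical names, and the sketch unifies variable names directly, whereas a full Steensgaard analysis unifies the locations that variables point to.

```javascript
// Minimal union-find over variable names (hypothetical helper).
class AliasClasses {
  constructor() {
    this.parent = new Map();
  }
  find(v) {
    if (!this.parent.has(v)) this.parent.set(v, v);
    const p = this.parent.get(v);
    if (p === v) return v;
    const root = this.find(p);
    this.parent.set(v, root); // path compression
    return root;
  }
  unify(a, b) {
    this.parent.set(this.find(a), this.find(b));
  }
}

const aliases = new AliasClasses();
aliases.unify('a', 'obj'); // let a = obj;

// Tainted fields are recorded against the class representative,
// so every alias of the root variable sees the same taint.
const taintedFields = new Set();
const taintField = (v, field) => taintedFields.add(`${aliases.find(v)}.${field}`);
const isTainted = (v, field) => taintedFields.has(`${aliases.find(v)}.${field}`);

taintField('a', 'x'); // a.x = tainted;
console.log(isTainted('obj', 'x'), isTainted('other', 'x'));
```

Each `find` and `unify` runs in near-constant amortized time, which is where the near-linear overall cost of unification-based alias analysis comes from.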
349
+ ### #7 Type-stub integration — `scanner/src/ir/type-stubs.js`
350
+ Parses TypeScript `.d.ts` under `node_modules/@types/**`, Python `.pyi`
351
+ at project root. Outputs `{signatures, types, frameworks, fingerprint}`.
352
+ Cache under `$XDG_CONFIG_HOME/agentic-security/stub-cache/` keyed by
353
+ package-lock + package.json fingerprint. Budget gate via
354
+ `AGENTIC_SECURITY_TYPE_STUBS_BUDGET_MS` (default 10s).
355
+ Opt-in via `AGENTIC_SECURITY_TYPE_STUBS=1`.
356
+
357
+ ### #6 Probabilistic / soft taint — `scanner/src/dataflow/soft-taint.js`
358
+ Post-pass over IR-TAINT findings: walks `trace + chain + pathSteps` and
359
+ multiplies (1 − sanitizer-effectiveness) across each call to get a
360
+ residual-taint score. 22-entry default-effectiveness table
361
+ (DOMPurify=0.98, parameterize=1.0, trim=0.05, etc.), overridable per
362
+ catalog entry via the `sanitizerEffectiveness` field. Findings whose score
363
+ falls below `AGENTIC_SECURITY_SOFT_TAINT_THRESHOLD` (default 0.5) get
364
+ severity demoted (critical→high→medium→low→info) but are NEVER dropped;
365
+ auditors see the demotion + the sanitizer that earned it.
366
+ Opt-in via `AGENTIC_SECURITY_SOFT_TAINT=1`.
367
+
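The scoring rule can be sketched like this. The three effectiveness values are the ones quoted above; everything else is an assumption made for illustration: the function name `applySoftTaint`, the `residualTaint` and `demotedFrom` fields, path entries being callee names, and demotion by exactly one step.

```javascript
// Three of the defaults quoted above; the other table entries are not listed.
const EFFECTIVENESS = { DOMPurify: 0.98, parameterize: 1.0, trim: 0.05 };
const LADDER = ['critical', 'high', 'medium', 'low', 'info'];

function applySoftTaint(f, threshold = 0.5) {
  const calls = [...(f.trace || []), ...(f.chain || []), ...(f.pathSteps || [])];
  // Residual taint: product of (1 - effectiveness) over every call on the path.
  // Calls that are not sanitizers contribute a factor of 1.
  const residual = calls.reduce((p, callee) => p * (1 - (EFFECTIVENESS[callee] || 0)), 1);
  const idx = LADDER.indexOf(f.severity);
  if (residual >= threshold || idx === -1) return { ...f, residualTaint: residual };
  // Assumption: demote one step, never past 'info', and never drop the finding.
  const severity = LADDER[Math.min(idx + 1, LADDER.length - 1)];
  return { ...f, residualTaint: residual, severity, demotedFrom: f.severity };
}

const weak = applySoftTaint({
  severity: 'high',
  trace: ['req.body.bio', 'trim'],
  chain: ['res.send']
});
const strong = applySoftTaint({
  severity: 'high',
  trace: ['req.body.bio', 'DOMPurify'],
  chain: ['res.send']
});
console.log(weak.severity, strong.severity);
```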
368
+ ### #10 Database-aware taint — `scanner/src/sast/db-taint.js`
369
+ Recognizes ORM write/read pairs across Sequelize / Prisma / TypeORM /
370
+ Mongoose / Django ORM / SQLAlchemy. When `req.body.X` is written to
371
+ `Model.field` then later read and rendered, emits a stored-XSS
372
+ finding with a 2-step trace pointing at both the write and read sites.
373
+ Handles indirection (`const u = await Model.findOne(...); res.send(u.bio)`)
374
+ and direct chains (`res.send(Model.findOne(...).bio)`).
375
+ Fires automatically — already gated by ORM context heuristic.
376
+
377
+ ### Test totals
378
+ **773 scanner tests pass / 0 fail** (up from 736 in v0.69).
379
+ Dataflow: 196 tests (up from 188).
380
+
381
+ ### Migration
382
+ All four items are additive. v0.69's items remain opt-in this release;
383
+ v0.71 will flip the v0.69 set to default-on if CVE-replay shows F1
384
+ delta ≥ +1pp without precision drop >1pp across two consecutive runs.
385
+
386
+ ## 0.69.0 — taint engine wire-up release (4 of 10 leap items)
387
+
388
+ First of three releases (v0.69 / v0.70 / v0.71) that lift the taint
389
+ engine toward academic state-of-the-art. v0.69 ships items that wire
390
+ already-built infrastructure into the engine's main path — minimum new
391
+ code, maximum precision gain.
392
+
393
+ ### #1 Backward slicing — `scanner/src/dataflow/backward.js`
394
+ Already-implemented backward slicer gets a walltime budget
395
+ (`AGENTIC_SECURITY_BACKWARD_SLICE_BUDGET_MS`, default 30s) and emits
396
+ `_annotateBackwardSlicesStats` { annotated, skipped, exhausted } on the
397
+ findings array. Each finding gets `f.backwardSlice: [...]` ordered
398
+ source→sink and `f.pathSteps` merged with the existing trace.
399
+ Opt-in via `AGENTIC_SECURITY_BACKWARD_SLICE=1`; flips default in v0.70.
400
+
401
+ ### #5 Cross-scan incremental cache — `scanner/src/dataflow/incremental.js`
402
+ Already-implemented persistence layer (`readIncrementalState`,
403
+ `seedSummaryCache`, `serializeSummaries`, `commitIncrementalState`) gets
404
+ wired into `runDeepAnalysis`. State lives in
405
+ `<scanRoot>/.agentic-security/incremental/{version,files,summaries}.json`.
406
+ Diff via file SHA-256, reverse call-graph for transitive invalidation,
407
+ version-pinned by `(scanner, catalog-size)`. On hit: ≥70% summary reuse
408
+ on re-scans; identical findings.
409
+ Opt-in via `AGENTIC_SECURITY_INCREMENTAL=1`; flips default in v0.70.
410
+
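A sketch of the invalidation step, assuming file-level granularity: a file is dirty when its SHA-256 changed, and dirtiness propagates to every transitive caller through the reverse call graph. `invalidatedFiles` and the shapes of its arguments are hypothetical; deleted files and the version pin are left out.

```javascript
// prevHashes / currHashes: { [file]: sha256 hex string }
// callers: reverse call graph, { [file]: files that call into it }
function invalidatedFiles(prevHashes, currHashes, callers) {
  const dirty = new Set(
    Object.keys(currHashes).filter((f) => prevHashes[f] !== currHashes[f])
  );
  const queue = [...dirty];
  while (queue.length > 0) {
    const file = queue.shift();
    for (const caller of callers[file] || []) {
      if (!dirty.has(caller)) {
        dirty.add(caller); // a caller's summary may depend on the changed callee
        queue.push(caller);
      }
    }
  }
  return dirty;
}

const dirty = invalidatedFiles(
  { 'util.js': 'aaa', 'service.js': 'bbb', 'route.js': 'ccc', 'other.js': 'ddd' },
  { 'util.js': 'aaa2', 'service.js': 'bbb', 'route.js': 'ccc', 'other.js': 'ddd' },
  { 'util.js': ['service.js'], 'service.js': ['route.js'] }
);
console.log([...dirty]);
```

Summaries for files outside the returned set can be reused as-is, which is what makes the reuse rate on re-scans possible.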
411
+ ### #4a String regex lattice — `scanner/src/dataflow/string-domain.js`
412
+ New `{kind: 'Regex', pattern}` lattice value alongside Const/Concat/Unknown.
413
+ `abstract()` recognizes sanitizer-output regexes for `encodeURIComponent`,
414
+ `encodeURI`, `parseInt`, `parseFloat`, `hashSync`, `digest`, `toString`,
415
+ `htmlspecialchars`. New `provablyMatches(absVal, safe)` proves an
416
+ abstract value fits a safe-charset regex — used by `sanitizer-proof.js`
417
+ to elevate findings to `provenClean` for non-SQL classes.
418
+ Opt-in via `AGENTIC_SECURITY_STRING_DOMAIN=1`; flips default in v0.70.
419
+
420
+ ### #8a Closure capture-set analysis — `scanner/src/dataflow/higher-order.js`
421
+ New `capturedFreeVars(node, boundNames)` walker + `callbackCaptureSet(cb)`.
422
+ Extracts free variables from inline arrow/function-value bodies,
423
+ handling nested closures and shadowing correctly. The motivating
424
+ example `let t = req.query.x; arr.map(i => exec(t))` correctly
425
+ identifies `t` as captured.
426
+ Engine wiring (consume the capture set at call sites) waits for
427
+ v0.70's alias analysis; the extractor + tests ship now.
428
+ Opt-in via `AGENTIC_SECURITY_CLOSURE_CAPTURE=1`.
429
+
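The free-variable walk can be sketched over a tiny ESTree-like tree. The node shapes below are an assumption (the package's IR may differ), and the third `out` parameter is added here for convenience. Only four node types are handled, and computed member properties are ignored.

```javascript
// Hypothetical ESTree-like node shapes; a sketch, not the package's walker.
function capturedFreeVars(node, boundNames = new Set(), out = new Set()) {
  if (!node || typeof node !== 'object') return out;
  switch (node.type) {
    case 'Identifier':
      if (!boundNames.has(node.name)) out.add(node.name);
      break;
    case 'ArrowFunctionExpression': {
      // Parameters shadow outer names, but only inside this body.
      const inner = new Set(boundNames);
      for (const p of node.params) inner.add(p.name);
      capturedFreeVars(node.body, inner, out);
      break;
    }
    case 'CallExpression':
      capturedFreeVars(node.callee, boundNames, out);
      for (const arg of node.arguments) capturedFreeVars(arg, boundNames, out);
      break;
    case 'MemberExpression':
      capturedFreeVars(node.object, boundNames, out); // property name is not a variable
      break;
    default:
      break;
  }
  return out;
}

const id = (name) => ({ type: 'Identifier', name });
const call = (callee, ...args) => ({ type: 'CallExpression', callee: id(callee), arguments: args });
const arrow = (params, body) => ({ type: 'ArrowFunctionExpression', params: params.map(id), body });

const captured = capturedFreeVars(arrow(['i'], call('exec', id('t')))); // i => exec(t)
const shadowed = capturedFreeVars(arrow(['t'], call('exec', id('t')))); // t => exec(t)
const nested = capturedFreeVars(arrow(['i'], arrow(['j'], call('exec', id('t'), id('i'), id('j')))));
console.log([...captured], [...shadowed], [...nested]);
```

Note that the callee `exec` is itself a free variable of the callback in this sketch; a consumer interested only in data captures would filter it out.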
430
+ ### Test totals
431
+ **736 scanner tests pass / 0 fail** (up from 698 in v0.68).
432
+ Dataflow scope: 188 tests (up from 130).
433
+
434
+ ### Migration
435
+ All four are additive, opt-in via env flag. No existing behavior changes.
436
+ v0.70 flips the four to default-on if CVE-replay shows F1 delta ≥ +1pp
437
+ without precision drop >1pp across two consecutive runs.
438
+
439
+ ## 0.68.0 — five capabilities that open a clear competitive gap
440
+
441
+ Five world-class capabilities ship together. Each addresses something
442
+ mainstream SAST (SonarQube / Semgrep / Snyk / Checkmarx / Veracode /
443
+ CodeQL) does poorly or not at all.
444
+
445
+ ### #3 Closed-loop auto-fix verification
446
+
447
+ `scanner/src/posture/fix-verify-loop.js` — new `verifyFixWithTests`
448
+ runs the full chain: re-scan + project linter + project test suite.
449
+ A fix is `verified-clean` only when all three pass.
450
+
451
+ Test-runner auto-discovery: `npm test`, pytest, go test, cargo test,
452
+ bundle exec rspec, mvn test, ./gradlew test. Returns one of:
453
+ `verified-clean`, `untested-but-passes` (no runner found — honest),
454
+ or `verification-failed` (with per-leg detail).
455
+
456
+ Competitor gap: most SAST tools suggest fixes but don't close the loop
457
+ by running the user's tests.
458
+
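The three-leg verdict could be derived roughly as below. This is a sketch with assumed conventions: each leg is `true` (passed), `false` (failed), or `null` (not run, for example because no runner was found), and `fixVerdict` is a hypothetical name. Treating a missing linter the same way as a missing test runner is also an assumption.

```javascript
// legs: { rescan, lint, tests }, each true, false, or null (not run).
function fixVerdict(legs) {
  const failed = Object.keys(legs).filter((leg) => legs[leg] === false);
  if (failed.length > 0) return { status: 'verification-failed', failed };
  // verified-clean requires all three legs to have actually run and passed.
  const allPassed = ['rescan', 'lint', 'tests'].every((leg) => legs[leg] === true);
  return { status: allPassed ? 'verified-clean' : 'untested-but-passes', failed };
}

console.log(fixVerdict({ rescan: true, lint: true, tests: true }).status);
console.log(fixVerdict({ rescan: true, lint: true, tests: null }).status);
console.log(fixVerdict({ rescan: true, lint: false, tests: true }));
```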
459
+ ### #4 LLMSecOps coverage (3 new detectors)
460
+
461
+ | Module | CWE | What it catches |
462
+ |--------|-----|-----------------|
463
+ | `sast/llm-stored-prompt.js` | CWE-1336 | System prompt sourced from DB / config file / writable mount fed to LLM call without hardening (delimiters, immutable instruction prefix, allow-list) |
464
+ | `sast/rag-poisoning.js` | CWE-1336 | User-controlled text written to Chroma/Pinecone/Weaviate/Qdrant/LangChain/pgvector without `metadata: { source, trust_level }` provenance |
465
+ | `sast/agent-tool-escalation.js` | CWE-269 | Agent harness exposes both READ tools (list/get/fetch/scrape) and ACT tools (exec/write/send/delete) with no approval gate between them — classic tool-chain privilege escalation |
466
+
467
+ Competitor gap: nobody else ships LLM-agent-specific privilege flow
468
+ analysis. The AI security market is wide open.
469
+
470
+ ### #7 Probabilistic exploitability with Wilson 95% CI
471
+
472
+ `scanner/src/posture/exploitability-probability.js` — replaces opaque
473
+ severity strings with a calibrated probability + 95% confidence interval:
474
+
475
+ ```
476
+ f.exploitProbability ∈ [0,1]
477
+ f.exploitProbabilityCI95 [lo, hi]
478
+ f.exploitProbabilityWhy string[] -- which factors fired
479
+ f.exploitProbabilitySlice 'CWE-89×js' | 'CWE-89' | 'prior-only'
480
+ ```
481
+
482
+ Method: CISA-KEV-derived CWE-family prior + multiplicative factor
483
+ update (reachability, source provenance, sanitizer-in-path, project
484
+ hardening). Wilson CI from the operator-curated
485
+ `.agentic-security/exploit-history.jsonl` when n ≥ 5 (slice grain); falls back
486
+ to a wider prior-only CI when the sample is thin. The CI WIDTH is the honest signal.
487
+
488
+ Competitor gap: every SAST emits severity strings; none surface
489
+ calibrated probability with uncertainty.
490
+
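For reference, the Wilson score interval itself is short to compute. The sketch below uses the standard formula with z = 1.96; the wrapper `exploitCI` and its `priorCI` argument are hypothetical and only mirror the n ≥ 5 rule stated above.

```javascript
// Wilson score interval for a binomial proportion at 95% confidence.
function wilsonCI95(successes, n) {
  const z = 1.96; // two-sided 95% normal quantile
  const p = successes / n;
  const denom = 1 + (z * z) / n;
  const center = (p + (z * z) / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + (z * z) / (4 * n * n));
  return [Math.max(0, center - half), Math.min(1, center + half)];
}

// Hypothetical gate: use observed history only when the slice has n >= 5,
// otherwise fall back to the wider prior-only interval.
function exploitCI(successes, n, priorCI) {
  return n >= 5 ? wilsonCI95(successes, n) : priorCI;
}

const thin = wilsonCI95(5, 10);
const thick = wilsonCI95(50, 100);
console.log(thin, thick, exploitCI(1, 3, [0.05, 0.6]));
```

With the same observed rate, the interval from 100 samples is much tighter than the one from 10, which is the sense in which the width carries the signal.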
491
+ ### #8 Provable-clean for SQL injection
492
+
493
+ `scanner/src/dataflow/proven-clean.js` — `proveSqlClean` walks the
494
+ function's CFG between every reaching source and the SQL sink,
495
+ verifies at least one parameterizer (catalog-tagged sanitizer or
496
+ known driver method: setString/AddWithValue/bindParam/etc.) sits on
497
+ the path. If proof holds, `f.provenClean = true` with
498
+ `f.provenanceProof.sanitizers: [...]`. Stronger statement than
499
+ "we didn't find a flow" — auditor-grade evidence.
500
+
501
+ v1 uses path-existence; v2 will substitute SMT-backed string-domain
502
+ constraints behind the same interface.
503
+
504
+ Competitor gap: existing tools emit "issue found" or "no issue
505
+ found." Nobody emits "proven safe."
506
+
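The path-existence check can be illustrated as a reachability question: remove the parameterizer nodes from the CFG and ask whether the sink is still reachable from any source. This is a sketch under assumptions, not `proveSqlClean` itself; the function name `everyPathParameterized` and the adjacency-map CFG are hypothetical, while the `provenClean` and `provenanceProof.sanitizers` fields follow the text above.

```javascript
// cfg: { [node]: successor nodes }. The proof holds when no path from a
// source reaches the sink without passing through a parameterizer.
function everyPathParameterized(cfg, sources, sink, parameterizers) {
  const blocked = new Set(parameterizers);
  const seen = new Set();
  const stack = sources.filter((s) => !blocked.has(s));
  while (stack.length > 0) {
    const node = stack.pop();
    if (node === sink) return { provenClean: false }; // found an unparameterized path
    if (seen.has(node)) continue;
    seen.add(node);
    for (const next of cfg[node] || []) {
      if (!blocked.has(next)) stack.push(next);
    }
  }
  return { provenClean: true, provenanceProof: { sanitizers: [...parameterizers] } };
}

// One branch concatenates the input, the other binds it as a parameter.
const mixed = everyPathParameterized(
  { src: ['concat', 'bind'], concat: ['sink'], bind: ['sink'] },
  ['src'], 'sink', ['bind']
);
const safe = everyPathParameterized({ src: ['bind'], bind: ['sink'] }, ['src'], 'sink', ['bind']);
console.log(mixed.provenClean, safe.provenClean);
```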
507
+ ### #9 Time-travel + counterfactual scanning
508
+
509
+ `scanner/src/history-scan.js` + two new CLI subcommands:
510
+
511
+ ```
512
+ agentic-security history --since 6.months --interval 1.month
513
+ # Walks N historical git refs, scans each, emits a timeline of
514
+ # introduced + resolved findings between consecutive refs.
515
+
516
+ agentic-security what-if --overlay app.js:./new-app.js [--remove foo.js]
517
+ # Apply virtual file overlays + deletes, scan the counterfactual
518
+ # state, return findings delta vs. baseline. Working tree is never
519
+ # touched (overlay is in-memory via runFullScan's fileContents map).
520
+ ```
521
+
522
+ Use cases: "What was our posture 6 months ago vs. today?" / "If I
523
+ remove this auth middleware, how many new findings appear?" / "If I
524
+ downgrade lodash to 4.17.20, how many CVE matches drop?"
525
+
526
+ Competitor gap: existing tools scan the working state. None offer
527
+ historical replay or counterfactual mode at this granularity.
528
+
529
+ ### Test totals
530
+
531
+ **698 scanner tests pass / 0 fail** (up from 665 in v0.67).
532
+
533
+ ### Migration
534
+
535
+ No breaking changes. All new capabilities are additive:
536
+ - LLM/RAG/agent detectors fire automatically on relevant code
537
+ - exploitProbability fields appear alongside existing severity
538
+ - provenClean is informational (does NOT drop findings)
539
+ - history + what-if are opt-in CLI subcommands
540
+
541
+ ## 0.67.0 — detection rules for 6 new CWE families (SSTI / LDAP / open-redirect / response-splitting)
542
+
543
+ The v0.66 corpus expansion exposed six CWE families with no detection
544
+ coverage (or partial coverage that missed common shapes). This release
545
+ ships dedicated detectors plus a runner fix.
546
+
547
+ ### New SAST detectors
548
+
549
+ | Module | CWE | Languages | What it catches |
550
+ |--------|-----|-----------|-----------------|
551
+ | `sast/ssti.js` | CWE-94 | py, js, php, java | Jinja2 `from_string` / `Template()`, Handlebars / EJS / Mustache / Pug `.compile`, Twig `createTemplate`, Velocity `evaluate` — fires only when the template body is non-literal AND has a taint hint or comes from a variable assigned from user input in the preceding 10 lines |
552
+ | `sast/open-redirect.js` | CWE-601 | js, py, java, php | `res.redirect` / `ctx.redirect` / `flask.redirect` / `HttpResponseRedirect` / Spring `"redirect:" + …` / PHP `header("Location: " . …)` with user-derived target AND no allow-list check in the preceding 30 lines |
553
+ | `sast/response-splitting.js` | CWE-113 | js, py, java, php | `setHeader` / `addHeader` / `response.headers[…] = …` / PHP `header()` with user-derived value (or method param in Java handler context) AND no CRLF strip / sanitizer above |
554
+ | `sast/ldap-injection.js` | CWE-90 | js, java, py | **Extended:** indirect filter shape (`String filter = "(uid=" + name + ")"; ctx.search(…, filter, …)`) and `search_s` / `paged_search` callees, gated on a file-level LDAP context hint |
555
+
556
+ XPath (CWE-643) and ReDoS (CWE-1333) already had working detectors; the
557
+ runner just wasn't checking the right arrays.
558
+
559
+ ### Runner fix
560
+
561
+ `bench/cve-replay/runner.mjs` now consults `scan.findings`, `scan.secrets`,
562
+ `scan.supplyChain`, AND `scan.logicVulns` when scoring a fixture.
563
+ Previously, business-logic findings (where ReDoS / weak-crypto / behavioral
564
+ checks live) were invisible to the scoring pipeline.
565
+
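A sketch of the corrected scoring, under assumptions: the four array names come from the paragraph above, while the `cwe` field on a finding and the helper names are hypothetical. The labels are read here as pre = scan of the vulnerable revision and post = scan of the patched revision.

```javascript
const SCORED_ARRAYS = ['findings', 'secrets', 'supplyChain', 'logicVulns'];

// True when any of the four result arrays contains a finding for the CWE.
function flags(scan, cwe) {
  return SCORED_ARRAYS.some((key) => (scan[key] || []).some((f) => f.cwe === cwe));
}

function scoreFixture(pre, post, cwe) {
  const preLabel = flags(pre, cwe) ? 'TP' : 'FN';
  const postLabel = flags(post, cwe) ? 'FP' : 'TN';
  return `pre:${preLabel} post:${postLabel}`;
}

// A ReDoS finding that lives only in logicVulns is now visible to scoring.
const preScan = { findings: [], logicVulns: [{ cwe: 'CWE-1333' }] };
const postScan = { findings: [], logicVulns: [] };
console.log(scoreFixture(preScan, postScan, 'CWE-1333'));
```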
566
+ ### Engine cleanup
567
+
568
+ Removed the legacy coarse `(?:res\.redirect|response\.redirect|.redirect\(|header\(['"]Location)`
569
+ REGEX rule from `engine.js` — the new `scanOpenRedirect` detector is
570
+ precise (allow-list aware) and replaces it cleanly.
571
+
572
+ ### Results on the v0.66 corpus
573
+
574
+ All 9 fixtures across the 6 new CWE families now score **pre:TP post:TN**:
575
+
576
+ | CVE | CWE | v0.66 | v0.67 |
577
+ |-----|-----|-------|-------|
578
+ | CVE-2017-16016-handlebars-ssti | CWE-94 | pre:FN | pre:TP post:TN |
579
+ | CVE-2017-9805-ldap-injection | CWE-90 | pre:FN | pre:TP post:TN |
580
+ | CVE-2018-1320-xpath-injection | CWE-643 | pre:TP | pre:TP post:TN |
581
+ | CVE-2019-8341-jinja-ssti | CWE-94 | pre:FN | pre:TP post:TN |
582
+ | CVE-2020-15252-open-redirect | CWE-601 | pre:TP post:FP | pre:TP post:TN |
583
+ | CVE-2020-7660-resp-splitting | CWE-113 | pre:FN | pre:TP post:TN |
584
+ | CVE-2021-25966-open-redirect-py | CWE-601 | pre:FN | pre:TP post:TN |
585
+ | CVE-2021-29622-ldap-py | CWE-90 | pre:FN | pre:TP post:TN |
586
+ | CVE-2021-3801-redos | CWE-1333 | pre:FN | pre:TP post:TN |
587
+
588
+ Aggregate F1: **0.500 → 0.597** on the same 88-entry corpus. Wilson 95%
589
+ CI [0.334, 0.523] (shifted upward from v0.66's [0.249, 0.429], at a similar
590
+ width). Regression tier still F1=1.0.
591
+
592
+ ### Tests
593
+
594
+ `scanner/test/new-cwe-detectors.test.js` — 11 tests covering each
595
+ detector's vulnerable + clean shape, including post-fixture
596
+ suppression patterns (allow-list checks for open-redirect, CRLF
597
+ sanitizers for response-splitting).
598
+
599
+ **665 scanner tests pass / 0 fail** (up from 654).
600
+
601
+ ## 0.66.0 — interprocedural precision + LLM default-on + C# / Kotlin IRs + corpus to 88
602
+
603
+ Four world-class lifts shipped together. After v0.65 the F1=0.636 number
604
+ was honest but the engine was still k=1 monovariant, the LLM validator
605
+ was opt-in, and the IR coverage stopped at JS/TS/Python/Java.
606
+
607
+ ### Interprocedural taint precision (engine semantics)
608
+
609
+ `scanner/src/dataflow/engine.js`:
610
+ - **k≥2 context-sensitive summaries.** At assign-from-call sites the
611
+ engine now builds the entry-taint-state from call args + current
612
+ taint via `entryStateFromCall()` and looks up (lazily computes) a
613
+ summary keyed by THAT entry state. Closes the "helper is pure when
614
+ called clean but tainted when called with user input" FN class.
615
+ - **`applyAtCallSite` wired.** Mutated by-reference params propagate
616
+ back to caller vars (`Object.assign(target, tainted)` → `target`
617
+ tainted in caller). Was previously dead code.
618
+ - **Fixed-point iteration.** `runTaintEngine` now runs the pre-pass
619
+ up to MAX_FP_ITERS (3) iterations or until the summary cache size
620
+ stabilizes — recursion no longer under-approximates. Budget caps
621
+ on walltime + cache size still hold.
622
+
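The fixed-point loop from the last bullet can be sketched as follows. `MAX_FP_ITERS` and its value of 3 come from the text; `runToFixedPoint`, the `prePass` callback, and the use of a `Map` as the summary cache are assumptions, and the walltime and cache-size budgets are omitted.

```javascript
const MAX_FP_ITERS = 3;

// Re-run the summary pre-pass until the cache stops growing or the cap is hit.
// Returns the number of iterations actually executed.
function runToFixedPoint(prePass, cache) {
  let iterations = 0;
  let previousSize = -1; // forces at least one iteration
  while (iterations < MAX_FP_ITERS && cache.size !== previousSize) {
    previousSize = cache.size;
    prePass(cache); // may add new summaries, e.g. for recursive helpers
    iterations += 1;
  }
  return iterations;
}

// A pre-pass that discovers one summary on its first run and nothing after.
let calls = 0;
const stableIters = runToFixedPoint((cache) => {
  calls += 1;
  if (calls === 1) cache.set('helper', 'summary');
}, new Map());

// A pre-pass that never stabilizes is cut off by the iteration cap.
let n = 0;
const cappedIters = runToFixedPoint((cache) => cache.set(`fn${n++}`, 'summary'), new Map());
console.log(stableIters, cappedIters);
```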
623
+ Tests in `scanner/test/interproc-k2.test.js` lock the lifts: context
624
+ disambiguates tainted vs clean call sites, recursion converges within
625
+ budget, large helper chains finish within walltime.
626
+
627
+ ### LLM validator default-on
628
+
629
+ `scanner/src/llm-validator/index.js` flips from opt-in to default-on:
630
+
631
+ | Env state | Behavior |
632
+ |----------------------------------------------|---------------|
633
+ | `LLM_ENDPOINT` unset | no-op |
634
+ | `LLM_ENDPOINT` set, `VALIDATE` unset | **runs** |
635
+ | `LLM_ENDPOINT` set, `VALIDATE=0` | no-op (opt-out) |
636
+ | `LLM_ENDPOINT` set, `VALIDATE=1` | runs (legacy) |
637
+
638
+ Cache by `(file-content-sha256, source→sink path, prompt version,
639
+ model id)` continues to suppress repeat calls. Fail-closed semantics
640
+ unchanged — any prompt-injection / verify-failure → escalate (keep).
641
+
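The env-state table above reduces to a two-line decision. The sketch below uses the full variable names given in this release's Migration notes (`AGENTIC_SECURITY_LLM_ENDPOINT`, `AGENTIC_SECURITY_LLM_VALIDATE`); the function name `llmValidatorMode` and the example endpoint URL are hypothetical.

```javascript
// Returns 'runs' or 'no-op' for a given environment object.
function llmValidatorMode(env) {
  if (!env.AGENTIC_SECURITY_LLM_ENDPOINT) return 'no-op'; // nothing to call
  if (env.AGENTIC_SECURITY_LLM_VALIDATE === '0') return 'no-op'; // explicit opt-out
  return 'runs'; // default-on, and also the legacy VALIDATE=1 case
}

const endpoint = 'https://llm.example.invalid/v1'; // placeholder value
console.log(
  llmValidatorMode({}),
  llmValidatorMode({ AGENTIC_SECURITY_LLM_ENDPOINT: endpoint }),
  llmValidatorMode({ AGENTIC_SECURITY_LLM_ENDPOINT: endpoint, AGENTIC_SECURITY_LLM_VALIDATE: '0' }),
  llmValidatorMode({ AGENTIC_SECURITY_LLM_ENDPOINT: endpoint, AGENTIC_SECURITY_LLM_VALIDATE: '1' })
);
```

In real use the argument would be `process.env`; passing a plain object keeps the decision easy to test.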
642
+ ### C# IR backend (new language)
643
+
644
+ `scanner/src/ir/parser-cs.js` (~290 lines) — regex-based first pass,
645
+ parallel approach to the legacy Python regex parser. Models method
646
+ declarations with modifiers, params, body extraction with brace-depth
647
+ tracking. Lowers `var x = …`, `Type x = …`, `x = …`, calls, return,
648
+ throw. Builds a linear CFG per method. Plus 24 C# catalog entries:
649
+ ASP.NET MVC sources (`Request.Form`, `Request.QueryString`,
650
+ `Request.Cookies`, `Request.Headers`, `Request.Body`), sinks (SqlCommand,
651
+ Process.Start, File.ReadAll*, WebClient, HttpClient, BinaryFormatter),
652
+ sanitizers (HtmlEncode, UrlEncode, GetFullPath, Parse/TryParse,
653
+ Regex.Escape, AddWithValue).
654
+
655
+ ### Kotlin IR backend (new language)
656
+
657
+ `scanner/src/ir/parser-kt.js` (~250 lines) — same regex approach.
658
+ Models `fun` declarations with modifiers, params, optional return
659
+ type, body extraction. Lowers `val`/`var`/`x = …`, calls, return,
660
+ throw. Kotlin string interpolation (`"hi $x"` / `"hi ${name}"`) lowers
661
+ into IR template-expression form so the engine sees the inner taint.
662
+ Plus 14 Kotlin catalog entries: Ktor / Spring sources, JDBC / Exposed /
663
+ ProcessBuilder / readText / ObjectInputStream sinks, escapeHtml4 /
664
+ URLEncoder / toInt / canonicalFile / setString sanitizers.
665
+
666
+ Both IRs wire into `buildProjectIR` and `buildProjectIRAsync`. Tests
667
+ in `scanner/test/parser-cs-kt.test.js`: shape correctness, multi-method
668
+ files, end-to-end scan over ASP.NET + Ktor fixtures.
669
+
670
+ ### CVE-replay corpus: 50 → 88 entries (20 CWEs × 8 languages)
671
+
672
+ `bench/cve-replay/generate-corpus-extended.mjs` adds 38 entries:
673
+ - 8 C# fixtures (exercises new IR)
674
+ - 8 Kotlin fixtures (exercises new IR)
675
+ - 6 new CWE families: SSTI (CWE-94), LDAP injection (CWE-90), XPath
676
+ injection (CWE-643), open redirect (CWE-601), HTTP response
677
+ splitting (CWE-113), regex DoS (CWE-1333)
678
+ - 16 framework variants for existing families (NestJS, Koa, Symfony,
679
+ Laravel, Gin, Fiber, etc.)
680
+
681
+ **Aggregate F1 = 0.500** (Wilson 95% CI [0.249, 0.429]) on the 88-entry
682
+ corpus. Lower than v0.65's 0.636 BECAUSE the new fixtures exercise
683
+ capabilities the scanner doesn't yet have (C#/Kotlin coverage is
684
+ still thin; new CWE families have no detection rules). This is the
685
+ honest direction: broader corpus, narrower CI, real measurement.
686
+ Regression-tier CI gate remains F1=1.0.
687
+
688
+ ### Test totals
689
+
690
+ 654 scanner tests pass / 0 fail (up from 640 in v0.65). Smoke +
691
+ regression-tier CI both green.
692
+
693
+ ### Migration
694
+
695
+ No breaking changes. To enable the LLM validator default-on path, set
696
+ `AGENTIC_SECURITY_LLM_ENDPOINT`. To opt out: `AGENTIC_SECURITY_LLM_VALIDATE=0`.
697
+ C# and Kotlin scans require no setup — drop a `.cs` or `.kt` file in
698
+ the scan tree.
699
+
700
+ ## 0.65.0 — sanitizer catalog 8× / CVE corpus 6× / continuous CVE alerting
701
+
702
+ Closes three ASPM/SAST competitiveness gaps surfaced in the post-v0.64 review:
703
+ sanitizer coverage that lagged commercial vendors, a published F1 number
704
+ measured against a corpus too small to be credible, and a `/cve-alerts`
705
+ command that configured a webhook but never actually monitored anything.
706
+
707
+ ### Sanitizer catalog: 48 → 372 entries (7.7×)
708
+
709
+ New module `scanner/src/dataflow/catalog-expanded.js` adds ~325 sanitizer
710
+ entries spanning 6 languages and 10 categories (HTML escape, SQL
711
+ parameterization, shell escape, URL encode, path normalize, regex escape,
712
+ LDAP/XPath, XML/JSON, validators, type coercion). Merged into the main
713
+ catalog at load time; on id collision the base catalog wins.
714
+
715
+ | Language | Before | After |
716
+ |-------------|-------:|------:|
717
+ | JavaScript | 11 | 105 |
718
+ | Python | 11 | 96 |
719
+ | Java | 8 | 61 |
720
+ | PHP | 4 | 41 |
721
+ | Ruby | 5 | 33 |
722
+ | Go | 2 | 36 |
723
+ | **Total** | **48** |**372**|
724
+
725
+ Tests in `scanner/test/catalog-expanded.test.js` enforce: minimum entry
726
+ count, per-language coverage floors, well-formed entry shape, no
727
+ duplicate IDs across the merged catalog, callee identifiers that the
728
+ indexer can match, and family vocabulary hygiene.
729
+
730
+ Pre-existing duplicate IDs in the base catalog (`py-input`,
731
+ `py-os-environ`, `py-open`, plus 14 in the v2 Python block) were fixed
732
+ in this pass after the duplicate-id test surfaced them.
733
+
734
+ ### CVE-replay corpus: 8 → 50 entries (6.25×)
735
+
736
+ `bench/cve-replay/generate-corpus.mjs` emits 42 capability-tier fixtures
737
+ across 11 high-priority CWE families and 6 languages:
738
+
739
+ | Family | CWE | Entries |
740
+ |---------------------|------------|--------:|
741
+ | SQL injection | CWE-89 | 5 |
742
+ | XSS | CWE-79 | 4 |
743
+ | Command injection | CWE-78 | 5 |
744
+ | Path traversal | CWE-22 | 5 |
745
+ | SSRF | CWE-918 | 4 |
746
+ | Deserialization | CWE-502 | 4 |
747
+ | XXE | CWE-611 | 3 |
748
+ | Prototype pollution | CWE-1321 | 2 |
749
+ | CSRF | CWE-352 | 2 |
750
+ | Hardcoded secrets | CWE-798 | 3 |
751
+ | Weak crypto | CWE-327/338| 5 |
752
+
753
+ Aggregate F1 against the new corpus is **0.636** (Wilson 95% CI [0.346,
754
+ 0.591]) — an honest baseline, replacing the previous F1 number measured
755
+ against 8 cherry-picked fixtures. The regression-tier CI gate still
756
+ passes F1=1.0. Failing capability entries graduate to regression as fixes
757
+ land (CONTRIBUTING.md's 5-snapshot rule).
758
+
759
+ ### Continuous CVE alerting daemon
760
+
761
+ New `scanner/src/posture/cve-alert-daemon.js` polls OSV for the project's
762
+ dependency tree and fires the configured webhook when a new advisory
763
+ drops. Multi-ecosystem: npm, PyPI, Ruby, Go, Cargo, Composer, Maven,
764
+ Dart. Reads `.agentic-security/cve-alerts.json` (the schema written by
765
+ `/cve-alerts`), dedupes against `.agentic-security/cve-alerts-state.json`
766
+ so re-runs don't re-page. Slack / Discord / generic webhook payload
767
+ shapes built in.
768
+
769
+ - `agentic-security cve-watch [--alert-url] [--min-severity] [--dry-run]`
770
+ — one-shot run. Schedule it via cron or CI.
771
+ - `scripts/ci-templates/cve-watch.github-actions.yml` — drop-in GitHub
772
+ Actions workflow (daily 08:00 UTC + `workflow_dispatch`). Reads
773
+ `CVE_ALERT_URL` from repo secrets; commits state file with `[skip ci]`.
774
+
775
+ 21 unit tests in `scanner/test/cve-alert-daemon.test.js` cover each
776
+ manifest reader, severity normalization, deduplication across runs,
777
+ min-severity floors, payload formatting, and offline-mode refusal.
778
+
779
+ ### Migration notes
780
+
781
+ - Re-running `npm run build` is recommended to bundle the new daemon
782
+ binary entry. No breaking changes; all v0.64.0 commands and skills
783
+ still work as before.
784
+ - The capability-tier F1 score in the manifest is intentionally honest
785
+ (0.636, not 0.85). Path to 0.85 is more corpus, not better numbers.
786
+
787
+ ## 0.64.0 — auto-activating skills + multi-harness manifests
788
+
789
+ Inspired by patterns from the obra/superpowers plugin's "mandatory workflows,
790
+ not suggestions" stance: the agent shouldn't wait for the user to type
791
+ `/scan` or `/fix` before doing the security thing. Nine new auto-activating
792
+ skills cover the common security/privacy moments where the agent should
793
+ intervene before damage lands. Plus Codex / Cursor / Gemini manifests so the
794
+ 12 MCP tools work in those harnesses too.
795
+
796
+ ### Auto-activating skills (9 new)
797
+
798
+ Each lives at `skills/<slug>/SKILL.md`. The `description:` frontmatter is
799
+ the activation cue Claude Code's skill router reads. All ≤120 chars,
800
+ enforced by `npm run test:lifecycle`.
801
+
802
+ - **`security-explain-cve`** — fires when user mentions CVE-id / GHSA / asks "what is this vuln". Routes to `lookup_cve` MCP tool + `/explain`.
803
+ - **`security-scan-on-deploy`** — fires on "ship / deploy / launch / is this safe?" intent. Checks `last-scan.json` mtime, runs a fresh scan if stale, renders a verdict (not a wall of findings).
804
+ - **`security-fix-finding`** — fires when user references a finding and asks to fix. Enforces the deterministic toolchain (`synthesize_fix → verify_fix → apply_fix`); refuses raw `Edit`.
805
+ - **`security-weak-crypto`** — fires **before** the agent writes md5/sha1 for passwords, DES/3DES/RC4, static IVs, `Math.random` for tokens, or JWT with `none` algorithm. Refuses the write, proposes the right primitive with literal code.
806
+ - **`security-rotate-leak`** — fires when a leaked secret is mentioned. Masks the value, detects the provider, prints the revoke URL, estimates blast radius BEFORE rotating, refuses to print the value back.
807
+ - **`security-eval-warn`** — fires before `eval()` / `new Function()` / `setTimeout(string,…)` / `pickle.loads` / `eval($x)` / `class_eval`. Diagnoses what the user actually wants, proposes the structured alternative.
808
+ - **`security-sql-injection-warn`** — fires before template-literal queries / `+`-concat into SQL / NoSQL operator injection / LDAP/XPath concat. Shows the literal parameterized form for the user's specific DB driver.
809
+ - **`threat-model-first`** — fires **before** the agent writes new auth / secret / external-API / file-upload / OAuth / deserialization code. Walks STRIDE per touch-point (one sentence per row, no skipping); writes `TM.md` to `.agentic-security/agent-scratchpad/threat-model/<session>/` via `append_scratchpad`. Then proposes implementation with each defensive measure citing its STRIDE row in a code comment.
810
+ - **`privacy-data-flow`** — fires **before** the agent writes code touching PII / PHI / PCI / GDPR-special / confidential data shapes. Classifies the data, traces the destination (storage tier / encryption / third-party processors / logging / retention / backups / replication), maps to jurisdiction (GDPR / HIPAA / CCPA / PCI-DSS), writes `DATA_FLOW.md` to the scratchpad. Refuses hard violations (logging full PAN, sending PHI to non-BAA processor, storing CVV after auth).
811
+
812
+ ### Skills-registry integrity test
813
+
814
+ `scanner/test/skills-registry.test.js` enforces:
815
+ - Every `skills/<slug>/SKILL.md` has well-formed YAML frontmatter
816
+ - `name:` equals `agentic-security:<slug>`
817
+ - `description:` is ≤ 120 chars (re-asserted at unit-test time)
818
+ - Auto-activating skills include an "Activate" / "Activate on" cue
819
+ - Every `/<slash-command>` referenced in a skill body resolves to a real
820
+ file under `commands/`
821
+
822
+ 7 new tests, all passing.
823
+
824
+ ### Multi-harness manifests (3 new)
825
+
826
+ The MCP server is harness-agnostic — same binary, different manifest:
827
+
828
+ | Harness | Manifest |
829
+ |----------------|-----------------------------------|
830
+ | Claude Code | `.claude-plugin/plugin.json` (already shipping) |
831
+ | **Codex CLI** | `.codex-plugin/plugin.json` (new) |
832
+ | **Cursor** | `.cursor-plugin/plugin.json` (new) |
833
+ | **Gemini CLI** | `gemini-extension.json` (root) (new) |
834
+
835
+ Each manifest declares the same `agentic-security` MCP server pointing at
836
+ `scanner/bin/agentic-security-mcp.js`. Each carries an explicit note about
837
+ which surface IS validated vs not. The 12 MCP tools work identically across
838
+ all four harnesses; the slash-command + skill-activation surface is Claude-
839
+ Code-specific today.
840
+
841
+ README updated with an "Install in your harness" table covering all four
842
+ plus the generic MCP-aware-client fallback.
843
+
844
+ ### Lint state
845
+
846
+ 89 surfaces total (80 commands + 9 skills + add-scan-rule SKILL). All
847
+ within the 120-char description / 200-char argument-hint caps.
848
+
849
+ ### Tests
850
+
851
+ 619/619 passing (was 612 in v0.63.0; +7 skills-registry tests).
852
+
853
+ ## 0.63.0 — Python IR via stdlib ast (real parser, regex fallback)
854
+
855
+ Replaces the hand-rolled regex Python parser with Python 3's stdlib `ast`
856
+ module (zero npm bundle bloat, zero pip install, runs in a per-scan
857
+ subprocess) and keeps the regex parser as a fallback when Python isn't on
858
+ PATH. The new path closes the gaps the regex parser admitted to in its own
859
+ comments: comprehensions, decorators, `match` statements, `async`/`await`,
860
+ lambda bodies, and nested-paren default args (`def f(x=Foo(1,2))`).
861
+
862
+ ### What ships
863
+
864
+ - **`scanner/src/ir/parser-py.helper.py`** — Python 3.8+ stdlib script
865
+ that reads `[{file, content}, ...]` JSON on stdin and emits the same
866
+ IR shape as the regex parser, but computed from a real AST. Models
867
+ assign / call / member / subscript / f-string / if / for / while /
868
+ try-except / return / raise / async-for / async-with. Captures every
869
+ function definition (including nested, decorated, async, generic) even
870
+ when the body has unmodeled constructs.
871
+ - **`scanner/src/ir/parser-py-cst.js`** — Node-side dispatcher.
872
+ Batched: ALL Python files in a project go in one subprocess invocation.
873
+ Capability probe cached per-process. 10 s timeout on the whole batch.
874
+ - **`scanner/src/ir/index.js`** — three-mode toggle:
875
+ `AGENTIC_SECURITY_PY_PARSER=auto` (default, falls back silently when
876
+ python3 missing), `cst` (force, error if unavailable), `regex`
877
+ (force legacy).
878
+ - **`scanner/src/ir/CLAUDE.md`** — documents the dual-parser shape,
879
+ the IR contract every parser must produce, and the retirement plan
880
+ for the regex parser.
881
+
882
+ ### What's STILL not modeled
883
+
884
+ The CST parser intentionally emits `kind: 'noop'` for these to keep the
885
+ CFG bounded — the regex parser dropped the entire function for the same
886
+ shapes; we capture the function record but skip the body lowering:
887
+
888
+ - `match` statement case bodies (function is captured; per-case taint
889
+ flow not yet routed)
890
+ - destructuring assignment (`a, b = req.body`) — only single-target
891
+ assigns get a precise `target` field
892
+ - comprehension `if` filters and multi-`for` generators — the elt is
893
+ modeled; the generator's own predicates aren't
894
+
895
+ ### Cost / risk
896
+
897
+ - One `python3` subprocess per `runScan`, not per file. Batched stdin
898
+ payload. Capability probe runs once and is cached.
899
+ - When python3 isn't installed (or is < 3.8), the regex parser handles
900
+ the scan unchanged. No behavior regression for those customers.
901
+ - Set `AGENTIC_SECURITY_PY_PARSER_DEBUG=1` to surface fallback events
902
+ on stderr.
903
+
904
+ ### Tests
905
+
906
+ 12 new CST-specific tests in `scanner/test/parser-py-cst.test.js`
907
+ covering decorators, async, nested-paren defaults, match statements, list
908
+ comprehension taint flow, nested function defs, batch behavior, syntax-
909
+ error isolation per file, single-file/batch shim equivalence. All skip
910
+ gracefully when python3 isn't on PATH. Total suite: 612/612 passing.
911
+
912
+ ## 0.62.0 — agent-harness hardening + slash-command consolidation
913
+
914
+ Five rounds of analysis applied to the plugin's scanner + MCP server + sub-agent
915
+ harness across this release. Each section corresponds to one external source;
916
+ in-source comments tag the originating thread (`premortem #N`, `post-rec #N`,
917
+ `harness-anatomy #N`) for cross-reference.
918
+
919
+ ### Security & integrity (premortem hardening)
920
+
921
+ - **Per-install HMAC key** for `last-scan.json` integrity (was hostname-derived
922
+ and publicly forgeable in CI / containers). Stored at
923
+ `$XDG_CONFIG_HOME/agentic-security/scan-key`; override via
924
+ `$AGENTIC_SECURITY_HMAC_KEY`. Legacy hostname key verified for one release
925
+ to migrate existing signed scans.
926
+ - **MCP reserved-write list expanded** to `.github/`, `.gitlab/`, `.circleci/`,
927
+ `.buildkite/`, `.terraform/`, IaC dirs, every common manifest basename
928
+ (`Dockerfile`, `Jenkinsfile`, `package.json`, lockfiles, `pom.xml`,
929
+ `Cargo.toml`, …) and `*.tf` / `docker-compose.yml`. Closes the
930
+ forged-finding-rewrites-CI-workflow attack path.
931
+ - **`rules.yml disable:` requires signature.** `applyOverrides` now refuses
932
+ the `disable:` list unless `.agentic-security/rules.yml.sig` verifies
933
+ under the per-install HMAC. `severityOverrides`, `custom:`, `ignorePaths`
934
+ are not gated (they don't reduce coverage). Override via
935
+ `$AGENTIC_SECURITY_RULES_UNSIGNED=1`.
936
+ - **MCP `SERVER_VERSION`** reads `package.json` at module load (was a
937
+ hardcoded literal that rotted).
938
+ - **MCP `find_rule_module` tool** for codebase navigation (CWE / family →
939
+ detector file) without grep-and-pray.
940
+ - **MCP `apply_fix`** now passes patch text through unredacted (the prior
941
+ redact-on-output behavior silently corrupted valid patches whose content
942
+ matched a secret-shape).
943
+ - **Per-stableId attempt budget** (default 2) on `apply_fix`. Refuses a
944
+ third attempt with structured `{ budgetExceeded, attempts, maxAttempts }`.
945
+ - **Optional remote audit-log sink.** Set
946
+ `$AGENTIC_SECURITY_AUDIT_WEBHOOK=<url>` and every MCP tool call is
947
+ fire-and-forget POSTed to the witness. Closes the full-file-rewrite
948
+ blind spot of the local-only hash chain.
949
+
950
+ ### Scanner correctness
951
+
952
+ - **`SummaryCache` wired** into the taint engine (k=1 monovariant
953
+ return-taint). Was dead code; now the assign-from-call lattice consults
954
+ cached summaries for resolved callees.
955
+ - **Per-flow source attribution** in IR-TAINT (was first-source-globally-
956
+ seen; produced misattributed evidence in findings).
957
+ - **`finding-defaults` backfill** stamps `parser` + `family` on every
958
+ finding before calibration / confidence run. Closes the "0 parser /
959
+ 20 family null on a smoke run" silent-no-op.
960
+ - **Tautological Brier removed.** `computeBrierFromHistory` (always
961
+ returned 0) replaced with `computeBrierOnHeldOut(samples)` taking real
962
+ labels. New `posture/holdout-eval.js` evaluator: Brier + ECE + per-family
963
+ TP/FP + Wilson CI.
964
+ - **PoC param-key inference** reads the actual handler file window;
965
+ surfaces `paramKey`, `paramKeyConfidence`, `paramKeyInferred`. Low-
966
+ confidence PoCs trigger `regression-test-gen` to refuse rather than
967
+ ship a fake-passing test.
968
+ - **CVE-replay scoring fixed.** TN branch reachable; pre/post scored
969
+ independently. Per-slice F1 (by CWE, language, source-quality tier).
970
+ Wilson 95% CI on the aggregate TP-rate.
971
+ - **Python parser** switched to a balanced-paren scanner for calls + def
972
+ signatures (was a `[^()]*` regex that rejected `db.execute(sanitize(x))`
973
+ and `def f(x=Foo(1,2))`).
974
+
975
+ ### Agent harness
976
+
977
+ - **`security-fixer` writes via MCP, not Edit.** Tool list stripped to
978
+ `Read, Bash, Grep`. The deterministic toolchain (`synthesize_fix` →
979
+ `verify_fix` → `apply_fix`) is the only write path. The LLM is the
980
+ intent layer; the MCP server is the execution layer.
981
+ - **Subagent path-confinement schema** (`agents/_CONFINEMENT.md`) shared
982
+ with the MCP reserved-write list.
983
+ - **`security-fixer` consumes structured `verify_fix.introduced[]`** to
984
+ diagnose template-incomplete vs codebase-prior vs lint-failed outcomes.
985
+ - **PLAN.md decomposition convention** for batched runs:
986
+ `.agentic-security/agent-scratchpad/<agent>/<session>/PLAN.md`. Survives
987
+ context resets; auditable artifact for governance.
988
+ - **AGENTS.md continual learning.** `.agentic-security/AGENTS.md` is the
989
+ append-only narrative file the agent writes to at session end. The
990
+ SessionStart hook reads it; the Stop hook nudges the agent to record an
991
+ entry when work happened.
992
+ - **MCP scratchpad pair** (`append_scratchpad`, `read_scratchpad`)
993
+ confined to `.agentic-security/agent-scratchpad/<agent>/<session>/`.
994
+ Strict path validation; 2 MB / file, 50 MB total caps.
995
+ - **MCP tool-output offloading.** `scan_diff` and `explain_finding`
996
+ results exceeding `OFFLOAD_THRESHOLD` (default 10) write the full payload
997
+ to the scratchpad; the response shrinks to `{ head, tail, total,
998
+ scratchpadPath, pagingHint }`. The agent pages through with
999
+ `read_scratchpad`.
1000
+ - **MCP `lookup_cve`** tool: read-only access to local OSV / KEV / EPSS
1001
+ caches with staleness tiers. Closes the knowledge-cutoff gap for SCA
1002
+ reasoning without triggering a network fetch.
1003
+ - **MCP `append_agents_memory` / `read_agents_memory`** tools wrap the
1004
+ AGENTS.md surface.
1005
+
1006
+ ### Evals + benches
1007
+
1008
+ - **CVE-replay corpus tiered** into `regression/` (CI gates here — F1=1.0
1009
+ required) and `capability/` (frontier; failure informational).
1010
+ Graduation policy: 5 consecutive passes → promote.
1011
+ - **`npm run bench:cve-replay:ci`** new CI gate.
1012
+ - **Agent-task corpus** at `bench/agent-tasks/security-fixer/`: end-to-end
1013
+ eval of the deterministic toolchain (synth → verify → apply) against
1014
+ fresh temp copies of fixtures. 7 graders per task; pass@1 reporting.
1015
+ - **`llm-validator` consistency harness**
1015
+ (`scanner/src/llm-validator/consistency.js` + `agentic-security-consistency` bin): pass^k stability
1017
+ measurement across N trials on the same fixture set.
1018
+ - **Human ↔ LLM grader calibration** (`posture/grader-calibration.js`):
1019
+ Cohen's κ between `/triage` human verdicts and validator verdicts on
1020
+ the stableId overlap. Alarm when κ < 0.6 with n ≥ 10.
1021
+ - **`agentic-security-audit` CLI**: `review`, `metrics`, `verify`
1022
+ subcommands for the MCP audit log. `--by-session` aggregation with
1023
+ outlier flagging (default ≥20 calls per tool).
1024
+ - **`audit.js`** stamps `sessionId` on every entry.
1025
+
1026
+ ### Repo structure (Claude-Code-at-scale)
1027
+
1028
+ - **`.claude/settings.json`** with team-committed read-deny list
1029
+ (generated bundle, bench caches, scan-state JSON) to keep noise out of
1030
+ context.
1031
+ - **Subdirectory `CLAUDE.md` files** added: `scanner/`,
1032
+ `scanner/src/{sast,posture,dataflow,mcp}/`. Root `CLAUDE.md` trimmed
1033
+ 253 → 115 lines (pointers + gotchas only).
1034
+ - **`npm test` split into scoped scripts**: `test:smoke / sast / posture /
1035
+ dataflow / mcp / report / bench-modules / lifecycle`. Full suite chains
1036
+ them.
1037
+ - **Stop hook (`hooks/session-stop-drift-check.js`)** flags new modules
1038
+ in `scanner/src/{sast,posture,dataflow,mcp}/` not yet indexed in the
1039
+ matching subdir CLAUDE.md, plus prompts for an AGENTS.md entry when
1040
+ the session touched tracked files.
1041
+ - **SessionStart self-check (`hooks/session-start-self-check.js`)**
1042
+ validates every command/agent frontmatter shape; reports malformed
1043
+ surfaces.
1044
+ - **`skills/add-scan-rule/SKILL.md`** holds the "add a new SAST rule"
1045
+ workflow as an on-demand skill (was in root CLAUDE.md).
1046
+ - **`docs/POSITIONING.md`** — explicit ICP statement (vibecoder-first;
1047
+ pro follow-on).
1048
+
1049
+ ### Slash-command consolidation (LangChain harness-anatomy #5)
1050
+
1051
+ The 77-command surface was the exact "tool proliferation" anti-pattern the
1052
+ post warned about. Always-paid frontmatter (description + argument-hint)
1053
+ trimmed **20.3 KB → 11.3 KB (44% reduction)**.
1054
+
1055
+ - **Description cap of 120 chars** + argument-hint cap of 200 chars,
1056
+ enforced by `scripts/lint-command-descriptions.mjs` in
1057
+ `npm run test:lifecycle`. 76 surfaces trimmed.
1058
+ - **Eleven commands folded into canonical forms**, with deprecated
1059
+ aliases kept one release for muscle memory:
1060
+
1061
+ | Old | New |
1062
+ |-----|-----|
1063
+ | `/ci-gate-multi` | `/ci-gate --provider <name>` |
1064
+ | `/rotate-key-auto` | `/rotate-secret --auto` |
1065
+ | `/trim-dead-code` | `/trim --what code` |
1066
+ | `/trim-dependencies` | `/trim --what deps` |
1067
+ | `/story-explain` | `/explain --narrative` |
1068
+ | `/security-badge` | `/security-attestation` (default) |
1069
+ | `/security-onepager` | `/security-attestation --format onepager` |
1070
+ | `/trust-page` | `/security-attestation --format page` |
1071
+ | `/dep-pinning` | `/supply-chain-check --show pinning` |
1072
+ | `/dep-freshness` | `/supply-chain-check --show freshness` |
1073
+ | `/dep-alternatives` | `/supply-chain-check --show alternatives` |
1074
+
1075
+ - **Skipped on purpose:** `/secure` (vibecoder entry point — kept
1076
+ untouched); the LLM-sec cluster (each command serves a distinct
1077
+ workflow). Tier 3 demote-to-skills also skipped after investigation —
1078
+ Claude Code today loads both commands and skills' descriptions in the
1079
+ always-paid surface, so the move wouldn't actually save context.
1080
+
1081
+ ### Tests
1082
+
1083
+ 600/600 tests passing. CVE-replay CI gate green (regression F1=1.0 on
1084
+ 3 entries). Lint gate green (all 80 surfaces within caps).
1085
+
1086
+ ## 0.51.0 — 11 of 16 PRD-missing features (5 research items deferred)
1087
+
1088
+ This release lands all 11 tractable FRs from the v2 PRD audit. The 5
1089
+ research-level FRs (k=2 calling context, narrow symbolic execution, hybrid
1090
+ static+dynamic, eBPF/dtrace live instrumentation, LLM-based intent
1091
+ inference) are deferred to Phase 6+ with their reasons documented in the
1092
+ PRD.
1093
+
1094
+ ### Shipped
1095
+
1096
+ - **FR-CHAIN-FILTER** (`posture/cross-lang-meta.js`). Cross-language chain
1097
+ detectors only chain to chain-worthy families (sql-injection,
1098
+ command-injection, xss, ssrf, code-injection, deserialization, xxe,
1099
+ path-traversal, idor, mass-assignment, prototype pollution, and others).
1100
+ Eliminates the "queue chain to CSRF" semantic-noise the polyglot bench
1101
+ surfaced.
1102
+ - **FR-FAMILY-REGISTRY** (`posture/cross-lang-meta.js`). Cross-language
1103
+ chains get canonical family names (xlang-openapi / xlang-grpc /
1104
+ xlang-graphql / xlang-queue / xlang-orm / xlang-iac / xlang-unknown).
1105
+ - **FR-LEARN-7** (`bin/agentic-security reset`). Right-to-delete CLI;
1106
+ wipes accumulated learned state while preserving operator-authored
1107
+ config. `--yes` to actually delete; `--keep <names>` to spare specific
1108
+ items.
1109
+ - **FR-PY-SAST** (`sast/python-sinks.js`). Python sink-side coverage:
1110
+ SQLAlchemy text() with f-string, cursor.execute concat, os.system /
1111
+ subprocess shell=True, pickle.loads, yaml.load, marshal.loads, eval/exec
1112
+ on request data, compile() on user input, flask.send_file with user
1113
+ path, send_from_directory, open() with f-string, requests verify=False,
1114
+ ssl._create_unverified_context, requests/urlopen with user URL, lxml/
1115
+ etree on user input. **Closes G3:** polyglot F1 went from 0.727 → 1.00.
1116
+ - **FR-VER-3** (`posture/regression-test-gen.js`). Per finding with a PoC,
1117
+ emit a framework-idiomatic regression test (Jest for Node, pytest for
1118
+ Python). Surfaced as `f.regression_test = { lang, framework, filename,
1119
+ runHint, code }`.
1120
+ - **FR-LIVE-HARNESS** (`posture/verifier-target.js`). Schema for
1121
+ `.agentic-security/verifier-target.yaml` describing how to bring up the
1122
+ customer's app (docker-compose or command shape). The `verify --live`
1123
+ CLI auto-discovers it. Safety: `command` shape requires a known-good
1124
+ start pattern unless `AGENTIC_SECURITY_VERIFY_TARGET_OK=1`.
1125
+ - **FR-XSAT-7** (`posture/iam-policy.js`). AWS IAM policy auditing.
1126
+ Curated dangerous-actions list (iam:*, s3:*, lambda:*, ec2:*, dynamodb:*,
1127
+ rds:*, secretsmanager:*, kms:*). Flag Effect=Allow + wildcard resource
1128
+ + no Condition.
1129
+ - **FR-XSAT-8** (`posture/container-runtime.js`). Dockerfile + k8s
1130
+ manifest + ECS task def. Detects USER root, privileged: true,
1131
+ hostNetwork, hostPID, runAsUser: 0, capabilities ALL/SYS_ADMIN,
1132
+ /var/run/docker.sock bind-mount, ADD with remote URL.
1133
+ - **FR-LOGIC-1 + FR-LOGIC-2 + FR-LOGIC-7** (`posture/business-logic.js`).
1134
+ AuthZ matrix construction (per-resource consistency check + IDOR
1135
+ detection on mutation routes with :id but no ownership/role check),
1136
+ state-machine extraction (catches writes outside the declared status
1137
+ set), and negative-test-gap detection (auth route + happy-path test +
1138
+ no 401/403 assertion = miss).
1139
+ - **FR-LOGIC-6** (`posture/flow-narration.js`). Per high-severity finding,
1140
+ emit a one-paragraph attacker→impact→cost narrative. Template fallback
1141
+ for 10 CWE families; opt-in LLM mode via
1142
+ `AGENTIC_SECURITY_FLOW_NARRATION_LLM=1`.
1143
+ - **FR-LEARN-6** (`posture/rule-synthesis.js`, `agentic-security rule-synth`).
1144
+ Read triage-feedback.json, cluster FP verdicts by family + dir prefix,
1145
+ propose a YAML suppression rule when ≥ 5 verdicts cluster. Proposes —
1146
+ doesn't activate.
1147
+ - **FR-SDLC-5** (`report/index.js::toSTIX`). `--format stix` emits a STIX
1148
+ 2.1 bundle with one Vulnerability + Indicator + Relationship SDO per
1149
+ finding. CWE external_references; x_* custom properties for severity,
1150
+ calibrated confidence, exploitability, verifier verdict.
1151
+ - **FR-SDLC-9** (`posture/policy-gate.js`, `--policy <file.rego>`).
1152
+ Policy-as-code gate. External OPA binary preferred; embedded mini-DSL
1153
+ evaluator for the common case. Supports == != > < >= <= comparisons
1154
+ on `finding.<field>` and `sprintf("...", [args])` for messages.
1155
+
1156
+ ### Deferred (Phase 6+ research)
1157
+
1158
+ - FR-SEM-2 k=2 calling-context — requires dataflow engine refactor
1159
+ - FR-SEM-5 narrow symbolic execution — needs KLEE-style backend
1160
+ - FR-SEM-6 hybrid static+dynamic — needs customer app instrumentation
1161
+ - FR-VER-5 eBPF/dtrace live instrumentation — Linux/macOS only, opt-in
1162
+ - FR-LOGIC-5 intent inference — LLM-based; pending prompt-injection-safe design
1163
+
1164
+ ### Tests, bench, integrity
1165
+
1166
+ - 295 + 26 + 2 unit tests pass (was 240 before this release).
1167
+ - Synthetic-bench F1 = 100% (baseline updated; new IDOR expected entry added
1168
+ for orm-raw-sql:15 — AuthZ-matrix detector finds a genuine missing
1169
+ ownership check that wasn't previously caught).
1170
+ - Polyglot bench F1 = 100% (was 72.7%; Python SAST coverage closed G3 gap).
1171
+ - No dead exports.
1172
+
1173
+ ### Honesty correction
1174
+
1175
+ The PRD v2 said all 16 missing features would ship. This release ships 11; 5 are
1176
+ honestly deferred. The PRD-v3 update (next session) should reflect this
1177
+ delivery state.
1178
+
1179
+ ## 0.50.0 — next-gen SAST Phase 1 complete (5 of 5 units)
1180
+
1181
+ Closes Phase 1 of `docs/PRD-next-gen-sast-phase1.md`. The two units queued
1182
+ from v0.49.0 (P1.2 verifier sandbox, P1.4 polyglot bench) are now wired.
1183
+
1184
+ ### Shipped & wired
1185
+
1186
+ - **P1.2 — Verifier sandbox loop (FR-VER-3, FR-VER-6, FR-VER-7).** New
1187
+ module `scanner/src/posture/verifier.js`. Consumes the `f.poc` artifacts
1188
+ from P1.1 and assigns a per-finding `verifier_verdict`:
1189
+ - `verified-exploit` — PoC ran against a live target and exited 0
1190
+ - `verified-by-llm` — Layer-3 LLM accepted the finding
1191
+ - `verified-sanitizer-absence` — pattern-based proof that no sanitizer
1192
+ appears in a ±10 line window around the sink (9 vuln families covered)
1193
+ - `unverified-by-design` — CWE family where v1 explicitly doesn't ship a PoC
1194
+ - `cannot-verify` — sandbox error, missing target, PoC validation failed
1195
+
1196
+ PoC static validation refuses destructive shell payloads, hardcoded cloud
1197
+ metadata IPs, runaway-length code, and Node PoCs without a deterministic
1198
+ `process.exit(...)`. Sandbox execution mode (opt-in via
1199
+ `AGENTIC_SECURITY_VERIFY_LIVE=1` + `AGENTIC_SECURITY_VERIFY_TARGET=<url>`)
1200
+ runs each PoC under Docker with `--cap-drop=ALL --memory=256m --read-only
1201
+ --user=nobody`; falls back to subprocess with `ulimit` when Docker isn't
1202
+ available. Fail-closed: any error → `cannot-verify`, never silent drop.
1203
+ New CLI subcommand `agentic-security verify [--finding <id>] [--live
1204
+ --target <url>]` re-runs the verifier loop on `last-scan.json` and
1205
+ persists the verdicts. Smoke on `vulnerable-js` fixture: 7 findings get
1206
+ `verified-sanitizer-absence` static proofs; 2 get `unverified-by-design`;
1207
+ the rest are `cannot-verify` pending live execution.
1208
+
1209
+ - **P1.4 — Cross-language polyglot benchmark (G3).** New `bench/polyglot/`
1210
+ with a tiny dependency-free YAML parser, the runner `runner.mjs`, and 4
1211
+ starter cases:
1212
+ - 01 HTTP→Python SQL (canonical Phase-2 detector gap — Python SAST)
1213
+ - 02 Queue→Python cmd (same gap; queue chain detected; sink not yet)
1214
+ - 03 ORM round-trip (Node-only; mass-assignment + data-exposure TPs)
1215
+ - 04 HTTP→Node SQL (clean end-to-end test of the OpenAPI cross-asset bridge)
1216
+
1217
+ Default mode `recall-only` measures "does the chain fire where it
1218
+ should?" rather than penalizing incidental findings (header-hardening,
1219
+ CSRF on test routes, body-parser DoS warnings). Set `mode: strict` in a
1220
+ manifest for full-precision scoring. Current overall F1 = 72.7%; PRD G3
1221
+ target is 85%; the ~27 pp shortfall from a perfect score (12.3 pp short of the target) is Python-side detector coverage (Phase 2).
1222
+ New `npm run bench:polyglot`.
1223
+
1224
+ ### Tests, bench, integrity
1225
+
1226
+ - 19 new tests in `test/verifier.test.js` (validation, sanitizer proofs,
1227
+ verdict assignment, batch annotation, fail-closed defense-in-depth).
1228
+ - All 218 + 26 + 2 unit tests pass.
1229
+ - Synthetic-bench F1 still 100%.
1230
+ - Polyglot bench F1 72.7% (above 30% v1 floor; below 85% G3 target — the
1231
+ gap is documented in `bench/polyglot/README.md`).
1232
+ - No new dead exports.
1233
+
1234
+ ### Honesty correction
1235
+
1236
+ The PRD's G2 target ("≥80% of high+/critical findings ship with a verified
1237
+ PoC") is not measured yet — that requires a labeled run-against-target,
1238
+ which the v1 verifier supports via `--live --target` but we haven't built
1239
+ a target harness. v1 ships the framework; the labeled measurement is
1240
+ Phase 5 work.
1241
+
1242
+ ## 0.49.0 — next-gen SAST Phase 1 (3 of 5 units)
1243
+
1244
+ Implements 3 of the 5 Phase-1 shippable units from
1245
+ `docs/PRD-next-gen-sast-phase1.md` (parent `docs/PRD-next-gen-sast.md`).
1246
+ The two queued for the next session are noted at the end.
1247
+
1248
+ ### Shipped & wired
1249
+
1250
+ - **P1.1 — PoC generator framework (FR-VER-2).** New module
1251
+ `scanner/src/posture/poc-generator.js` ships runnable proof-of-concept
1252
+ files for the top-10 CWE families from the parent PRD: SQL injection,
1253
+ command injection, XSS, path traversal, SSRF, code injection, CSRF, open
1254
+ redirect, XXE, and insecure deserialization. Each PoC is a self-contained
1255
+ Node script with one `fetch()` call, evidence-pattern detection, and a
1256
+ deterministic exit code (0 = exploit demonstrated, 1 = not demonstrated, 2
1257
+ = error). Templates respect a safety policy: no destructive shell commands,
1258
+ no real cloud-metadata IPs, no outbound network beyond localhost. Smoke:
1259
+ scanning `test/fixtures/vulnerable-js` produces 8 PoCs across 6 distinct
1260
+ CWE families. Findings get a new `f.poc = { lang, kind, cwe, family, runHint, code }`
1261
+ field surfaced in normalizeFindings and SARIF. Families without v1 template
1262
+ coverage get `f.poc = null` and a documented entry in
1263
+ `poc-cwe-map.js::NO_POC_FAMILIES`.
1264
+ - **P1.3 — Brier-calibrated confidence (FR-UX-1, FR-UX-2).** New module
1265
+ `scanner/src/posture/calibration.js` turns the ordinal `confidence` score
1266
+ into a calibrated probability with 95% Wilson confidence interval. Per
1267
+ finding: `calibrated_confidence`, `calibrated_confidence_ci`,
1268
+ `calibrated_n`, `calibration_reason` (set when null — "insufficient-samples"
1269
+ / "no-family" / "no-history"). Seed corpus in
1270
+ `calibration-seed.json` covers 20 vuln families from the OWASP Benchmark +
1271
+ Juliet labeled runs; the customer's `.agentic-security/validator-metrics.json`
1272
+ overrides per-family when sample count is higher. Calibration is honest
1273
+ about uncertainty: `MIN_SAMPLES_FOR_CALIBRATION = 30`. The PRD G1 target
1274
+ (Brier ≤ 0.10 on a held-out labeled set) is queued for Phase 5; this ships
1275
+ the framework, the math, and the seed data.
1276
+ - **P1.5 — Cross-language message queues (FR-XSAT-4).** New module
1277
+ `scanner/src/posture/cross-lang-queues.js` indexes producer and consumer
1278
+ call sites for Kafka (kafkajs, kafka-clients, confluent-kafka), AWS SQS
1279
+ (aws-sdk, boto3), RabbitMQ (amqplib, pika, Spring `RabbitTemplate`), Redis
1280
+ Streams (XADD / XREAD across Node, Python, Go), and Google Pub/Sub. When
1281
+ producer and consumer agree on a topic name and the consumer file has a
1282
+ high+ finding, we emit a `cross_language: true` chain back to the producer
1283
+ (and vice-versa). Severity is demoted one tier so the chain doesn't double-
1284
+ count in severity bucketing. Honest about uncertainty: only literal-string
1285
+ topic matches; constant-folded names left for Phase 2.
1286
+
1287
+ ### Tests, bench, integrity
1288
+
1289
+ - 14 new tests in `test/poc-generator.test.js` (PoC coverage + safety).
1290
+ - 9 new tests in `test/cross-lang-queues.test.js`.
1291
+ - 14 new tests in `test/calibration.test.js` (Wilson + Brier + annotation).
1292
+ - All 199 + 26 + 2 unit tests pass.
1293
+ - Synthetic-bench F1 still 100%.
1294
+ - No new dead exports; `test/no-dead-modules.test.js` both subtests pass.
1295
+
1296
+ ### Queued for next session
1297
+
1298
+ - **P1.2 — Verifier sandbox loop (FR-VER-3, FR-VER-6, FR-VER-7).** Needs
1299
+ Docker integration, network isolation, and a sandbox-escape test. The PoC
1300
+ generator already produces files; the verifier executes them in isolation.
1301
+ - **P1.4 — Cross-language polyglot benchmark (G3).** Needs fixture builds
1302
+ across Node → Python → Java → Postgres. Measures the cross-asset claims
1303
+ we've now made for HTTP/gRPC/GraphQL/ORM/IaC/Queues.
1304
+
1305
+ ### Honesty correction
1306
+
1307
+ The parent PRD claimed v1.0.0 ships at ~15 months. This release is one
1308
+ session of work; we're at ~v0.49.0 on a path to v0.50.0 (Phase-1 release).
1309
+ The PRD's G1 (Brier ≤ 0.10 on a held-out set) is not yet measured — the
1310
+ shipped calibration is on the SEED corpus, which is by definition not held
1311
+ out. We surface this in the `_caveat` field of `calibration-seed.json`.
1312
+
1313
+ ## 0.48.0 — fourth-round premortem + CI bench failure
1314
+
1315
+ ### Bench regression fix
1316
+
1317
+ The synthetic-bench CI job started failing at v0.47.0. Two issues:
1318
+
1319
+ - **Root-cause clustering over-merged across detectors.** Two distinct
1320
+ detectors (structural `Open Redirect` and `host-header`) that share CWE-601
1321
+ on the same `res.redirect(...)` line were collapsing into one finding,
1322
+ hiding the host-header bug. `sinkKey` now includes `f.parser` so two
1323
+ detectors never merge. Empty `sinkExpr` keys are skipped (was bucketing all
1324
+ rate-limit findings into one).
1325
+ - **Two expected entries pointed at the same post-clustered line.** Cleaned
1326
+ up `expected.json` for `orm-raw-sql` and added six new `csrf` family
1327
+ expected entries for fixtures that legitimately lack CSRF protection.
1328
+ Baseline refreshed.
1329
+
1330
+ ### Node 20 deprecation
1331
+
1332
+ Bumped `actions/{checkout,setup-node,upload-artifact}` to v5 and
1333
+ `actions/github-script` to v8 (Node 24 native). Dropped the
1334
+ `FORCE_JAVASCRIPT_ACTIONS_TO_NODE24` workaround env.
1335
+
1336
+ ### Fourth-round premortem — 15 findings closed
1337
+
1338
+ - **4R-1**: rule-pack signing is fail-closed in CI. When `CI=true` (and the
1339
+ common variants) and no signing keys are configured, pass-through mode
1340
+ refuses rather than silently accepting. Opt-in via
1341
+ `AGENTIC_SECURITY_ALLOW_PASSTHROUGH_IN_CI=1`.
1342
+ - **4R-2**: `scanner/dist/agentic-security.mjs` is now correctly tracked in
1343
+ `.gitignore`. The previous "Not committed" comment was wrong: the bundle was
1344
+ always committed. Now `dist/*` is ignored except
1345
+ `agentic-security.mjs` and `agentic-security.mjs.sha256`.
1346
+ - **4R-3**: `scan.yml` downloads the bundle with checksum verification. New
1347
+ `scanner-ref` workflow input lets callers pin to a release tag or commit SHA
1348
+ for supply-chain hardening. `scanner/dist/agentic-security.mjs.sha256` is
1349
+ generated by `npm run build` and committed.
1350
+ - **4R-4**: catalog `filterByProvenance` memoizes per (entries, mode) so the
1351
+ taint hot path no longer allocates a fresh array per match.
1352
+ - **4R-5**: LSP `_depCache` is granularly invalidated on manifest save — only
1353
+ the saved file's entry is refreshed, not the whole project tree.
1354
+ - **4R-6**: `no-dead-modules.test.js` has a sister "allowlist decay" check.
1355
+ Stale ALLOWLIST entries (25 of them, from v0.47.0) were removed.
1356
+ - **4R-7**: `version.js` warns to stderr when `package.json` can't be read
1357
+ instead of silently falling back to `'unknown'`.
1358
+ - **4R-8**: `applyFix` accepts `stableId` from the caller (`bin/` and `mcp/`)
1359
+ rather than re-deriving via `findingId`, which rotates on line-shift.
1360
+ - **4R-9**: fix-history stale-lock reap is PID-aware. Only unlinks when the
1361
+ PID is dead, or when the file is old and the PID is unkillable. Atomic re-read of
1362
+ the lockfile before unlink avoids racing a fresh acquirer.
1363
+ - **4R-10**: SARIF emits a tri-state `signatureStatus: 'verified' | 'unsigned'
1364
+ | 'pass-through'` field. The legacy `_unsigned` / `_passThroughSigning`
1365
+ flags are emitted alongside for one release of grace.
1366
+ - **4R-11**: CLI and Markdown reports now render `validator_verdict` so SCA
1367
+ findings tagged `not-applicable` aren't invisible to the reader.
1368
+ - **4R-12**: custom-rules deadline is per-scanRoot, accumulating across calls
1369
+ within a process. New `resetCustomRulesBudget(scanRoot)` for long-lived LSP
1370
+ scans; wired into the LSP server.
1371
+ - **4R-13**: `prepublishOnly` refuses to overwrite a locally-edited
1372
+ `scanner/CHANGELOG.md` that differs from the canonical `../CHANGELOG.md`.
1373
+ - **4R-14**: new `scripts/nist-compliance/test_regex_redos.py` asserts every
1374
+ import regex runs in linear time on pathological input — guards against
1375
+ re-introducing the `(?:[^)]|\n)+?` ReDoS fixed in `e0c669b`.
1376
+ - **4R-15**: `PROMPT_VERSION` is now a public export of `llm-validator/index.js`.
1377
+ The `validator-cache gc` subcommand no longer reaches through the
1378
+ underscore-prefixed `_internal` private API and fails loudly if the version
1379
+ can't be read.
1380
+
1381
+ ### Honesty note
1382
+
1383
+ All 15 fourth-round findings are closed without dead code (verified by the
1384
+ no-dead-modules test). The bench failure was a real regression introduced
1385
+ in v0.47.0 (clustering by CWE alone) — caught by CI, fixed by adding
1386
+ `f.parser` to the cluster key.
1387
+
1388
+ ## 0.47.0 — third-round premortem remediation
1389
+
1390
+ The third adversarial premortem identified 17 findings against the v0.46.0
1391
+ remediation. All 17 are now closed. Highlights:
1392
+
1393
+ - **3R-1: integration test for dead exports** — new `test/no-dead-modules.test.js`
1394
+ walks `scanner/src/{posture,llm-validator,dataflow,lsp,ir,mcp}` and asserts
1395
+ every exported symbol has at least one external call site (`.js` files and
1396
+ `commands/*.md`). Allowlist for legitimate library-style exports. Closes the
1397
+ recurring "wired in code review, dead in code" failure mode.
1398
+
1399
+ - **3R-2 / 3R-3: single-sourced version** — `scanner/src/posture/version.js`
1400
+ reads `scanner/package.json#version` at module load; SARIF `tool.driver.version`
1401
+ and `CURRENT_RULESET_VERSION` now derive from it instead of independently
1402
+ hardcoded constants that diverged on every release.
1403
+
1404
+ - **3R-4: signing graceful degradation** — `rule-pack-signing.js` operates in a
1405
+ pass-through mode when both bundled and project keys are absent. One audit
1406
+ warning per session; findings carry `_passThroughSigning:true`. Set
1407
+ `AGENTIC_SECURITY_STRICT_SIGNING=1` to disable pass-through.
1408
+
1409
+ - **3R-5: CLI keygen safety rails** — `agentic-security-rule keygen` refuses
1410
+ `--out` paths under `.agentic-security/`; warns on non-TTY stdout without
1411
+ `--out`; writes private-key files mode 0600. `--i-understand-private-keys`
1412
+ to override.
1413
+
1414
+ - **3R-6: provenance surfaced in reports** — `normalizeFindings` carries
1415
+ `_unsigned` and `_passThroughSigning` through; SARIF `result.properties`
1416
+ emits `unsigned:true` / `passThroughSigning:true`; SARIF
1417
+ `invocations[].properties` now includes `rulesetVersion`, `rulesetVersionSource`,
1418
+ and `rulesetVersionMismatch` for trend attribution.
1419
+
1420
+ - **3R-7: requiresReAudit is now load-bearing** — `bench-realworld.js` reads
1421
+ curated expected JSONs' `requiresReAudit:true`, emits a stderr warning per
1422
+ affected corpus, and tags the corpus result with
1423
+ `requiresReAudit:true` so consumers know its F1 is informational.
1424
+
1425
+ - **3R-8: global deadline for custom rules** — `applyCustomRules()` now caps
1426
+ the total scan time across all files and all rules at 30s (overridable via
1427
+ `AGENTIC_SECURITY_CUSTOM_RULES_BUDGET_MS`), guarding against ReDoS sprees
1428
+ across many files even when each individual regex respects its 200ms budget.
1429
+
1430
+ - **3R-9: LSP dep-cache invalidation on manifest save** — saving any
1431
+ `package.json`/`pyproject.toml`/`Cargo.toml`/etc. now invalidates the cached
1432
+ dep snapshot before re-scanning, so newly added vulnerable packages and
1432
+ removed ones are reflected immediately in editor diagnostics.
1434
+
1435
+ - **3R-10: catalog OFFICIAL_ONLY is per-match** — `AGENTIC_SECURITY_CATALOG_OFFICIAL_ONLY=1`
1436
+ is now read per source/sink match instead of once at module load, so CI lanes
1437
+ that toggle strict mode just before invocation are actually honored.
1438
+
1439
+ - **3R-11: validator preflight handles SCA locators** — findings with
1440
+ `parser:'SCA'` or `pkg`/`component`/`purl` set are tagged
1441
+ `validator_verdict:'not-applicable'` rather than `'unvalidated'`, which
1442
+ was misleading for findings that an LLM cannot meaningfully judge.
1443
+
1444
+ - **3R-12: applyFix recover() cross-checks against last-scan.json** — the
1445
+ fix-history log entry records the matching finding's stableId at apply
1446
+ time; `recover()` after a crash now tags promoted entries as
1447
+ `applied-stale` when the finding has vanished from last-scan.json.
1448
+
1449
+ - **3R-13: file lock around log writes** — concurrent `applyFix`, `recover`,
1450
+ and `undo` invocations no longer race the `log.json` write; serialization
1451
+ via `log.lock` with 30s stale-lock reaping and 5s contention timeout.
1452
+
1453
+ - **3R-14: validator-cache GC subcommand** — `agentic-security validator-cache
1454
+ stats|gc [--older-than N] [--dry-run]` prunes `.agentic-security/llm-cache/`
1455
+ by age and prompt-version mismatch.
1456
+
1457
+ - **3R-15: tier cutoffs stable under 2-decimal rounding** — confidence tier
1458
+ (`high|medium|low|very-low`) is now derived from the 2-decimal display value,
1459
+ so a finding reported as "0.75" never lands in two tiers depending on the
1460
+ viewer's rounding.
1461
+
1462
+ - **3R-16: CHANGELOG ships with npm package** — `prepublishOnly` copies
1463
+ CHANGELOG.md into `scanner/`; added to `package.json#files`. The repo-root
1464
+ copy remains canonical; the in-package copy is gitignored.
1465
+
1466
+ - **3R-17: fix-history log compaction** — `agentic-security undo --compact
1467
+ [--retain-days N] [--prune-backups]` archives terminal entries (reverted,
1468
+ failed, applied-stale) older than the retention window into
1469
+ `log-archive-YYYY-MM.json`, optionally pruning their `.bak` files.
1470
+
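The tier derivation described in 3R-15 can be sketched as follows. The tier names come from the entry above; the cutoff values (0.75 / 0.50 / 0.25) and the function name `tierFor` are assumptions chosen for illustration, since the changelog does not state the real thresholds.

```javascript
// Assumed cutoffs, expressed in integer hundredths to avoid
// floating-point comparisons near a boundary.
const CUTOFFS = [
  { min: 75, tier: 'high' },
  { min: 50, tier: 'medium' },
  { min: 25, tier: 'low' },
];

function tierFor(confidence) {
  // Round once to the 2-decimal display value, then derive the tier
  // from that same rounded value, so display and tier cannot disagree.
  const hundredths = Math.round(confidence * 100);
  const display = (hundredths / 100).toFixed(2);
  const match = CUTOFFS.find((c) => hundredths >= c.min);
  return { display, tier: match ? match.tier : 'very-low' };
}

console.log(tierFor(0.7549)); // { display: '0.75', tier: 'high' }
console.log(tierFor(0.7449)); // { display: '0.74', tier: 'medium' }
```

Deriving the tier from the raw score instead would let a raw score just under a cutoff be displayed as the cutoff value yet fall in the lower tier, which is the ambiguity the entry removes.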
1471
+ ### Honesty correction
1472
+
1473
+ No claims in this release exceeded what shipped. v0.47.0 closes the 17
1474
+ third-round premortem findings against v0.46.0 cleanly; the round-4 premortem
1475
+ will surely find more, and that is fine.
1476
+
1477
+ ## 0.46.0 — second-round premortem remediation + honesty correction
1478
+
1479
+ ### Honesty correction for v0.45.0
1480
+
1481
+ The v0.45.0 commit message (`3acca6b fix(security): premortem remediation —
1482
+ all 15 findings`) claimed all 15 first-round premortem findings were
1483
+ remediated. A second-round adversarial premortem identified five of those
1484
+ "closures" as dead code or wire-up regressions:
1485
+
1486
+ - `posture/fix-history.js::recover()` was exported but never called from
1487
+ any startup path → pending entries from a crashed `applyFix` accumulated
1488
+ forever. **Now fixed**: wired into `runScan.js` at top of every scan.
1489
+
1490
+ - `posture/ruleset-version.js::stampScan()` / `effectiveVersion()` were
1491
+ exported but never imported → ruleset-pinning was documentation only.
1492
+ **Now fixed**: wired into `runScan.js` to stamp every scan result.
1493
+
1494
+ - `posture/validator-metrics.js::recordTriage()` was exported but the
1495
+ `/triage` slash command did not invoke it → per-CWE production metrics
1496
+ never accumulated. **Now fixed**: `/triage` now calls `recordTriage` on
1497
+ every verdict (subject to the new symmetric learn gate).
1498
+
1499
+ - The custom-rules pipeline tagged unsigned RULES with `_unsigned: true`
1500
+ but the per-finding emitter (`toFinding`) did not copy the marker →
1501
+ the audit chain promised by the warning log did not exist in the data.
1502
+ **Now fixed**: findings now carry `_unsigned: true` when their rule does.
1503
+
1504
+ - `engine.js:6941` called the LLM validator with `concurrency: 4`,
1505
+ overriding the validator's `concurrency: 1` determinism default →
1506
+ cache-cold runs produced non-deterministic SARIF in the same commit
1507
+ that promised determinism. **Now fixed**: respects `AGENTIC_SECURITY_LLM_CONCURRENCY` env (default 1).
1508
+
1509
+ ### Other second-round fixes
1510
+
1511
+ - **String-aware JSON parser** in the LLM validator. Previous
1512
+ `parseLastJsonObject` ignored string-state and could be fooled by braces
1513
+ inside JSON string literals. Rewritten to walk forward with full string-
1514
+ and escape-state tracking, then return the LAST valid candidate.
1515
+
1516
+ - **Empty file/line pre-flight** in `validateOne`. A validator response of
1517
+ `{"file":"","line":0,...}` trivially satisfied the cross-check on findings
1518
+ without precise location. Now refused with `unvalidated`.
1519
+
1520
+ - **Protected signing trust root**: trusted keys come from a built-in
1521
+ constant (`BUNDLED_OFFICIAL_KEYS`); project-local `.agentic-security/trusted-keys.json`
1522
+ is refused unless `AGENTIC_SECURITY_ALLOW_PROJECT_KEYS=1` is set
1523
+ (audit-logged). A PR contributor can no longer bootstrap a key into trust.
1524
+
1525
+ - **Key revocation**: trusted-keys.json `crl[]` honored (signature-hash
1526
+ blacklist); `revokedAt` field on each key honored (signatures dated after
1527
+ revocation refused).
1528
+
1529
+ - **`agentic-security-rule` CLI** for `keygen` / `sign` / `verify` with a
1530
+ first-time setup walkthrough and explicit private-key-handling warnings.
1531
+
1532
+ - **Symmetric AGENTIC_SECURITY_LEARN gate**: `/triage` no longer writes
1533
+ verdicts to `triage-feedback.json` without explicit opt-in. Prevents an
1534
+ attacker from poisoning the file in advance of someone flipping the
1535
+ read-side flag.
1536
+
1537
+ - **Worklist deadline check**: deep-mode taint engine honors `deadlineMs`
1538
+ inside `analyzeFunction`'s worklist (every 128 iterations). Pathological
1539
+ CFGs can no longer run past the global timeout.
1540
+
1541
+ - **LSP loads dep-manifest files**: per-save scan in `lsp/server.js` now
1542
+ pre-walks the project tree once for `package.json` / `pom.xml` / `.proto`
1543
+ / `.graphql` / `.tf` so SCA + cross-language passes have their inputs.
1544
+
1545
+ - **SARIF notifications for caveats**: `tool.driver.notifications` and
1546
+ `invocations.toolExecutionNotifications` now carry the load-bearing
1547
+ warnings (priority scores are ordinal, OWASP Benchmark numbers are
1548
+ benchmark-tuned). Customer CI ingesters see them without reading docs.
1549
+
1550
+ - **Re-sanitization on cache read**: validator reasoning passes through
1551
+ `sanitizeReasoning` again on cache hit (defense in depth against any
1552
+ future write-path regression).
1553
+
1554
+ - **Provenance + requiresReAudit fields** added to all 25 bootstrapped GT
1555
+ files under `bench/.../expected/`. Machine-readable signal that the
1556
+ bootstrap origin is self-referential.
1557
+
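The string-aware parsing approach from the first item in this list can be sketched roughly as below. This is not the package's implementation: the function name `parseLastJsonObjectSketch` and the exact scanning rules are assumptions, written only to show forward scanning with string- and escape-state tracking that returns the last valid candidate.

```javascript
// Walk forward through the text, tracking brace depth. Braces inside
// JSON string literals are ignored, and backslash escapes are honored
// so that an escaped quote does not end the string.
function parseLastJsonObjectSketch(text) {
  let last = null;
  let depth = 0;
  let start = -1;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    // Only track strings inside an object, so stray quotes in
    // surrounding prose do not hide later candidates.
    if (ch === '"' && depth > 0) {
      inString = true;
    } else if (ch === '{') {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === '}' && depth > 0) {
      depth--;
      if (depth === 0) {
        try {
          last = JSON.parse(text.slice(start, i + 1));
        } catch {
          // Not valid JSON; keep the previous candidate.
        }
      }
    }
  }
  return last;
}

const reply =
  'Draft: {"verdict":"fp"} Final: {"verdict":"tp","reason":"saw \\"}{\\" in input"}';
console.log(parseLastJsonObjectSketch(reply).verdict); // tp
```

A brace counter without string state would close the second object at the `}` inside the `reason` string and fail to parse it, falling back to the earlier draft object. One limitation of this sketch is that an unterminated `{` earlier in the text prevents later objects from being recognized.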
1558
+ ### What this commit honestly does NOT close
1559
+
1560
+ - BUNDLED_OFFICIAL_KEYS is empty — a production deployment needs the
1561
+ maintainers to generate a real keypair, distribute the private key
1562
+ offline, and ship the public key. Today's effective behavior is "no
1563
+ official keys, project keys via opt-in."
1564
+ - The CVE-replay corpus is still a single starter entry (the second half of
1565
+ G1 remains undelivered).
1566
+ - Real-world Java F1 generalization is still unmeasured.
1567
+
1568
+ ## 0.45.0 — first-round premortem remediation
1569
+
1570
+ (See commit 3acca6b. Some closures were dead code; see the honesty correction
1571
+ above.)
1572
+
1573
+ ## 0.44.0 — multi-session items: gRPC/GraphQL/ORM cross-lang, IDE plugins
1574
+
1575
+ ## 0.43.0 — small engineering items: MCP verify_fix/synthesize_fix,
1576
+ SentQL path predicates, conversation-context hook, fix-plan,
1577
+ per-CWE metrics
1578
+
1579
+ ## 0.42.0 — Layer 1 IR + Layer 2 interprocedural taint, F1=0.907 on
1580
+ OWASP Bench v1.2 (blind, strict)