@clear-capabilities/agentic-security-scanner 0.74.0
- package/CHANGELOG.md +1580 -0
- package/bin/.agentic-security/findings.json +1577 -0
- package/bin/.agentic-security/last-scan.json +1577 -0
- package/bin/.agentic-security/last-scan.json.sig +1 -0
- package/bin/.agentic-security/scan-history.json +465 -0
- package/bin/.agentic-security/streak.json +25 -0
- package/bin/agentic-security-audit.js +198 -0
- package/bin/agentic-security-consistency.js +80 -0
- package/bin/agentic-security-diff.js +136 -0
- package/bin/agentic-security-lsp.js +12 -0
- package/bin/agentic-security-mcp.js +40 -0
- package/bin/agentic-security-rule.js +153 -0
- package/bin/agentic-security.js +1683 -0
- package/dist/117.index.js +207 -0
- package/dist/178.index.js +250 -0
- package/dist/218.index.js +793 -0
- package/dist/227.index.js +192 -0
- package/dist/301.index.js +167 -0
- package/dist/384.index.js +18 -0
- package/dist/476.index.js +126 -0
- package/dist/513.index.js +373 -0
- package/dist/520.index.js +13 -0
- package/dist/601.index.js +1038 -0
- package/dist/634.index.js +1892 -0
- package/dist/637.index.js +216 -0
- package/dist/660.index.js +131 -0
- package/dist/675.index.js +451 -0
- package/dist/826.index.js +188 -0
- package/dist/830.index.js +133 -0
- package/dist/agentic-security.mjs +272 -0
- package/dist/agentic-security.mjs.sha256 +1 -0
- package/dist/calibration-seed.json +27 -0
- package/package.json +77 -0
- package/src/.agentic-security/findings.json +80844 -0
- package/src/.agentic-security/last-scan.json +80844 -0
- package/src/.agentic-security/last-scan.json.sig +1 -0
- package/src/.agentic-security/scan-history.json +8408 -0
- package/src/.agentic-security/streak.json +26 -0
- package/src/badge.js +188 -0
- package/src/compare.js +203 -0
- package/src/dataflow/.agentic-security/findings.json +3487 -0
- package/src/dataflow/.agentic-security/last-scan.json +3487 -0
- package/src/dataflow/.agentic-security/last-scan.json.sig +1 -0
- package/src/dataflow/.agentic-security/scan-history.json +735 -0
- package/src/dataflow/.agentic-security/streak.json +24 -0
- package/src/dataflow/CLAUDE.md +38 -0
- package/src/dataflow/access-paths.js +172 -0
- package/src/dataflow/async-sequencing.js +177 -0
- package/src/dataflow/backward.js +201 -0
- package/src/dataflow/catalog-expanded.js +485 -0
- package/src/dataflow/catalog.js +659 -0
- package/src/dataflow/cross-repo.js +219 -0
- package/src/dataflow/engine.js +588 -0
- package/src/dataflow/exception-flow.js +116 -0
- package/src/dataflow/exploit-prover.js +187 -0
- package/src/dataflow/higher-order.js +221 -0
- package/src/dataflow/ifds.js +347 -0
- package/src/dataflow/implicit-flow.js +129 -0
- package/src/dataflow/incremental.js +229 -0
- package/src/dataflow/index.js +181 -0
- package/src/dataflow/numeric-domain.js +192 -0
- package/src/dataflow/path-feasibility.js +114 -0
- package/src/dataflow/points-to.js +337 -0
- package/src/dataflow/polyglot.js +190 -0
- package/src/dataflow/proven-clean.js +159 -0
- package/src/dataflow/receiver-context.js +76 -0
- package/src/dataflow/sanitizer-proof.js +154 -0
- package/src/dataflow/soft-taint.js +140 -0
- package/src/dataflow/string-domain.js +234 -0
- package/src/dataflow/stub-aware-filter.js +100 -0
- package/src/dataflow/summaries.js +132 -0
- package/src/dataflow/symbolic-exec.js +238 -0
- package/src/dataflow/tabulation.js +135 -0
- package/src/engine.js +7763 -0
- package/src/history-scan.js +229 -0
- package/src/index.js +3 -0
- package/src/integrations/.agentic-security/findings.json +1504 -0
- package/src/integrations/.agentic-security/last-scan.json +1504 -0
- package/src/integrations/.agentic-security/scan-history.json +40 -0
- package/src/integrations/.agentic-security/streak.json +21 -0
- package/src/integrations/index.js +321 -0
- package/src/integrations/tickets.js +200 -0
- package/src/ir/.agentic-security/findings.json +3036 -0
- package/src/ir/.agentic-security/last-scan.json +3036 -0
- package/src/ir/.agentic-security/last-scan.json.sig +1 -0
- package/src/ir/.agentic-security/scan-history.json +364 -0
- package/src/ir/.agentic-security/streak.json +23 -0
- package/src/ir/CLAUDE.md +172 -0
- package/src/ir/callgraph.js +73 -0
- package/src/ir/class-hierarchy.js +195 -0
- package/src/ir/index.js +152 -0
- package/src/ir/parser-cs.js +260 -0
- package/src/ir/parser-java.js +286 -0
- package/src/ir/parser-js.js +413 -0
- package/src/ir/parser-kt.js +258 -0
- package/src/ir/parser-py-cst.js +136 -0
- package/src/ir/parser-py.helper.py +501 -0
- package/src/ir/parser-py.js +312 -0
- package/src/ir/ssa.js +315 -0
- package/src/ir/type-stubs.js +288 -0
- package/src/leaderboard.js +152 -0
- package/src/llm-validator/.agentic-security/findings.json +1891 -0
- package/src/llm-validator/.agentic-security/last-scan.json +1891 -0
- package/src/llm-validator/.agentic-security/last-scan.json.sig +1 -0
- package/src/llm-validator/.agentic-security/scan-history.json +168 -0
- package/src/llm-validator/.agentic-security/streak.json +20 -0
- package/src/llm-validator/consistency.js +141 -0
- package/src/llm-validator/index.js +437 -0
- package/src/lsp/.agentic-security/findings.json +28 -0
- package/src/lsp/.agentic-security/last-scan.json +28 -0
- package/src/lsp/.agentic-security/scan-history.json +79 -0
- package/src/lsp/.agentic-security/streak.json +22 -0
- package/src/lsp/server.js +275 -0
- package/src/mcp/.agentic-security/findings.json +8358 -0
- package/src/mcp/.agentic-security/last-scan.json +8358 -0
- package/src/mcp/.agentic-security/last-scan.json.sig +1 -0
- package/src/mcp/.agentic-security/scan-history.json +1125 -0
- package/src/mcp/.agentic-security/streak.json +22 -0
- package/src/mcp/CLAUDE.md +54 -0
- package/src/mcp/audit.js +136 -0
- package/src/mcp/redact.js +75 -0
- package/src/mcp/server.js +158 -0
- package/src/mcp/stdio.js +83 -0
- package/src/mcp/tools.js +940 -0
- package/src/mcp/validate.js +49 -0
- package/src/personality.js +164 -0
- package/src/poc-video.js +239 -0
- package/src/posture/.agentic-security/findings.json +51239 -0
- package/src/posture/.agentic-security/last-scan.json +51239 -0
- package/src/posture/.agentic-security/last-scan.json.sig +1 -0
- package/src/posture/.agentic-security/scan-history.json +5557 -0
- package/src/posture/.agentic-security/streak.json +24 -0
- package/src/posture/CLAUDE.md +42 -0
- package/src/posture/adversarial-self-test.js +114 -0
- package/src/posture/adversary-agent.js +204 -0
- package/src/posture/agents-memory.js +135 -0
- package/src/posture/ai-code-fingerprint.js +171 -0
- package/src/posture/aibom.js +284 -0
- package/src/posture/api-inventory.js +96 -0
- package/src/posture/attack-playbooks.js +305 -0
- package/src/posture/auditor-agent.js +115 -0
- package/src/posture/auth-posture-import.js +135 -0
- package/src/posture/baseline-compare.js +114 -0
- package/src/posture/blast-radius.js +836 -0
- package/src/posture/bounty-prediction.js +141 -0
- package/src/posture/business-logic.js +239 -0
- package/src/posture/calibration-drift.js +93 -0
- package/src/posture/calibration-seed.json +27 -0
- package/src/posture/calibration.js +204 -0
- package/src/posture/clustering.js +75 -0
- package/src/posture/concurrency-checker.js +265 -0
- package/src/posture/confidence.js +65 -0
- package/src/posture/container-runtime.js +149 -0
- package/src/posture/counterfactual.js +109 -0
- package/src/posture/cross-lang-graphql.js +165 -0
- package/src/posture/cross-lang-grpc.js +166 -0
- package/src/posture/cross-lang-meta.js +101 -0
- package/src/posture/cross-lang-openapi.js +187 -0
- package/src/posture/cross-lang-orm.js +153 -0
- package/src/posture/cross-lang-queues.js +210 -0
- package/src/posture/crown-jewels.js +110 -0
- package/src/posture/custom-rules.js +361 -0
- package/src/posture/cve-alert-daemon.js +433 -0
- package/src/posture/cve-lookup.js +129 -0
- package/src/posture/dead-code.js +430 -0
- package/src/posture/defender-agent.js +158 -0
- package/src/posture/deploy-platform.js +204 -0
- package/src/posture/detector-fuzz.js +61 -0
- package/src/posture/deterministic.js +99 -0
- package/src/posture/drift.js +165 -0
- package/src/posture/epss.js +156 -0
- package/src/posture/exploitability-probability.js +212 -0
- package/src/posture/exploitability.js +121 -0
- package/src/posture/feature-flags.js +110 -0
- package/src/posture/finding-defaults.js +132 -0
- package/src/posture/fix-history.js +411 -0
- package/src/posture/fix-plan.js +121 -0
- package/src/posture/fix-verify-loop.js +157 -0
- package/src/posture/fix-verify.js +130 -0
- package/src/posture/flow-narration.js +105 -0
- package/src/posture/grader-calibration.js +156 -0
- package/src/posture/harness-discovery.js +113 -0
- package/src/posture/holdout-eval.js +144 -0
- package/src/posture/iac-reachability.js +163 -0
- package/src/posture/iam-policy.js +128 -0
- package/src/posture/integrity.js +97 -0
- package/src/posture/learning.js +166 -0
- package/src/posture/license-policy.js +109 -0
- package/src/posture/llm-redteam-prompts.js +418 -0
- package/src/posture/llm-redteam.js +303 -0
- package/src/posture/material-change.js +163 -0
- package/src/posture/mitigation-composite.js +55 -0
- package/src/posture/mttr.js +91 -0
- package/src/posture/network-policy-import.js +126 -0
- package/src/posture/path-predicates.js +99 -0
- package/src/posture/persona-prioritization.js +153 -0
- package/src/posture/poc-cwe-map.js +51 -0
- package/src/posture/poc-generator.js +500 -0
- package/src/posture/policy-gate.js +174 -0
- package/src/posture/pre-incident-archaeology.js +110 -0
- package/src/posture/profile.js +93 -0
- package/src/posture/reachability-filter.js +42 -0
- package/src/posture/regression-test-gen.js +200 -0
- package/src/posture/reverse-blast-radius.js +110 -0
- package/src/posture/router.js +109 -0
- package/src/posture/rule-overrides.js +198 -0
- package/src/posture/rule-pack-signing.js +209 -0
- package/src/posture/rule-packs.js +143 -0
- package/src/posture/rule-synthesis.js +108 -0
- package/src/posture/ruleset-version.js +71 -0
- package/src/posture/sbom.js +129 -0
- package/src/posture/schema-aware-bridge.js +207 -0
- package/src/posture/security-trend.js +87 -0
- package/src/posture/semantic-clone.js +114 -0
- package/src/posture/specification-mining.js +170 -0
- package/src/posture/stable-id.js +75 -0
- package/src/posture/stack-playbook.js +229 -0
- package/src/posture/streak.js +249 -0
- package/src/posture/suppressions.js +135 -0
- package/src/posture/telemetry-ingest.js +112 -0
- package/src/posture/threat-model.js +145 -0
- package/src/posture/three-agent-pipeline.js +74 -0
- package/src/posture/triage.js +146 -0
- package/src/posture/trust-boundary-diagram.js +115 -0
- package/src/posture/type-narrowing.js +129 -0
- package/src/posture/validator-metrics.js +179 -0
- package/src/posture/verifier-ephemeral.js +118 -0
- package/src/posture/verifier-target.js +147 -0
- package/src/posture/verifier.js +257 -0
- package/src/posture/version.js +75 -0
- package/src/posture/waf-ingest.js +200 -0
- package/src/posture/why-fired.js +141 -0
- package/src/pr-comment.js +172 -0
- package/src/pr-delta.js +198 -0
- package/src/report/.agentic-security/findings.json +79 -0
- package/src/report/.agentic-security/last-scan.json +79 -0
- package/src/report/.agentic-security/last-scan.json.sig +1 -0
- package/src/report/.agentic-security/scan-history.json +332 -0
- package/src/report/.agentic-security/streak.json +23 -0
- package/src/report/index.js +1136 -0
- package/src/report/mascot.js +42 -0
- package/src/runScan.js +141 -0
- package/src/sast/.agentic-security/findings.json +5051 -0
- package/src/sast/.agentic-security/last-scan.json +5051 -0
- package/src/sast/.agentic-security/last-scan.json.sig +1 -0
- package/src/sast/.agentic-security/scan-history.json +788 -0
- package/src/sast/.agentic-security/streak.json +23 -0
- package/src/sast/CLAUDE.md +39 -0
- package/src/sast/_comment-strip.js +46 -0
- package/src/sast/agent-tool-escalation.js +131 -0
- package/src/sast/auth-provider.js +171 -0
- package/src/sast/authz.js +236 -0
- package/src/sast/bench-shape/.agentic-security/findings.json +28 -0
- package/src/sast/bench-shape/.agentic-security/last-scan.json +28 -0
- package/src/sast/bench-shape/.agentic-security/scan-history.json +24 -0
- package/src/sast/bench-shape/.agentic-security/streak.json +22 -0
- package/src/sast/bench-shape/index.js +62 -0
- package/src/sast/claude-hook-injection.js +199 -0
- package/src/sast/claude-md-prompt-injection.js +170 -0
- package/src/sast/claude-settings.js +165 -0
- package/src/sast/client-side.js +149 -0
- package/src/sast/cpp-bench-extras.js +122 -0
- package/src/sast/cpp-dataflow.js +430 -0
- package/src/sast/cpp.js +248 -0
- package/src/sast/csharp.js +152 -0
- package/src/sast/csrf.js +82 -0
- package/src/sast/dart-flutter.js +173 -0
- package/src/sast/db-rls.js +147 -0
- package/src/sast/db-taint.js +215 -0
- package/src/sast/defi-deep.js +242 -0
- package/src/sast/deserialization-gadgets.js +113 -0
- package/src/sast/django-hardening.js +230 -0
- package/src/sast/env-hygiene.js +125 -0
- package/src/sast/fastapi-hardening.js +145 -0
- package/src/sast/go-extended.js +84 -0
- package/src/sast/host-header.js +106 -0
- package/src/sast/index.js +17 -0
- package/src/sast/java-ast-folding.js +561 -0
- package/src/sast/java-bench-extras.js +708 -0
- package/src/sast/java-collection-passthrough.js +178 -0
- package/src/sast/java-constant-fold.js +244 -0
- package/src/sast/java-deserialization.js +125 -0
- package/src/sast/jndi.js +104 -0
- package/src/sast/juliet-shape.js +324 -0
- package/src/sast/jwt-exp.js +104 -0
- package/src/sast/kotlin.js +82 -0
- package/src/sast/laravel-hardening.js +198 -0
- package/src/sast/ldap-injection.js +100 -0
- package/src/sast/llm-owasp.js +465 -0
- package/src/sast/llm-stored-prompt.js +103 -0
- package/src/sast/llm-trading-agent.js +161 -0
- package/src/sast/llm.js +308 -0
- package/src/sast/logic.js +140 -0
- package/src/sast/mass-assignment.js +101 -0
- package/src/sast/mcp-audit.js +242 -0
- package/src/sast/mobile-manifest.js +195 -0
- package/src/sast/model-load.js +164 -0
- package/src/sast/mutation-xss.js +87 -0
- package/src/sast/nosql-injection.js +82 -0
- package/src/sast/open-redirect.js +119 -0
- package/src/sast/php.js +91 -0
- package/src/sast/pipeline.js +122 -0
- package/src/sast/primary-cwe-java.js +155 -0
- package/src/sast/prompt-firewall.js +151 -0
- package/src/sast/prompt-template.js +157 -0
- package/src/sast/prototype-pollution.js +112 -0
- package/src/sast/python-sinks.js +195 -0
- package/src/sast/quarkus-hardening.js +102 -0
- package/src/sast/rag-poisoning.js +118 -0
- package/src/sast/rate-limit.js +128 -0
- package/src/sast/response-splitting.js +138 -0
- package/src/sast/ruby.js +108 -0
- package/src/sast/rust.js +105 -0
- package/src/sast/solidity.js +167 -0
- package/src/sast/springboot-hardening.js +186 -0
- package/src/sast/ssrf-cloud-metadata.js +80 -0
- package/src/sast/ssti.js +116 -0
- package/src/sast/swift.js +162 -0
- package/src/sast/toctou.js +95 -0
- package/src/sast/webhook.js +101 -0
- package/src/sast/xpath-injection.js +51 -0
- package/src/sast/xxe.js +140 -0
- package/src/sast/zip-slip.js +200 -0
- package/src/sca/base-images.json +45 -0
- package/src/sca/container.js +107 -0
- package/src/sca/dep-confusion.js +134 -0
- package/src/sca/index.js +6 -0
- package/src/sca/popular-packages.json +41 -0
- package/src/sca/sarif-ingest.js +187 -0
- package/src/sca/vuln-function-hints.json +89 -0
- package/src/secrets/index.js +4 -0
package/CHANGELOG.md
ADDED
@@ -0,0 +1,1580 @@

# Changelog

## 0.74.0 — viral surface: PoC video gen + security-tutor skill + personality voices + compare runner

Four shareability lifts.

### Auto-recorded PoC scripts — `scanner/src/poc-video.js`
For findings with `_exploitInput` (set by the v0.71 symbolic prover), the module generates a
self-contained script the operator runs against their own staging URL:
- **playwright**: TypeScript test that drives the exploit live + records video. Default for UI-driven exploits.
- **curl**: bash script with verbose tracing + payload-acceptance assertion. Default for backend exploits.
- **http**: RFC 7230-style raw request pastable into Postman/Insomnia.

The generator does NOT execute anything; it produces share-grade evidence that the operator runs against their OWN environment.
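
As a rough illustration of the "generate, never execute" contract, the sketch below builds a curl-style bash script as a plain string. The finding shape (`route`, `param`), the helper names `shellQuote` and `renderCurlPoc`, and the `STAGING_URL` variable are assumptions made for this example; they are not the actual `poc-video.js` API.

```javascript
// Hypothetical sketch: turn a finding into a bash script string.
// Nothing here performs a network request; the operator runs the output.
function shellQuote(value) {
  // Wrap in single quotes and escape embedded single quotes for bash.
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

function renderCurlPoc(finding) {
  if (!finding._exploitInput) return null; // no proven payload, no script
  const query = `${finding.param}=${encodeURIComponent(finding._exploitInput)}`;
  return [
    "#!/usr/bin/env bash",
    "set -euo pipefail",
    // Abort early unless the operator points the script at their own host.
    ': "${STAGING_URL:?set STAGING_URL to your own staging host}"',
    `curl --verbose --get --data ${shellQuote(query)} "$STAGING_URL"${shellQuote(finding.route)}`,
  ].join("\n");
}

const script = renderCurlPoc({
  route: "/search",   // illustrative route
  param: "q",         // illustrative parameter name
  _exploitInput: "1' OR '1'='1",
});
console.log(script);
```

Returning a string rather than spawning a process keeps the module side-effect free, which matches the stated goal that only the operator ever runs the exploit.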

### Educational mode skill — `skills/security-tutor/SKILL.md`
Auto-activates when the user asks "why is X dangerous", references a finding-id and asks for context, or has mechanically accepted ≥3 fixes in a row. Walks the finding Socratically: identify source/sink/sanitizer, ask user to propose the payload BEFORE showing the fix, verify understanding with follow-up traps. CWE-specific Socratic patterns table covers 8 families.

### Security personality voices — `scanner/src/personality.js`
Three tone modes wrapping any rendered report: **sage** (calm, default), **cassandra** (alarmist), **vince** (drill-sergeant). Same findings, dramatically different shareability. Select the voice with the `AGENTIC_SECURITY_PERSONALITY` environment variable. Only the framing changes; the technical content stays identical.

### Compare runner framework — `scanner/src/compare.js`
Bring-your-own-tool side-by-side comparison. User supplies the other tool's invocation + field map; we render a Markdown card with overlap / unique / severity-disagreement sections. Framework is generic — no competitor-specific adapters shipped.

### Test totals
**847 scanner tests pass / 0 fail** (up from 832).

## 0.73.0 — technical depth: IFDS summary edges + type-stub filter + cross-repo federation

Three technical-depth lifts. v0.71 shipped IFDS scaffolding with bottom
summaries; v0.70 added type-stubs but didn't thread them into the
engine; v0.68 added cross-lang within a single repo but not cross-repo.
v0.73 closes all three loops.

### IFDS full summary edges — `scanner/src/dataflow/ifds.js`

The v0.71 IFDS solver used bottom summaries (every callee was assumed
clean → no interprocedural facts flowed). v0.73 adds:
- `summaries: Map<qid|entryFact, Set<exitFact>>` records per-function
  summary edges
- `pendingReturns: Map<qid|entryFact, [{fn,returnNode,callerEntry}]>`
  registers callers waiting on more summary facts
- `_entryFactForCall(callNode, currentFact, callee)` derives callee's
  entry fact from a call site
- `_mapReturnFact(callNode, exitFact, callerCurrent)` translates exit
  facts back into caller namespace
- Summary reuse: second call to same (callee, entry fact) is O(1)

This is what makes IFDS polynomial in practice rather than re-solving
every call site.
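
The summary-reuse idea can be sketched in a few lines. This is a simplified illustration, not the real solver: the callee objects, the fact strings, and the `analyzeBody` / `exitFactsFor` helpers are hypothetical, and the `pendingReturns` bookkeeping needed for recursion is left out.

```javascript
// Hypothetical sketch of summary-edge reuse; not the real ifds.js solver.
// summaries maps "qid|entryFact" to the set of exit facts the callee produces.
const summaries = new Map();
let bodiesAnalyzed = 0;

// Stand-in for the expensive step: propagate an entry fact through a callee body.
// In this sketch a "callee" is just a lookup table from entry fact to exit facts.
function analyzeBody(callee, entryFact) {
  bodiesAnalyzed += 1;
  return new Set(callee.flows[entryFact] || []);
}

function exitFactsFor(callee, entryFact) {
  const key = `${callee.qid}|${entryFact}`;
  if (!summaries.has(key)) {
    summaries.set(key, analyzeBody(callee, entryFact));
  }
  return summaries.get(key); // later calls with the same key are a map lookup
}

const sanitize = { qid: "util.js#sanitize", flows: { "taint(arg0)": [] } };
const identity = { qid: "util.js#identity", flows: { "taint(arg0)": ["taint(ret)"] } };

exitFactsFor(identity, "taint(arg0)"); // analyzes the body
exitFactsFor(identity, "taint(arg0)"); // reuses the stored summary
exitFactsFor(sanitize, "taint(arg0)"); // different callee, analyzed once
console.log("bodies analyzed:", bodiesAnalyzed);
```

Keying on the pair (callee, entry fact) rather than on the call site is what lets many call sites share one analysis of the callee body.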

### Type-stub-aware filter — `scanner/src/dataflow/stub-aware-filter.js`

Post-pass after the taint engine. Consults the project's TS/.pyi/JAR
type stubs (loaded by v0.70's `ir/type-stubs.js`) and demotes findings
whose source type cannot carry the vulnerability metacharacters:

| Family | CWE | Safe types (demoted) |
|--------|-----|----------------------|
| XSS | CWE-79 | number, boolean, Date, RegExp, bigint |
| SQLi | CWE-89 | number, boolean, Date, bigint |
| Cmd | CWE-78 | number, boolean, bigint |
| Path | CWE-22 | number, boolean |
| SSRF | CWE-918 | number, boolean |

Severity drops one tier (critical → high → medium → low → info); never
drops the finding. Operator sees `_stubTypeDemoted: true` + reason.

Gate: `AGENTIC_SECURITY_TYPE_STUBS=1` (same flag as the v0.70 stub
loader).
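
A minimal sketch of the demotion rule, using the table above as data. The function name `demoteIfSafeType`, the finding fields `cwe` and `severity`, and the `_stubTypeReason` field are assumptions for illustration; only `_stubTypeDemoted` is named in this changelog.

```javascript
// Hypothetical sketch of the one-tier demotion; not the real stub-aware-filter.js.
const SAFE_TYPES = {
  "CWE-79": ["number", "boolean", "Date", "RegExp", "bigint"],
  "CWE-89": ["number", "boolean", "Date", "bigint"],
  "CWE-78": ["number", "boolean", "bigint"],
  "CWE-22": ["number", "boolean"],
  "CWE-918": ["number", "boolean"],
};
const TIERS = ["critical", "high", "medium", "low", "info"];

function demoteIfSafeType(finding, sourceType) {
  const safe = SAFE_TYPES[finding.cwe] || [];
  if (!safe.includes(sourceType)) return finding; // unknown or string-like type: untouched
  const index = TIERS.indexOf(finding.severity);
  if (index === -1) return finding; // unrecognized severity: leave as is
  // Drop exactly one tier, clamping at the lowest one; the finding is never removed.
  const next = TIERS[Math.min(index + 1, TIERS.length - 1)];
  return {
    ...finding,
    severity: next,
    _stubTypeDemoted: true,
    _stubTypeReason: `source type ${sourceType} cannot carry ${finding.cwe} metacharacters`,
  };
}

console.log(demoteIfSafeType({ cwe: "CWE-89", severity: "high" }, "number"));
console.log(demoteIfSafeType({ cwe: "CWE-89", severity: "high" }, "string"));
```

Returning a copy with the flag set, instead of filtering the finding out, mirrors the "never drops the finding" rule: the operator can still see and override the demotion.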

### Cross-repo federation — `scanner/src/dataflow/cross-repo.js`

The intra-repo `cross-lang-openapi.js` posture module shipped in v0.66
ties a single repo's client call to its server route. v0.73 ships the
inter-repo lift: `buildFederatedGraph(specs)` walks a SET of OpenAPI
specs from different repos, finds shared `(method, path)` endpoints
with overlapping field schemas, and emits federated edges. Each edge
becomes a `CROSS-REPO` finding (`CWE-829`, `family: cross-repo-taint`)
showing both repos + the shared fields in the trace.

Use case: scan the auth-service repo + the billing-service repo
together; the scanner detects that `/users/{id}` is published by auth
and consumed by billing, with shared fields `email + bio`. A taint in
auth's response surfaces in billing's input — both teams now own the
sanitization contract.
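
The matching step in the use case above can be sketched as follows. The input shape (`publishes` / `consumes` arrays with a flat `fields` list) and the names `endpointKey` and `buildFederatedEdges` are assumptions for this example; the real `buildFederatedGraph` works on OpenAPI specs whose handling is not shown in this changelog.

```javascript
// Hypothetical sketch of cross-repo endpoint matching; not the real cross-repo.js.
function endpointKey(ep) {
  return `${ep.method.toUpperCase()} ${ep.path}`;
}

function buildFederatedEdges(specs) {
  // Index every published endpoint by "METHOD path".
  const publishers = new Map();
  for (const spec of specs) {
    for (const ep of spec.publishes || []) {
      const key = endpointKey(ep);
      if (!publishers.has(key)) publishers.set(key, []);
      publishers.get(key).push({ repo: spec.repo, fields: ep.fields });
    }
  }
  // Link each consumer to publishers in OTHER repos that share at least one field.
  const edges = [];
  for (const spec of specs) {
    for (const ep of spec.consumes || []) {
      for (const pub of publishers.get(endpointKey(ep)) || []) {
        if (pub.repo === spec.repo) continue; // intra-repo links are out of scope here
        const sharedFields = ep.fields.filter((f) => pub.fields.includes(f));
        if (sharedFields.length === 0) continue;
        edges.push({ from: pub.repo, to: spec.repo, endpoint: endpointKey(ep), sharedFields });
      }
    }
  }
  return edges;
}

const edges = buildFederatedEdges([
  { repo: "auth-service", publishes: [{ method: "get", path: "/users/{id}", fields: ["id", "email", "bio"] }] },
  { repo: "billing-service", consumes: [{ method: "GET", path: "/users/{id}", fields: ["email", "bio", "plan"] }] },
]);
console.log(edges);
```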

### Test totals
**832 scanner tests pass / 0 fail** (up from 811).

## 0.72.1 — CI template + README adopts the v0.72 viral features

Patch release. Two adoption follow-ups for v0.72's viral features.

### CI template defaults to advisor-tone PR comment

`.github/workflows/scan.yml` — new `pr-comment-mode` input (default
`"advisor"`, alternative `"findings-table"`):

- **advisor** (new default): runs `pr-delta --base origin/<base_ref>` to
  compute the security DELTA between PR and base, then pipes the JSON
  into `pr-comment` to render the security-advisor's note. The comment
  shows only what THIS PR introduced/resolved, with CWE narrative + fix
  snippet + blocking-merge footer.
- **findings-table** (legacy): the prior critical/high count table.
  Available behind the input flag for adopters who prefer it.

Downstream consumers automatically get the new comment style on next CI
run. Opt back to the legacy table by passing `pr-comment-mode: findings-table`
to the reusable workflow.

### README adopts the status badge + leaderboard pitch

`README.md`:
- Stale `version-0.64.0` badge bumped to `version-0.72.1`.
- New badge row entry: `[]()`.
- New "Status badge for your README" section with paste-ready Markdown,
  three example states (passing / high / critical), and self-host
  instructions for users who don't want to depend on `agentic-security.dev`.
- New "Public leaderboard (preview)" section pointing at the v0.72
  `leaderboard-row` backend.

### Test totals
**811 scanner tests pass / 0 fail** (unchanged from 0.72.0).

## 0.72.0 — viral features: shadowscan delta + advisor-tone PR comment + live badge + leaderboard backend

Three viral-lever features built to compound: every PR generates a
screenshotable advisor's note (not a wall of findings), every repo can
wear a live security badge (pull-marketing), and every scan's data shape
is ready for a public leaderboard.

### #5 Shadowscan / security-DELTA on PR — `scanner/src/pr-delta.js`

`computePrDelta(root, { baseRef, headRef })` scans both refs in-memory
(no checkout, via `git show <ref>:<path>`), diffs by `stableId`, and
emits:
- `introduced` — findings in head not in base
- `resolved` — findings in base not in head
- `persistent` — same stableId both sides
- `shifted` — same stableId but severity or CWE changed
- `summary.net` — per-severity head − base delta

New CLI:
```
agentic-security pr-delta --base origin/main [--head HEAD] [--json]
                          [--fail-on-introduced]
```
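
The `stableId` diff itself can be sketched as below. This is an illustration under stated assumptions: `diffFindings` is a hypothetical name, it takes two already-computed finding arrays rather than git refs, and it treats `shifted` and `persistent` as disjoint buckets, which the changelog does not specify.

```javascript
// Hypothetical sketch of the stableId diff; not the real computePrDelta.
function diffFindings(base, head) {
  const baseById = new Map(base.map((f) => [f.stableId, f]));
  const headById = new Map(head.map((f) => [f.stableId, f]));
  const delta = { introduced: [], resolved: [], persistent: [], shifted: [], summary: { net: {} } };

  for (const [id, h] of headById) {
    const b = baseById.get(id);
    if (!b) delta.introduced.push(h);
    else if (b.severity !== h.severity || b.cwe !== h.cwe) delta.shifted.push({ before: b, after: h });
    else delta.persistent.push(h);
  }
  for (const [id, b] of baseById) {
    if (!headById.has(id)) delta.resolved.push(b);
  }
  // Net change per severity: head count minus base count.
  const net = delta.summary.net;
  for (const f of head) net[f.severity] = (net[f.severity] || 0) + 1;
  for (const f of base) net[f.severity] = (net[f.severity] || 0) - 1;
  return delta;
}

const delta = diffFindings(
  [
    { stableId: "a1", severity: "high", cwe: "CWE-89" },
    { stableId: "b2", severity: "medium", cwe: "CWE-79" },
    { stableId: "c3", severity: "low", cwe: "CWE-22" },
  ],
  [
    { stableId: "b2", severity: "high", cwe: "CWE-79" },
    { stableId: "c3", severity: "low", cwe: "CWE-22" },
    { stableId: "d4", severity: "critical", cwe: "CWE-78" },
  ]
);
console.log(delta.introduced.map((f) => f.stableId), delta.resolved.map((f) => f.stableId));
```

In the example, `a1` is resolved, `d4` is introduced, `c3` persists unchanged, and `b2` shifts from medium to high.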

### #1 Advisor-tone PR comment — `scanner/src/pr-comment.js`

`renderPrComment(delta, { repoName, prNumber, prTitle })` produces a
single Markdown comment that reads like a person, not a table. Three
auto-detected modes:
- **clean** (no delta) → "Safe to merge."
- **resolves-only** → "This PR resolves N finding(s)... Nice cleanup."
- **needs-work** → narrative + per-finding paragraph with CWE 'why'
  text + remediation snippet + blocking-merge footer for critical/high.

CWE narrative table covers 19 families with one-sentence "why does this
matter" explanations. The mode is what gets **screenshotted** — security
tool output that reads like an advisor, not a SARIF dump.

New CLI:
```
agentic-security pr-comment [--in delta.json | --base <ref>]
                            [--repo <slug>] [--pr <n>] [--title <text>]
# Reads JSON delta from --in, --base (recomputes), or stdin.
```
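
The three-way mode detection can be sketched roughly as follows. The helper names, the comment wording, and the decision to treat `shifted` findings as needing work are assumptions for this example, not the behavior of the real `renderPrComment`.

```javascript
// Hypothetical sketch of mode detection; not the real pr-comment.js.
function detectMode(delta) {
  // Assumption: a shifted finding (severity or CWE changed) also needs review.
  if (delta.introduced.length > 0 || delta.shifted.length > 0) return "needs-work";
  if (delta.resolved.length > 0) return "resolves-only";
  return "clean";
}

function renderCommentSketch(delta) {
  const mode = detectMode(delta);
  if (mode === "clean") return "Safe to merge.";
  if (mode === "resolves-only") {
    return `This PR resolves ${delta.resolved.length} finding(s). Nice cleanup.`;
  }
  // needs-work: one short paragraph per introduced finding (wording is illustrative).
  const paragraphs = delta.introduced.map(
    (f) => `**${f.cwe}** in \`${f.file}\` (${f.severity}).`
  );
  const blocking = delta.introduced.some(
    (f) => f.severity === "critical" || f.severity === "high"
  );
  if (blocking) paragraphs.push("Merge is blocked until the critical/high findings are addressed.");
  return paragraphs.join("\n\n");
}

console.log(
  renderCommentSketch({
    introduced: [{ cwe: "CWE-89", file: "src/db.js", severity: "high" }],
    resolved: [],
    shifted: [],
  })
);
```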

### #2 Live SVG badge — `scanner/src/badge.js`

`renderBadge({ format, style, scanRoot, scan })` emits a shields.io-style
SVG (or JSON for frontend renderers) summarizing the latest scan:
`agentic-security: crit 0 · high 2 · med 5 · 4h ago`. Color driven by
highest non-zero severity. Two styles: `flat` (default) + `for-the-badge`.

New CLI:
```
agentic-security badge [--format svg|json] [--style flat|for-the-badge]
```

Reads from `.agentic-security/last-scan.json`. The badge is intended as
a README ornament that doubles as pull-marketing — every adopting repo
becomes a billboard.

### Leaderboard backend — `scanner/src/leaderboard.js`

`leaderboardRowFor({ scanRoot, repo })` builds one row of the future
public leaderboard data: posture grade A-F, severity counts, top CWE,
last-scan age, delta trend (`improving`/`flat`/`regressing` from
`scan-history.jsonl` if present), and the badge URL/Markdown snippet
ready to paste. `rankRows(rows)` sorts by critical → high → grade.
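
The critical → high → grade ordering can be sketched with a chained comparator. The row fields (`critical`, `high`, `grade`), the name `rankRowsSketch`, and the assumption that fewer findings and an earlier letter rank first are illustrative, not taken from `leaderboard.js`.

```javascript
// Hypothetical sketch of the ranking order; not the real rankRows.
function compareGrade(a, b) {
  // Plain code-unit comparison: "A" sorts before "B", and so on.
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

function rankRowsSketch(rows) {
  // Copy first so the caller's array is not reordered in place.
  return [...rows].sort(
    (a, b) =>
      a.critical - b.critical || // fewest criticals first
      a.high - b.high ||         // then fewest highs
      compareGrade(a.grade, b.grade)
  );
}

const ranked = rankRowsSketch([
  { repo: "org/api", critical: 0, high: 2, grade: "C" },
  { repo: "org/web", critical: 1, high: 0, grade: "D" },
  { repo: "org/cli", critical: 0, high: 0, grade: "A" },
  { repo: "org/docs", critical: 0, high: 0, grade: "B" },
]);
console.log(ranked.map((row) => row.repo));
```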

Public hosting of `agentic-security.dev/leaderboard` is deferred — this
release ships the data side so the future site is a thin frontend.

New CLI:
```
agentic-security leaderboard-row --repo owner/name [--root <dir>]
```

### Test totals
**811 scanner tests pass / 0 fail** (up from 792).

### Migration
All four features are additive opt-in CLI subcommands. CI templates can
adopt `pr-delta | pr-comment` to replace findings-dump comments without
breaking the existing scan-and-comment flow. README badge adoption is
manual (paste a Markdown snippet).

## 0.71.1 — dependency hygiene + CodeQL ignore-list for scanner/

Patch release. No behavior change.

### Dependency bumps
- `@types/node`: `^20.0.0` → `^24.0.0` (scanner + vscode). Node 20 reached
  EOL in 2026-04; tracking the current LTS.
- `scanner/package.json` `engines.node`: `>=20.0.0` → `>=22.0.0`.
- `vscode/package.json` `@types/vscode` + `engines.vscode`: `^1.85.0` →
  `^1.95.0` (the engine pair stays consistent so VSCE doesn't warn).

Other deps already current and unchanged: `@babel/*` 7.x, `@vercel/ncc`
0.38.x, `js-yaml` 4.x, `safe-regex` 2.x, `fast-glob` 3.x, `esbuild` 0.25.x,
`@vscode/vsce` 3.x. GitHub Actions in workflows already on v5/v8.

### CodeQL ignore-list

The scanner directory contains the taint engine itself — full of SAST
patterns, hardcoded fixture credentials, eval() shapes, raw SQL strings.
Any other SAST (including GitHub CodeQL) flags these as vulnerabilities,
producing noise that drowns out real findings.

Two new files:
- `.github/codeql/codeql-config.yml` — 15-entry `paths-ignore` covering
  `scanner/**`, `bench/**`, `vscode/dist/**`, all test fixtures, the
  `.bench-cache/**` tree, and generated bundles.
- `.github/workflows/codeql.yml` — advanced-setup CodeQL workflow on
  push/PR + weekly cron, references the config above. Uses
  `security-extended` query suite.

**To activate**: switch the repo from default to advanced code-scanning
setup at Settings → Code security → Code scanning → Set up → Advanced.
The workflow will then run and honor the paths-ignore list.

### Test totals
**792 scanner tests pass / 0 fail** (unchanged from 0.71.0).

## 0.71.0 — taint engine frontier release (final 2 of 10 — IFDS + symbolic exploit proofs)

Third and final release in the v0.69 → v0.71 taint-engine arc. v0.71
ships the two heaviest items: IFDS tabulation as an alternative
context-sensitive analyzer, and a symbolic-execution post-pass that
generates concrete attacker payloads + proves infeasibility.

### #3 IFDS / IDE tabulation — `scanner/src/dataflow/ifds.js`

Implementation of Reps-Horwitz-Sagiv "Precise interprocedural dataflow
analysis via graph reachability" (POPL 1995). Runs as an ALTERNATIVE
analyzer that augments the existing k=2 worklist when
`AGENTIC_SECURITY_IFDS=1` — its findings are merged with the worklist
output, deduped by `(file, line, sinkId)`.
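
The merge-and-dedupe step might look like the sketch below. The name `mergeFindings`, the `via` field, and the choice to keep the worklist copy when both analyzers report the same `(file, line, sinkId)` are assumptions; the changelog only states the dedupe key.

```javascript
// Hypothetical sketch of merging two analyzers' output; not the real engine code.
function mergeFindings(worklistFindings, ifdsFindings) {
  const seen = new Map();
  for (const f of [...worklistFindings, ...ifdsFindings]) {
    // JSON-encode the tuple so separators inside file names cannot collide.
    const key = JSON.stringify([f.file, f.line, f.sinkId]);
    if (!seen.has(key)) seen.set(key, f); // first writer wins: the worklist result is kept
  }
  return [...seen.values()];
}

const merged = mergeFindings(
  [{ file: "a.js", line: 10, sinkId: "sql.query", via: "worklist" }],
  [
    { file: "a.js", line: 10, sinkId: "sql.query", via: "ifds" },
    { file: "a.js", line: 22, sinkId: "child_process.exec", via: "ifds" },
  ]
);
console.log(merged.map((f) => `${f.file}:${f.line} ${f.sinkId} (${f.via})`));
```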
|
|
263
|
+
|
|
264
|
+
Components:
|
|
265
|
+
- `IFDSSolver` class: path-edge worklist over the exploded supergraph
|
|
266
|
+
- `_flowAssign`: distributive transfer function (copy / kill / source-gen)
|
|
267
|
+
- `_detectSinkAtCall`: catalog-driven sink matching at each call node
|
|
268
|
+
- Budget: `AGENTIC_SECURITY_IFDS_BUDGET_FACTS=10000` (default) caps the
|
|
269
|
+
edge count; the solver returns partial findings + `_ifdsStats.capped: true`
|
|
270
|
+
|
|
271
|
+
What v1 supports: intraprocedural flow + the IFDS framework scaffolding.
|
|
272
|
+
Full call-graph summary edges are stubbed (the path-edge worklist
|
|
273
|
+
demonstrates the framework; production-quality summary caching arrives
|
|
274
|
+
in v0.72). The merge-with-worklist design means the existing engine
|
|
275
|
+
keeps producing findings; IFDS adds context-sensitive flows the k=2
|
|
276
|
+
cache joined out.

### #9 Symbolic exploit prover — `scanner/src/dataflow/exploit-prover.js`

Post-pass that runs after `runTaintEngine`. For each finding:

**Step 1 — Infeasibility check** via SMT-lite (homegrown, ~150 LOC).
Walks the finding's `trace + chain` for sanitizer-output regexes that
exclude the family's required metacharacters. If the path passes
through e.g. `htmlspecialchars` for an XSS finding, the metachars
`<`, `>`, `"`, `'` are excluded → `_provenUnreachable: true`, severity
demoted to LOW.

**Step 2 — Exploit input synthesis.** For feasible findings, attaches
`f._exploitInput` with the family's canonical payload. 16 families
covered including SQLi (`1' OR '1'='1`), XSS (`<script>alert(1)</script>`),
cmd-inj, path-traversal, SSRF, deserialization, XXE, SSTI, LDAP/XPath
injection, open redirect, response splitting, ReDoS, CSRF, prototype
pollution, and prompt injection.

**Optional Z3 backend.** When `AGENTIC_SECURITY_SYMEXEC_Z3=1` AND the
customer has installed `z3-solver`, the prover uses real SMT for the
infeasibility check. Default install never bundles Z3 — the SMT-lite
fallback handles every query we issue today. Activation:
`AGENTIC_SECURITY_SYMEXEC=1` (lite); add `AGENTIC_SECURITY_SYMEXEC_Z3=1`
for the Z3 path.
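A minimal sketch of the Step 1 idea, assuming a finding carries `family`, `trace`, and `chain`, and that each step names its `callee`. The two lookup tables and `checkInfeasible` are hypothetical; the prover itself is described as working on sanitizer-output regexes, which this character-list version only approximates.

```javascript
// Illustrative tables, not the shipped catalog.
const REQUIRED_METACHARS = { xss: ['<', '>', '"', "'"] };
const SANITIZER_EXCLUDES = { htmlspecialchars: ['<', '>', '"', "'"] };

function checkInfeasible(finding) {
  const required = REQUIRED_METACHARS[finding.family] || [];
  const steps = [...(finding.trace || []), ...(finding.chain || [])];
  // Collect every character that some sanitizer on the path rules out.
  const excluded = new Set(
    steps.flatMap((step) => SANITIZER_EXCLUDES[step.callee] || []),
  );
  const unreachable =
    required.length > 0 && required.every((ch) => excluded.has(ch));
  return unreachable
    ? { ...finding, _provenUnreachable: true, severity: 'LOW' }
    : finding;
}

const escaped = checkInfeasible({
  family: 'xss',
  severity: 'HIGH',
  trace: [{ callee: 'req.query' }, { callee: 'htmlspecialchars' }],
  chain: [{ callee: 'echo' }],
});
const raw = checkInfeasible({
  family: 'xss',
  severity: 'HIGH',
  trace: [{ callee: 'req.query' }],
  chain: [{ callee: 'echo' }],
});
console.log(escaped.severity, raw.severity);
```

The finding is demoted rather than dropped, so it stays visible with its `_provenUnreachable` flag attached.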

### Test totals
**792 scanner tests pass / 0 fail** (up from 773 in v0.70).
Dataflow: 215 tests (up from 196).

### Migration
Both items opt-in via env flag. No existing behavior changes. With both
v0.71 items active + the v0.69+v0.70 stack on opt-in, the engine's
precision ceiling rises substantially — full default-on cutover after
two consecutive nightly CVE-replay runs show F1 delta ≥ +1pp without
precision drop >1pp.

### 10-item taint-engine arc complete

v0.69 → v0.71 has shipped all 10 items:

| # | Item | Module | Release |
|---|------|--------|---------|
| 1 | Backward slicing | `dataflow/backward.js` | v0.69 |
| 2 | Steensgaard alias | `dataflow/points-to.js` | v0.70 |
| 3 | IFDS tabulation | `dataflow/ifds.js` | v0.71 |
| 4 | String regex lattice | `dataflow/string-domain.js` | v0.69 |
| 5 | Incremental cache | `dataflow/incremental.js` | v0.69 |
| 6 | Probabilistic taint | `dataflow/soft-taint.js` | v0.70 |
| 7 | Type-stubs | `ir/type-stubs.js` | v0.70 |
| 8 | Capture-set | `dataflow/higher-order.js` | v0.69 |
| 9 | Symbolic exploit proof | `dataflow/exploit-prover.js` | v0.71 |
| 10 | DB-aware taint | `sast/db-taint.js` | v0.70 |

## 0.70.0 — taint engine foundations release (4 more of 10 leap items)

Second of three releases (v0.69 / v0.70 / v0.71). v0.70 adds the
"needs new theory" capabilities — aliasing, type inference, soft taint,
and DB round-trip flow. These are the foundations that lift the
intra-procedural lattice; v0.71 will swap in IFDS + symbolic exec on
top.

### #2 Steensgaard points-to / alias analysis — `scanner/src/dataflow/points-to.js`
Unification-based, near-linear alias analysis. Walks every assign/call
across the function set, unifying classes for direct copies + field
store/load operations. Interprocedural step at resolved call sites
unifies caller args with callee params. The engine consumes the graph
via `_addPathAliasAware`: when a tainted target is added to state, all
aliases of the root variable are tainted too. Closes the
`let a = obj; a.x = tainted; sink(obj.x)` FN class.
Opt-in via `AGENTIC_SECURITY_POINTS_TO=1`.
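To make the unification idea concrete, here is a small union-find sketch that replays the `let a = obj; a.x = tainted; sink(obj.x)` example. It is a simplification under stated assumptions: `AliasClasses` is a hypothetical name, and only direct copies are unified, whereas a full Steensgaard analysis also unifies the classes that variables point to.

```javascript
// Union-find over variable names; each set is one alias class.
class AliasClasses {
  constructor() {
    this.parent = new Map();
  }
  find(v) {
    if (!this.parent.has(v)) this.parent.set(v, v);
    let root = v;
    while (this.parent.get(root) !== root) root = this.parent.get(root);
    // Path compression: point every node on the walk straight at the root.
    let cur = v;
    while (this.parent.get(cur) !== root) {
      const next = this.parent.get(cur);
      this.parent.set(cur, root);
      cur = next;
    }
    return root;
  }
  union(a, b) {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(ra, rb);
  }
  aliasesOf(v) {
    const root = this.find(v);
    return [...this.parent.keys()].filter((k) => this.find(k) === root);
  }
}

const classes = new AliasClasses();
classes.union('a', 'obj'); // let a = obj;
classes.find('other'); // an unrelated variable stays in its own class

const tainted = new Set();
// a.x = tainted: taint field x on every alias of a
for (const alias of classes.aliasesOf('a')) tainted.add(`${alias}.x`);

const sinkReached = tainted.has('obj.x'); // sink(obj.x)
console.log(sinkReached);
```

Because every copy merges two classes instead of adding an edge, the pass stays near-linear, at the cost of treating aliasing as symmetric.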

### #7 Type-stub integration — `scanner/src/ir/type-stubs.js`
Parses TypeScript `.d.ts` under `node_modules/@types/**`, Python `.pyi`
at project root. Outputs `{signatures, types, frameworks, fingerprint}`.
Cache under `$XDG_CONFIG_HOME/agentic-security/stub-cache/` keyed by
package-lock + package.json fingerprint. Budget gate via
`AGENTIC_SECURITY_TYPE_STUBS_BUDGET_MS` (default 10s).
Opt-in via `AGENTIC_SECURITY_TYPE_STUBS=1`.

### #6 Probabilistic / soft taint — `scanner/src/dataflow/soft-taint.js`
Post-pass over IR-TAINT findings: walks `trace + chain + pathSteps`,
multiplies (1 − sanitizer-effectiveness) across each call. 22-entry
default-effectiveness table (DOMPurify=0.98, parameterize=1.0,
trim=0.05, etc.) — overrideable per catalog entry via
`sanitizerEffectiveness` field. Findings below
`AGENTIC_SECURITY_SOFT_TAINT_THRESHOLD` (default 0.5) get severity
demoted (critical→high→medium→low→info) but are NEVER dropped —
auditors see the demotion + the sanitizer that earned it.
Opt-in via `AGENTIC_SECURITY_SOFT_TAINT=1`.
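The scoring rule above can be sketched as follows. The three effectiveness values come from the text; everything else is an assumption for illustration: the `applySoftTaint` name, the `residualTaint` and `demotedBy` fields, the `callee` step shape, and the choice to demote by exactly one level.

```javascript
const EFFECTIVENESS = { DOMPurify: 0.98, parameterize: 1.0, trim: 0.05 };
const LADDER = ['critical', 'high', 'medium', 'low', 'info'];

function applySoftTaint(finding, threshold = 0.5) {
  const steps = [
    ...(finding.trace || []),
    ...(finding.chain || []),
    ...(finding.pathSteps || []),
  ];
  let residual = 1; // share of taint assumed to survive the whole path
  let strongest = null; // most effective sanitizer seen on the path
  for (const step of steps) {
    const eff = EFFECTIVENESS[step.callee] ?? 0;
    residual *= 1 - eff;
    if (eff > 0 && (strongest === null || eff > EFFECTIVENESS[strongest])) {
      strongest = step.callee;
    }
  }
  const result = { ...finding, residualTaint: residual };
  const idx = LADDER.indexOf(finding.severity);
  if (residual < threshold && idx !== -1) {
    // Demote one step (assumed); the finding is never removed.
    result.severity = LADDER[Math.min(idx + 1, LADDER.length - 1)];
    result.demotedBy = strongest;
  }
  return result;
}

const purified = applySoftTaint({ severity: 'high', trace: [{ callee: 'DOMPurify' }] });
const trimmed = applySoftTaint({ severity: 'high', trace: [{ callee: 'trim' }] });
console.log(purified.severity, trimmed.severity);
```

With these inputs `trim` leaves a residual of 0.95, above the 0.5 default, so only the `DOMPurify` path is demoted.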

### #10 Database-aware taint — `scanner/src/sast/db-taint.js`
Recognizes ORM write/read pairs across Sequelize / Prisma / TypeORM /
Mongoose / Django ORM / SQLAlchemy. When `req.body.X` is written to
`Model.field` then later read and rendered, emits a stored-XSS
finding with a 2-step trace pointing at both the write and read sites.
Handles indirection (`const u = await Model.findOne(...); res.send(u.bio)`)
and direct chains (`res.send(Model.findOne(...).bio)`).
Fires automatically — already gated by ORM context heuristic.

### Test totals
**773 scanner tests pass / 0 fail** (up from 736 in v0.69).
Dataflow: 196 tests (up from 188).

### Migration
All four items are additive. v0.69's items remain opt-in this release;
v0.71 will flip the v0.69 set to default-on if CVE-replay shows F1
delta ≥ +1pp without precision drop >1pp across two consecutive runs.

## 0.69.0 — taint engine wire-up release (4 of 10 leap items)

First of three releases (v0.69 / v0.70 / v0.71) that lift the taint
engine toward academic state-of-the-art. v0.69 ships items that wire
already-built infrastructure into the engine's main path — minimum new
code, maximum precision gain.

### #1 Backward slicing — `scanner/src/dataflow/backward.js`
Already-implemented backward slicer gets a walltime budget
(`AGENTIC_SECURITY_BACKWARD_SLICE_BUDGET_MS`, default 30s) and emits
`_annotateBackwardSlicesStats` { annotated, skipped, exhausted } on the
findings array. Each finding gets `f.backwardSlice: [...]` ordered
source→sink and `f.pathSteps` merged with the existing trace.
Opt-in via `AGENTIC_SECURITY_BACKWARD_SLICE=1`; flips default in v0.70.

### #5 Cross-scan incremental cache — `scanner/src/dataflow/incremental.js`
Already-implemented persistence layer (`readIncrementalState`,
`seedSummaryCache`, `serializeSummaries`, `commitIncrementalState`) gets
wired into `runDeepAnalysis`. State lives in
`<scanRoot>/.agentic-security/incremental/{version,files,summaries}.json`.
Diff via file SHA-256, reverse call-graph for transitive invalidation,
version-pinned by `(scanner, catalog-size)`. On hit: ≥70% summary reuse
on re-scans; identical findings.
Opt-in via `AGENTIC_SECURITY_INCREMENTAL=1`; flips default in v0.70.
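The invalidation rule (hash diff plus transitive callers) can be sketched like this. It works at file granularity with hypothetical inputs; `invalidatedFiles` and the shape of `reverseCallGraph` are assumptions, not the signatures of the persistence functions named above.

```javascript
const { createHash } = require('node:crypto');

const sha256 = (text) => createHash('sha256').update(text).digest('hex');

// files: { path: content }, previousHashes: { path: hex digest },
// reverseCallGraph: { path: [paths that call into it] }
function invalidatedFiles(files, previousHashes, reverseCallGraph) {
  // 1. Anything whose content hash changed (or is new) is dirty.
  const dirty = new Set(
    Object.keys(files).filter((p) => sha256(files[p]) !== previousHashes[p]),
  );
  // 2. Walk callers transitively: their summaries depended on a dirty callee.
  const queue = [...dirty];
  while (queue.length > 0) {
    const file = queue.pop();
    for (const caller of reverseCallGraph[file] || []) {
      if (!dirty.has(caller)) {
        dirty.add(caller);
        queue.push(caller);
      }
    }
  }
  return dirty;
}

const previous = {
  'db.js': sha256('old query helper'),
  'routes.js': sha256('router'),
  'util.js': sha256('utils'),
};
const current = { 'db.js': 'new query helper', 'routes.js': 'router', 'util.js': 'utils' };
const reverseCallGraph = { 'db.js': ['routes.js'] }; // routes.js calls into db.js

const dirty = invalidatedFiles(current, previous, reverseCallGraph);
console.log([...dirty].sort());
```

Here `util.js` is untouched and has no dirty callee, so its cached summaries would be reusable.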

### #4a String regex lattice — `scanner/src/dataflow/string-domain.js`
New `{kind: 'Regex', pattern}` lattice value alongside Const/Concat/Unknown.
`abstract()` recognizes sanitizer-output regexes for `encodeURIComponent`,
`encodeURI`, `parseInt`, `parseFloat`, `hashSync`, `digest`, `toString`,
`htmlspecialchars`. New `provablyMatches(absVal, safe)` proves an
abstract value fits a safe-charset regex — used by `sanitizer-proof.js`
to elevate findings to `provenClean` for non-SQL classes.
Opt-in via `AGENTIC_SECURITY_STRING_DOMAIN=1`; flips default in v0.70.

### #8a Closure capture-set analysis — `scanner/src/dataflow/higher-order.js`
New `capturedFreeVars(node, boundNames)` walker + `callbackCaptureSet(cb)`.
Extracts free variables from inline arrow/function-value bodies,
handling nested closures and shadowing correctly. The motivating
example `let t = req.query.x; arr.map(i => exec(t))` correctly
identifies `t` as captured.
Engine wiring (consume the capture set at call sites) waits for
v0.70's alias analysis; the extractor + tests ship now.
Opt-in via `AGENTIC_SECURITY_CLOSURE_CAPTURE=1`.

### Test totals
**736 scanner tests pass / 0 fail** (up from 698 in v0.68).
Dataflow scope: 188 tests (up from 130).

### Migration
All four are additive, opt-in via env flag. No existing behavior changes.
v0.70 flips the four to default-on if CVE-replay shows F1 delta ≥ +1pp
without precision drop >1pp across two consecutive runs.

## 0.68.0 — five capabilities that open a clear competitive gap

Five world-class capabilities ship together. Each addresses something
mainstream SAST (SonarQube / Semgrep / Snyk / Checkmarx / Veracode /
CodeQL) does poorly or not at all.

### #3 Closed-loop auto-fix verification

`scanner/src/posture/fix-verify-loop.js` — new `verifyFixWithTests`
runs the full chain: re-scan + project linter + project test suite.
A fix is `verified-clean` only when all three pass.

Test-runner auto-discovery: `npm test`, pytest, go test, cargo test,
bundle exec rspec, mvn test, ./gradlew test. Returns one of:
`verified-clean`, `untested-but-passes` (no runner found — honest),
or `verification-failed` (with per-leg detail).

Competitor gap: most SAST tools suggest fixes but don't close the loop
by running the user's tests.
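The three-way verdict described in this section can be sketched as a small decision function. The `legs` shape, with `null` standing for "no test runner found", and the `verificationStatus` name are assumptions; the real `verifyFixWithTests` return value is not documented here beyond the three status strings.

```javascript
// legs: { rescan: boolean, lint: boolean, tests: boolean | null }
// tests === null models "no runner discovered" (assumed encoding).
function verificationStatus(legs) {
  const failed = Object.entries(legs)
    .filter(([, ok]) => ok === false)
    .map(([name]) => name);
  if (failed.length > 0) return { status: 'verification-failed', failed };
  if (legs.tests === null) return { status: 'untested-but-passes' };
  return { status: 'verified-clean' };
}

const clean = verificationStatus({ rescan: true, lint: true, tests: true });
const untested = verificationStatus({ rescan: true, lint: true, tests: null });
const broken = verificationStatus({ rescan: true, lint: false, tests: false });
console.log(clean.status, untested.status, broken.status, broken.failed);
```

Checking for failures first means a missing runner never masks a failed re-scan or lint leg.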

### #4 LLMSecOps coverage (3 new detectors)

| Module | CWE | What it catches |
|--------|-----|-----------------|
| `sast/llm-stored-prompt.js` | CWE-1336 | System prompt sourced from DB / config file / writable mount fed to LLM call without hardening (delimiters, immutable instruction prefix, allow-list) |
| `sast/rag-poisoning.js` | CWE-1336 | User-controlled text written to Chroma/Pinecone/Weaviate/Qdrant/LangChain/pgvector without `metadata: { source, trust_level }` provenance |
| `sast/agent-tool-escalation.js` | CWE-269 | Agent harness exposes both READ tools (list/get/fetch/scrape) and ACT tools (exec/write/send/delete) with no approval gate between them — classic tool-chain privilege escalation |

Competitor gap: nobody else ships LLM-agent-specific privilege flow
analysis. The AI security market is wide open.

### #7 Probabilistic exploitability with Wilson 95% CI

`scanner/src/posture/exploitability-probability.js` — replaces opaque
severity strings with a calibrated probability + 95% confidence interval:

```
f.exploitProbability        ∈ [0,1]
f.exploitProbabilityCI95    [lo, hi]
f.exploitProbabilityWhy     string[]  -- which factors fired
f.exploitProbabilitySlice   'CWE-89×js' | 'CWE-89' | 'prior-only'
```

Method: CISA-KEV-derived CWE-family prior + multiplicative factor
update (reachability, source provenance, sanitizer-in-path, project
hardening). Wilson CI from operator-curated
`.agentic-security/exploit-history.jsonl` when n ≥ 5 (slice grain);
falls back to wider prior-only CI when sample is thin. The CI WIDTH is
the honest signal.

Competitor gap: every SAST emits severity strings; none surface
calibrated probability with uncertainty.
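For reference, the Wilson score interval mentioned above can be computed as in the sketch below. This is the standard textbook formula; how the scanner combines it with the CWE-family prior is not specified here, and the `wilsonInterval` name and the sample counts are assumptions.

```javascript
// Wilson score interval for a binomial proportion (z = 1.96 for ~95%).
function wilsonInterval(successes, n, z = 1.96) {
  if (n <= 0) throw new RangeError('n must be positive');
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return [Math.max(0, center - half), Math.min(1, center + half)];
}

// Same observed rate (60%), different sample sizes.
const thin = wilsonInterval(3, 5); // barely meets the n >= 5 floor
const thick = wilsonInterval(30, 50);
const width = ([lo, hi]) => hi - lo;
console.log(thin, thick, width(thin) > width(thick));
```

Both samples show the same 60% rate, but the five-observation interval is far wider, which is the sense in which the interval width carries the signal.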

### #8 Provable-clean for SQL injection

`scanner/src/dataflow/proven-clean.js` — `proveSqlClean` walks the
function's CFG between every reaching source and the SQL sink,
verifies at least one parameterizer (catalog-tagged sanitizer or
known driver method: setString/AddWithValue/bindParam/etc.) sits on
the path. If proof holds, `f.provenClean = true` with
`f.provenanceProof.sanitizers: [...]`. Stronger statement than
"we didn't find a flow" — auditor-grade evidence.

v1 uses path-existence; v2 will substitute SMT-backed string-domain
constraints behind the same interface.

Competitor gap: existing tools emit "issue found" or "no issue
found." Nobody emits "proven safe."
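One way to read the v1 path-existence check is: the sink must be unreachable from the source once parameterizer nodes are removed from the CFG. The sketch below implements that reading on a toy graph; `everyPathParameterized`, the adjacency-map CFG, and the node names are all hypothetical, and the shipped `proveSqlClean` may apply a different criterion.

```javascript
// cfg: { node: [successor nodes] }
// Returns true when no source-to-sink path avoids every parameterizer.
function everyPathParameterized(cfg, source, sink, parameterizers) {
  const blocked = new Set(parameterizers);
  if (blocked.has(source) || blocked.has(sink)) return true;
  const seen = new Set([source]);
  const stack = [source];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node === sink) return false; // reached the sink without a parameterizer
    for (const next of cfg[node] || []) {
      if (!seen.has(next) && !blocked.has(next)) {
        seen.add(next);
        stack.push(next);
      }
    }
  }
  return true;
}

// One branch binds the parameter, the other concatenates it raw.
const branching = { src: ['bind', 'concat'], bind: ['query'], concat: ['query'], query: [] };
const straight = { src: ['bind'], bind: ['query'], query: [] };

const branchingClean = everyPathParameterized(branching, 'src', 'query', ['bind']);
const straightClean = everyPathParameterized(straight, 'src', 'query', ['bind']);
console.log(branchingClean, straightClean);
```

Note that a sink with no path from the source at all also returns `true` under this sketch, so a caller would want to confirm reachability separately before recording a proof.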

### #9 Time-travel + counterfactual scanning

`scanner/src/history-scan.js` + two new CLI subcommands:

```
agentic-security history --since 6.months --interval 1.month
# Walks N historical git refs, scans each, emits a timeline of
# introduced + resolved findings between consecutive refs.

agentic-security what-if --overlay app.js:./new-app.js [--remove foo.js]
# Apply virtual file overlays + deletes, scan the counterfactual
# state, return findings delta vs. baseline. Working tree is never
# touched (overlay is in-memory via runFullScan's fileContents map).
```

Use cases: "What was our posture 6 months ago vs. today?" / "If I
remove this auth middleware, how many new findings appear?" / "If I
downgrade lodash to 4.17.20, how many CVE matches drop?"

Competitor gap: existing tools scan the working state. None offer
historical replay or counterfactual mode at this granularity.

### Test totals

**698 scanner tests pass / 0 fail** (up from 665 in v0.67).

### Migration

No breaking changes. All new capabilities are additive:
- LLM/RAG/agent detectors fire automatically on relevant code
- exploitProbability fields appear alongside existing severity
- provenClean is informational (does NOT drop findings)
- history + what-if are opt-in CLI subcommands

## 0.67.0 — detection rules for 6 new CWE families (SSTI / LDAP / open-redirect / response-splitting)

The v0.66 corpus expansion exposed six CWE families with no detection
coverage (or partial coverage that missed common shapes). This release
ships dedicated detectors plus a runner fix.

### New SAST detectors

| Module | CWE | Languages | What it catches |
|--------|-----|-----------|-----------------|
| `sast/ssti.js` | CWE-94 | py, js, php, java | Jinja2 `from_string` / `Template()`, Handlebars / EJS / Mustache / Pug `.compile`, Twig `createTemplate`, Velocity `evaluate` — fires only when the template body is non-literal AND has a taint hint or comes from a variable assigned from user input in the preceding 10 lines |
| `sast/open-redirect.js` | CWE-601 | js, py, java, php | `res.redirect` / `ctx.redirect` / `flask.redirect` / `HttpResponseRedirect` / Spring `"redirect:" + …` / PHP `header("Location: " . …)` with user-derived target AND no allow-list check in the preceding 30 lines |
| `sast/response-splitting.js` | CWE-113 | js, py, java, php | `setHeader` / `addHeader` / `response.headers[…] = …` / PHP `header()` with user-derived value (or method param in Java handler context) AND no CRLF strip / sanitizer above |
| `sast/ldap-injection.js` | CWE-90 | js, java, py | **Extended:** indirect filter shape (`String filter = "(uid=" + name + ")"; ctx.search(…, filter, …)`) and `search_s` / `paged_search` callees, gated on a file-level LDAP context hint |

XPath (CWE-643) and ReDoS (CWE-1333) already had working detectors; the
runner just wasn't checking the right arrays.

### Runner fix

`bench/cve-replay/runner.mjs` now consults `scan.findings`, `scan.secrets`,
`scan.supplyChain`, AND `scan.logicVulns` when scoring a fixture.
Previously, business-logic findings (where ReDoS / weak-crypto / behavioral
checks live) were invisible to the scoring pipeline.

### Engine cleanup

Removed the legacy coarse `(?:res\.redirect|response\.redirect|.redirect\(|header\(['"]Location)`
REGEX rule from `engine.js` — the new `scanOpenRedirect` detector is
precise (allow-list aware) and replaces it cleanly.

### Results on the v0.66 corpus

All 9 fixtures across the 6 new CWE families now score **pre:TP post:TN**:

| CVE | CWE | v0.66 | v0.67 |
|-----|-----|-------|-------|
| CVE-2017-16016-handlebars-ssti | CWE-94 | pre:FN | pre:TP post:TN |
| CVE-2017-9805-ldap-injection | CWE-90 | pre:FN | pre:TP post:TN |
| CVE-2018-1320-xpath-injection | CWE-643 | pre:TP | pre:TP post:TN |
| CVE-2019-8341-jinja-ssti | CWE-94 | pre:FN | pre:TP post:TN |
| CVE-2020-15252-open-redirect | CWE-601 | pre:TP post:FP | pre:TP post:TN |
| CVE-2020-7660-resp-splitting | CWE-113 | pre:FN | pre:TP post:TN |
| CVE-2021-25966-open-redirect-py | CWE-601 | pre:FN | pre:TP post:TN |
| CVE-2021-29622-ldap-py | CWE-90 | pre:FN | pre:TP post:TN |
| CVE-2021-3801-redos | CWE-1333 | pre:FN | pre:TP post:TN |

Aggregate F1: **0.500 → 0.597** on the same 88-entry corpus. Wilson 95%
CI [0.334, 0.523] (v0.66: [0.249, 0.429]). Regression
tier still F1=1.0.

### Tests

`scanner/test/new-cwe-detectors.test.js` — 11 tests covering each
detector's vulnerable + clean shape, including post-fixture
suppression patterns (allow-list checks for open-redirect, CRLF
sanitizers for response-splitting).

**665 scanner tests pass / 0 fail** (up from 654).

## 0.66.0 — interprocedural precision + LLM default-on + C# / Kotlin IRs + corpus to 88

Four world-class lifts shipped together. After v0.65 the F1=0.636 number
was honest but the engine was still k=1 monovariant, the LLM validator
was opt-in, and the IR coverage stopped at JS/TS/Python/Java.

### Interprocedural taint precision (engine semantics)

`scanner/src/dataflow/engine.js`:
- **k≥2 context-sensitive summaries.** At assign-from-call sites the
  engine now builds the entry-taint-state from call args + current
  taint via `entryStateFromCall()` and looks up (lazily computes) a
  summary keyed by THAT entry state. Closes the "helper is pure when
  called clean but tainted when called with user input" FN class.
- **`applyAtCallSite` wired.** Mutated by-reference params propagate
  back to caller vars (`Object.assign(target, tainted)` → `target`
  tainted in caller). Was previously dead code.
- **Fixed-point iteration.** `runTaintEngine` now runs the pre-pass
  up to MAX_FP_ITERS (3) iterations or until the summary cache size
  stabilizes — recursion no longer under-approximates. Budget caps
  on walltime + cache size still hold.

Tests in `scanner/test/interproc-k2.test.js` lock the lifts: context
disambiguates tainted vs clean call sites, recursion converges within
budget, large helper chains finish within walltime.

### LLM validator default-on

`scanner/src/llm-validator/index.js` flips from opt-in to default-on:

| Env state | Behavior |
|-----------|----------|
| `LLM_ENDPOINT` unset | no-op |
| `LLM_ENDPOINT` set, `VALIDATE` unset | **runs** |
| `LLM_ENDPOINT` set, `VALIDATE=0` | no-op (opt-out) |
| `LLM_ENDPOINT` set, `VALIDATE=1` | runs (legacy) |

Cache by `(file-content-sha256, source→sink path, prompt version,
model id)` continues to suppress repeat calls. Fail-closed semantics
unchanged — any prompt-injection / verify-failure → escalate (keep).
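The four-row env-state table in this section reduces to a two-condition gate. The sketch below uses the full variable names given in this release's migration notes (`AGENTIC_SECURITY_LLM_ENDPOINT`, `AGENTIC_SECURITY_LLM_VALIDATE`); the `shouldRunLlmValidator` function and the example endpoint URL are hypothetical.

```javascript
// Mirrors the env-state table: an endpoint must be set, and VALIDATE=0 opts out.
function shouldRunLlmValidator(env) {
  if (!env.AGENTIC_SECURITY_LLM_ENDPOINT) return false; // endpoint unset: no-op
  return env.AGENTIC_SECURITY_LLM_VALIDATE !== '0'; // unset or "1": runs
}

const endpoint = 'https://llm.example.invalid/v1'; // placeholder value
const rows = [
  shouldRunLlmValidator({}),
  shouldRunLlmValidator({ AGENTIC_SECURITY_LLM_ENDPOINT: endpoint }),
  shouldRunLlmValidator({
    AGENTIC_SECURITY_LLM_ENDPOINT: endpoint,
    AGENTIC_SECURITY_LLM_VALIDATE: '0',
  }),
  shouldRunLlmValidator({
    AGENTIC_SECURITY_LLM_ENDPOINT: endpoint,
    AGENTIC_SECURITY_LLM_VALIDATE: '1',
  }),
];
console.log(rows);
```

In a real process the argument would be `process.env`; passing a plain object keeps the gate easy to unit-test.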

### C# IR backend (new language)

`scanner/src/ir/parser-cs.js` (~290 lines) — regex-based first pass,
parallel approach to the legacy Python regex parser. Models method
declarations with modifiers, params, body extraction with brace-depth
tracking. Lowers `var x = …`, `Type x = …`, `x = …`, calls, return,
throw. Builds a linear CFG per method. Plus 24 C# catalog entries:
ASP.NET MVC sources (`Request.Form`, `Request.QueryString`,
`Request.Cookies`, `Request.Headers`, `Request.Body`), sinks (SqlCommand,
Process.Start, File.ReadAll*, WebClient, HttpClient, BinaryFormatter),
sanitizers (HtmlEncode, UrlEncode, GetFullPath, Parse/TryParse,
Regex.Escape, AddWithValue).

### Kotlin IR backend (new language)

`scanner/src/ir/parser-kt.js` (~250 lines) — same regex approach.
Models `fun` declarations with modifiers, params, optional return
type, body extraction. Lowers `val`/`var`/`x = …`, calls, return,
throw. Kotlin string interpolation (`"hi $x"` / `"hi ${name}"`) lowers
into IR template-expression form so the engine sees the inner taint.
Plus 14 Kotlin catalog entries: Ktor / Spring sources, JDBC / Exposed /
ProcessBuilder / readText / ObjectInputStream sinks, escapeHtml4 /
URLEncoder / toInt / canonicalFile / setString sanitizers.

Both IRs wire into `buildProjectIR` and `buildProjectIRAsync`. Tests
in `scanner/test/parser-cs-kt.test.js`: shape correctness, multi-method
files, end-to-end scan over ASP.NET + Ktor fixtures.

### CVE-replay corpus: 50 → 88 entries (20 CWEs × 8 languages)

`bench/cve-replay/generate-corpus-extended.mjs` adds 38 entries:
- 8 C# fixtures (exercises new IR)
- 8 Kotlin fixtures (exercises new IR)
- 6 new CWE families: SSTI (CWE-94), LDAP injection (CWE-90), XPath
  injection (CWE-643), open redirect (CWE-601), HTTP response
  splitting (CWE-113), regex DoS (CWE-1333)
- 16 framework variants for existing families (NestJS, Koa, Symfony,
  Laravel, Gin, Fiber, etc.)

**Aggregate F1 = 0.500** (Wilson 95% CI [0.249, 0.429]) on the 88-entry
corpus. Lower than v0.65's 0.636 BECAUSE the new fixtures include
capabilities the scanner doesn't yet detect (C#/Kotlin coverage is
still thin; new CWE families have no detection rules). This is the
honest direction — broader corpus, narrower CI, real measurement.
Regression-tier CI gate remains F1=1.0.

### Test totals

654 scanner tests pass / 0 fail (up from 640 in v0.65). Smoke +
regression-tier CI both green.

### Migration

No breaking changes. To enable the LLM validator default-on path, set
`AGENTIC_SECURITY_LLM_ENDPOINT`. To opt out: `AGENTIC_SECURITY_LLM_VALIDATE=0`.
C# and Kotlin scans require no setup — drop a `.cs` or `.kt` file in
the scan tree.

## 0.65.0 — sanitizer catalog 8× / CVE corpus 6× / continuous CVE alerting

Closes three ASPM/SAST competitiveness gaps surfaced in the post-v0.64 review:
sanitizer coverage that lagged commercial vendors, a published F1 number
measured against a corpus too small to be credible, and a `/cve-alerts`
command that configured a webhook but never actually monitored anything.

### Sanitizer catalog: 48 → 372 entries (7.7×)

New module `scanner/src/dataflow/catalog-expanded.js` adds ~325 sanitizer
entries spanning 6 languages and 10 categories (HTML escape, SQL
parameterization, shell escape, URL encode, path normalize, regex escape,
LDAP/XPath, XML/JSON, validators, type coercion). Merged into the main
catalog at load time; on id collision the base catalog wins.

| Language   | Before | After   |
|------------|-------:|--------:|
| JavaScript |     11 |     105 |
| Python     |     11 |      96 |
| Java       |      8 |      61 |
| PHP        |      4 |      41 |
| Ruby       |      5 |      33 |
| Go         |      2 |      36 |
| **Total**  | **48** | **372** |

Tests in `scanner/test/catalog-expanded.test.js` enforce: minimum entry
count, per-language coverage floors, well-formed entry shape, no
duplicate IDs across the merged catalog, callee identifiers that the
indexer can match, and family vocabulary hygiene.

Pre-existing duplicate IDs in the base catalog (`py-input`,
`py-os-environ`, `py-open`, plus 14 in the v2 Python block) were fixed
in this pass — the duplicate-id test surfaced them.
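The merge rule stated at the top of this section (on id collision the base catalog wins) can be sketched in a few lines. `mergeCatalogs`, the `origin` field, and the sample ids are hypothetical; the real loader in `catalog-expanded.js` may order or validate entries differently.

```javascript
// Merge the expanded entries with the base catalog; base entries overwrite
// expanded ones that share an id.
function mergeCatalogs(base, expanded) {
  const byId = new Map(expanded.map((entry) => [entry.id, entry]));
  for (const entry of base) byId.set(entry.id, entry);
  return [...byId.values()];
}

const base = [{ id: 'js-escape-html', origin: 'base' }];
const expanded = [
  { id: 'js-escape-html', origin: 'expanded' }, // collides with base
  { id: 'go-html-escape', origin: 'expanded' },
];

const catalog = mergeCatalogs(base, expanded);
const collided = catalog.find((entry) => entry.id === 'js-escape-html');
console.log(catalog.length, collided.origin);
```

Keying the merged catalog by id also guarantees there are no duplicate IDs in the result, although it would silently hide duplicates inside either input, which is why a separate duplicate-id test is still useful.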

### CVE-replay corpus: 8 → 50 entries (6.25×)

`bench/cve-replay/generate-corpus.mjs` emits 42 capability-tier fixtures
across 11 high-priority CWE families and 6 languages:

| Family              | CWE         | Entries |
|---------------------|-------------|--------:|
| SQL injection       | CWE-89      |       5 |
| XSS                 | CWE-79      |       4 |
| Command injection   | CWE-78      |       5 |
| Path traversal      | CWE-22      |       5 |
| SSRF                | CWE-918     |       4 |
| Deserialization     | CWE-502     |       4 |
| XXE                 | CWE-611     |       3 |
| Prototype pollution | CWE-1321    |       2 |
| CSRF                | CWE-352     |       2 |
| Hardcoded secrets   | CWE-798     |       3 |
| Weak crypto         | CWE-327/338 |       5 |

Aggregate F1 against the new corpus is **0.636** (Wilson 95% CI [0.346,
0.591]) — an honest baseline, replacing the previous F1 number measured
against 8 cherry-picked fixtures. The regression-tier CI gate still
passes F1=1.0. Failing capability entries graduate to regression as fixes
land (CONTRIBUTING.md's 5-snapshot rule).

### Continuous CVE alerting daemon

New `scanner/src/posture/cve-alert-daemon.js` polls OSV for the project's
dependency tree and fires the configured webhook when a new advisory
drops. Multi-ecosystem: npm, PyPI, Ruby, Go, Cargo, Composer, Maven,
Dart. Reads `.agentic-security/cve-alerts.json` (the schema written by
`/cve-alerts`), dedupes against `.agentic-security/cve-alerts-state.json`
so re-runs don't re-page. Slack / Discord / generic webhook payload
shapes built in.

- `agentic-security cve-watch [--alert-url] [--min-severity] [--dry-run]`
  — one-shot run. Schedule it via cron or CI.
- `scripts/ci-templates/cve-watch.github-actions.yml` — drop-in GitHub
  Actions workflow (daily 08:00 UTC + `workflow_dispatch`). Reads
  `CVE_ALERT_URL` from repo secrets; commits state file with `[skip ci]`.

21 unit tests in `scanner/test/cve-alert-daemon.test.js` cover each
manifest reader, severity normalization, deduplication across runs,
min-severity floors, payload formatting, and offline-mode refusal.
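The dedupe-and-floor behavior described in this section can be sketched as a pure function over an advisory list and the previous state. The state shape (`{ alerted: [...] }`), the severity ranking, and the advisory ids are assumptions for illustration; the actual schema of `cve-alerts-state.json` is not shown in this changelog.

```javascript
const SEVERITY_RANK = { low: 0, medium: 1, high: 2, critical: 3 };

// Returns the advisories worth paging on, plus the state to persist.
function newAdvisories(advisories, state, minSeverity = 'low') {
  const alerted = new Set(state.alerted || []);
  const fresh = advisories.filter(
    (a) =>
      !alerted.has(a.id) &&
      SEVERITY_RANK[a.severity] >= SEVERITY_RANK[minSeverity],
  );
  return {
    fresh,
    nextState: { alerted: [...alerted, ...fresh.map((a) => a.id)] },
  };
}

const advisories = [
  { id: 'GHSA-aaaa', severity: 'critical' }, // already alerted last run
  { id: 'GHSA-bbbb', severity: 'high' },
  { id: 'GHSA-cccc', severity: 'low' }, // below the floor
];

const firstRun = newAdvisories(advisories, { alerted: ['GHSA-aaaa'] }, 'medium');
const secondRun = newAdvisories(advisories, firstRun.nextState, 'medium');
console.log(firstRun.fresh.map((a) => a.id), secondRun.fresh.length);
```

Advisories below the floor are left out of the state in this sketch, so lowering `--min-severity` later would still surface them once.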
|
|
778
|
+
|
|
779
|
+
### Migration notes
|
|
780
|
+
|
|
781
|
+
- Re-running `npm run build` is recommended to bundle the new daemon
|
|
782
|
+
binary entry. No breaking changes; all v0.64.0 commands and skills
|
|
783
|
+
still work as before.
|
|
784
|
+
- The capability-tier F1 score in the manifest is intentionally honest
|
|
785
|
+
(0.636, not 0.85). Path to 0.85 is more corpus, not better numbers.
|
|
786
|
+
|
|
787
|
+
## 0.64.0 — auto-activating skills + multi-harness manifests
|
|
788
|
+
|
|
789
|
+
Inspired by patterns from the obra/superpowers plugin's "mandatory workflows,
|
|
790
|
+
not suggestions" stance: the agent shouldn't wait for the user to type
|
|
791
|
+
`/scan` or `/fix` before doing the security thing. Nine new auto-activating
|
|
792
|
+
skills cover the common security/privacy moments where the agent should
|
|
793
|
+
intervene before damage lands. Plus Codex / Cursor / Gemini manifests so the
|
|
794
|
+
12 MCP tools work in those harnesses too.
|
|
795
|
+
|
|
796
|
+
### Auto-activating skills (9 new)
|
|
797
|
+
|
|
798
|
+
Each lives at `skills/<slug>/SKILL.md`. The `description:` frontmatter is
|
|
799
|
+
the activation cue Claude Code's skill router reads. All ≤120 chars,
|
|
800
|
+
enforced by `npm run test:lifecycle`.
|
|
801
|
+
|
|
802
|
+
- **`security-explain-cve`** — fires when user mentions CVE-id / GHSA / asks "what is this vuln". Routes to `lookup_cve` MCP tool + `/explain`.
|
|
803
|
+
- **`security-scan-on-deploy`** — fires on "ship / deploy / launch / is this safe?" intent. Checks `last-scan.json` mtime, runs a fresh scan if stale, renders a verdict (not a wall of findings).
|
|
804
|
+
- **`security-fix-finding`** — fires when user references a finding and asks to fix. Enforces the deterministic toolchain (`synthesize_fix → verify_fix → apply_fix`); refuses raw `Edit`.
|
|
805
|
+
- **`security-weak-crypto`** — fires **before** the agent writes md5/sha1 for passwords, DES/3DES/RC4, static IVs, `Math.random` for tokens, or JWT with `none` algorithm. Refuses the write, proposes the right primitive with literal code.
|
|
806
|
+
- **`security-rotate-leak`** — fires when a leaked secret is mentioned. Masks the value, detects the provider, prints the revoke URL, estimates blast radius BEFORE rotating, refuses to print the value back.
|
|
807
|
+
- **`security-eval-warn`** — fires before `eval()` / `new Function()` / `setTimeout(string,…)` / `pickle.loads` / `eval($x)` / `class_eval`. Diagnoses what the user actually wants, proposes the structured alternative.
|
|
808
|
+
- **`security-sql-injection-warn`** — fires before template-literal queries / `+`-concat into SQL / NoSQL operator injection / LDAP/XPath concat. Shows the literal parameterized form for the user's specific DB driver.
|
|
809
|
+
- **`threat-model-first`** — fires **before** the agent writes new auth / secret / external-API / file-upload / OAuth / deserialization code. Walks STRIDE per touch-point (one sentence per row, no skipping); writes `TM.md` to `.agentic-security/agent-scratchpad/threat-model/<session>/` via `append_scratchpad`. Then proposes implementation with each defensive measure citing its STRIDE row in a code comment.
|
|
810
|
+
- **`privacy-data-flow`** — fires **before** the agent writes code touching PII / PHI / PCI / GDPR-special / confidential data shapes. Classifies the data, traces the destination (storage tier / encryption / third-party processors / logging / retention / backups / replication), maps to jurisdiction (GDPR / HIPAA / CCPA / PCI-DSS), writes `DATA_FLOW.md` to the scratchpad. Refuses hard violations (logging full PAN, sending PHI to non-BAA processor, storing CVV after auth).
|
|
811
|
+
|
|
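As an illustration of the kind of replacement `security-weak-crypto` is described as proposing, here is a minimal sketch built only on Node's `crypto` module. The helper names (`newToken`, `hashPassword`, `verifyPassword`) and the `salt:hash` storage format are hypothetical; the changelog does not show the skill's actual output.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: tokens come from the CSPRNG, not Math.random().
function newToken(bytes = 32) {
  return crypto.randomBytes(bytes).toString('hex');
}

// Hypothetical helper: scrypt with a per-password random salt,
// instead of a bare md5/sha1 digest.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  return `${salt.toString('hex')}:${hash.toString('hex')}`;
}

// Recomputes the hash with the stored salt and compares in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(':');
  const expected = Buffer.from(hashHex, 'hex');
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, 'hex'), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}
```

A caller would store the string returned by `hashPassword` and pass it back to `verifyPassword` at login time.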
812
|
+
### Skills-registry integrity test
|
|
813
|
+
|
|
814
|
+
`scanner/test/skills-registry.test.js` enforces:
|
|
815
|
+
- Every `skills/<slug>/SKILL.md` has well-formed YAML frontmatter
|
|
816
|
+
- `name:` equals `agentic-security:<slug>`
|
|
817
|
+
- `description:` is ≤ 120 chars (re-asserted at unit-test time)
|
|
818
|
+
- Auto-activating skills include an "Activate" / "Activate on" cue
|
|
819
|
+
- Every `/<slash-command>` referenced in a skill body resolves to a real
|
|
820
|
+
file under `commands/`
|
|
821
|
+
|
|
822
|
+
7 new tests, all passing.
|
|
823
|
+
|
|
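The frontmatter rules above can be sketched as a small checker. This is not the code in `scanner/test/skills-registry.test.js`; `checkSkillFrontmatter` is a hypothetical name, and the parser handles only flat `key: value` pairs rather than full YAML.

```javascript
// Hypothetical checker mirroring three of the listed rules.
function checkSkillFrontmatter(slug, markdown) {
  const match = /^---\n([\s\S]*?)\n---/.exec(markdown);
  if (!match) return ['missing YAML frontmatter'];

  // Simplification: split each line at the first colon only.
  const fields = {};
  for (const line of match[1].split('\n')) {
    const idx = line.indexOf(':');
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }

  const errors = [];
  if (fields.name !== `agentic-security:${slug}`) errors.push('name mismatch');
  if (!fields.description) errors.push('missing description');
  else if (fields.description.length > 120) errors.push('description over 120 chars');
  return errors;
}
```

An empty array means the file passes; each string names one violated rule.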
824
|
+
### Multi-harness manifests (3 new)
|
|
825
|
+
|
|
826
|
+
The MCP server is harness-agnostic — same binary, different manifest:
|
|
827
|
+
|
|
828
|
+
| Harness        | Manifest                                        |
|----------------|-------------------------------------------------|
| Claude Code    | `.claude-plugin/plugin.json` (already shipping) |
| **Codex CLI**  | `.codex-plugin/plugin.json` (new)               |
| **Cursor**     | `.cursor-plugin/plugin.json` (new)              |
| **Gemini CLI** | `gemini-extension.json` (root) (new)            |
|
|
834
|
+
|
|
835
|
+
Each manifest declares the same `agentic-security` MCP server pointing at
|
|
836
|
+
`scanner/bin/agentic-security-mcp.js`. Each carries an explicit note about
|
|
837
|
+
which surface IS validated vs not. The 12 MCP tools work identically across
|
|
838
|
+
all four harnesses; the slash-command + skill-activation surface is Claude-
|
|
839
|
+
Code-specific today.
|
|
840
|
+
|
|
841
|
+
README updated with an "Install in your harness" table covering all four
|
|
842
|
+
plus the generic MCP-aware-client fallback.
|
|
843
|
+
|
|
844
|
+
### Lint state
|
|
845
|
+
|
|
846
|
+
89 surfaces total (80 commands + 9 skills + add-scan-rule SKILL). All
|
|
847
|
+
within the 120-char description / 200-char argument-hint caps.
|
|
848
|
+
|
|
849
|
+
### Tests
|
|
850
|
+
|
|
851
|
+
619/619 passing (was 612 in v0.63.0; +7 skills-registry tests).
|
|
852
|
+
|
|
853
|
+
## 0.63.0 — Python IR via stdlib ast (real parser, regex fallback)
|
|
854
|
+
|
|
855
|
+
Replaces the hand-rolled regex Python parser with Python 3's stdlib `ast`
|
|
856
|
+
module (zero npm bundle bloat, zero pip install, runs in a per-scan
|
|
857
|
+
subprocess) and keeps the regex parser as a fallback when Python isn't on
|
|
858
|
+
PATH. The new path closes the gaps the regex parser admitted to in its own
|
|
859
|
+
comments: comprehensions, decorators, `match` statements, `async`/`await`,
|
|
860
|
+
lambda bodies, and nested-paren default args (`def f(x=Foo(1,2))`).
|
|
861
|
+
|
|
862
|
+
### What ships
|
|
863
|
+
|
|
864
|
+
- **`scanner/src/ir/parser-py.helper.py`** — Python 3.8+ stdlib script
|
|
865
|
+
that reads `[{file, content}, ...]` JSON on stdin and emits the same
|
|
866
|
+
IR shape as the regex parser, but computed from a real AST. Models
|
|
867
|
+
assign / call / member / subscript / f-string / if / for / while /
|
|
868
|
+
try-except / return / raise / async-for / async-with. Captures every
|
|
869
|
+
function definition (including nested, decorated, async, generic) even
|
|
870
|
+
when the body has unmodeled constructs.
|
|
871
|
+
- **`scanner/src/ir/parser-py-cst.js`** — Node-side dispatcher.
|
|
872
|
+
Batched: ALL Python files in a project go in one subprocess invocation.
|
|
873
|
+
Capability probe cached per-process. 10 s timeout on the whole batch.
|
|
874
|
+
- **`scanner/src/ir/index.js`** — three-mode toggle:
|
|
875
|
+
`AGENTIC_SECURITY_PY_PARSER=auto` (default, falls back silently when
|
|
876
|
+
python3 missing), `cst` (force, error if unavailable), `regex`
|
|
877
|
+
(force legacy).
|
|
878
|
+
- **`scanner/src/ir/CLAUDE.md`** — documents the dual-parser shape,
|
|
879
|
+
the IR contract every parser must produce, and the retirement plan
|
|
880
|
+
for the regex parser.
|
|
881
|
+
|
|
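The three-mode toggle described above could be resolved roughly as follows. `resolvePyParser` is a hypothetical name, `pythonAvailable` stands in for the cached capability probe, and rejecting unknown values is an assumption (the changelog does not say how they are handled).

```javascript
// Hypothetical resolver for AGENTIC_SECURITY_PY_PARSER.
function resolvePyParser(env, pythonAvailable) {
  const mode = env.AGENTIC_SECURITY_PY_PARSER || 'auto';
  if (mode === 'regex') return 'regex'; // force legacy parser
  if (mode === 'cst') {
    // Forced mode: fail loudly instead of falling back.
    if (!pythonAvailable) throw new Error('cst parser forced but python3 (>= 3.8) is unavailable');
    return 'cst';
  }
  if (mode !== 'auto') throw new Error(`unknown AGENTIC_SECURITY_PY_PARSER value: ${mode}`);

  // auto: silent fallback unless the debug flag asks for a stderr note.
  if (!pythonAvailable && env.AGENTIC_SECURITY_PY_PARSER_DEBUG === '1') {
    console.error('[py-parser] python3 unavailable, falling back to regex parser');
  }
  return pythonAvailable ? 'cst' : 'regex';
}
```

Passing `process.env` as `env` keeps the function easy to test with plain objects.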
882
|
+
### What's STILL not modeled
|
|
883
|
+
|
|
884
|
+
The CST parser intentionally emits `kind: 'noop'` for these to keep the
|
|
885
|
+
CFG bounded — the regex parser dropped the entire function for the same
|
|
886
|
+
shapes; we capture the function record but skip the body lowering:
|
|
887
|
+
|
|
888
|
+
- `match` statement case bodies (function is captured; per-case taint
|
|
889
|
+
flow not yet routed)
|
|
890
|
+
- destructuring assignment (`a, b = req.body`) — only single-target
|
|
891
|
+
assigns get a precise `target` field
|
|
892
|
+
- comprehension `if` filters and multi-`for` generators — the elt is
|
|
893
|
+
modeled; the generator's own predicates aren't
|
|
894
|
+
|
|
895
|
+
### Cost / risk
|
|
896
|
+
|
|
897
|
+
- One `python3` subprocess per `runScan`, not per file. Batched stdin
|
|
898
|
+
payload. Capability probe runs once and is cached.
|
|
899
|
+
- When python3 isn't installed (or is < 3.8), the regex parser handles
|
|
900
|
+
the scan unchanged. No behavior regression for those customers.
|
|
901
|
+
- Set `AGENTIC_SECURITY_PY_PARSER_DEBUG=1` to surface fallback events
|
|
902
|
+
on stderr.
|
|
903
|
+
|
|
904
|
+
### Tests
|
|
905
|
+
|
|
906
|
+
12 new CST-specific tests in `scanner/test/parser-py-cst.test.js`
|
|
907
|
+
covering decorators, async, nested-paren defaults, match statements, list
|
|
908
|
+
comprehension taint flow, nested function defs, batch behavior, syntax-
|
|
909
|
+
error isolation per file, single-file/batch shim equivalence. All skip
|
|
910
|
+
gracefully when python3 isn't on PATH. Total suite: 612/612 passing.
|
|
911
|
+
|
|
912
|
+
## 0.62.0 — agent-harness hardening + slash-command consolidation
|
|
913
|
+
|
|
914
|
+
Five rounds of analysis applied to the plugin's scanner + MCP server + sub-agent
|
|
915
|
+
harness across this release. Each section corresponds to one external source;
|
|
916
|
+
in-source comments tag the originating thread (`premortem #N`, `post-rec #N`,
|
|
917
|
+
`harness-anatomy #N`) for cross-reference.
|
|
918
|
+
|
|
919
|
+
### Security & integrity (premortem hardening)
|
|
920
|
+
|
|
921
|
+
- **Per-install HMAC key** for `last-scan.json` integrity (was hostname-derived
|
|
922
|
+
and publicly forgeable in CI / containers). Stored at
|
|
923
|
+
`$XDG_CONFIG_HOME/agentic-security/scan-key`; override via
|
|
924
|
+
`$AGENTIC_SECURITY_HMAC_KEY`. Legacy hostname key verified for one release
|
|
925
|
+
to migrate existing signed scans.
|
|
926
|
+
- **MCP reserved-write list expanded** to `.github/`, `.gitlab/`, `.circleci/`,
|
|
927
|
+
`.buildkite/`, `.terraform/`, IaC dirs, every common manifest basename
|
|
928
|
+
(`Dockerfile`, `Jenkinsfile`, `package.json`, lockfiles, `pom.xml`,
|
|
929
|
+
`Cargo.toml`, …) and `*.tf` / `docker-compose.yml`. Closes the
|
|
930
|
+
forged-finding-rewrites-CI-workflow attack path.
|
|
931
|
+
- **`rules.yml disable:` requires signature.** `applyOverrides` now refuses
|
|
932
|
+
the `disable:` list unless `.agentic-security/rules.yml.sig` verifies
|
|
933
|
+
under the per-install HMAC. `severityOverrides`, `custom:`, `ignorePaths`
|
|
934
|
+
are not gated (they don't reduce coverage). Override via
|
|
935
|
+
`$AGENTIC_SECURITY_RULES_UNSIGNED=1`.
|
|
936
|
+
- **MCP `SERVER_VERSION`** reads `package.json` at module load (was a
|
|
937
|
+
hardcoded literal that rotted).
|
|
938
|
+
- **MCP `find_rule_module` tool** for codebase navigation (CWE / family →
|
|
939
|
+
detector file) without grep-and-pray.
|
|
940
|
+
- **MCP `apply_fix`** now passes patch text through unredacted (the prior
|
|
941
|
+
redact-on-output behavior silently corrupted valid patches whose content
|
|
942
|
+
matched a secret-shape).
|
|
943
|
+
- **Per-stableId attempt budget** (default 2) on `apply_fix`. Refuses a
|
|
944
|
+
third attempt with structured `{ budgetExceeded, attempts, maxAttempts }`.
|
|
945
|
+
- **Optional remote audit-log sink.** Set
|
|
946
|
+
`$AGENTIC_SECURITY_AUDIT_WEBHOOK=<url>` and every MCP tool call is
|
|
947
|
+
fire-and-forget POSTed to the witness. Closes the full-file-rewrite
|
|
948
|
+
blind spot of the local-only hash chain.
|
|
949
|
+
|
|
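For the per-install HMAC entry at the top of this list, a minimal signing and verification sketch looks like this. HMAC-SHA256 with a hex-encoded signature is an assumption (the changelog names neither the hash nor the encoding), and `signScan` / `verifyScan` are hypothetical names; loading the key from the `scan-key` file or the environment variable is left out.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: signature over the raw JSON text of the scan file.
function signScan(scanJson, key) {
  return crypto.createHmac('sha256', key).update(scanJson).digest('hex');
}

// Hypothetical helper: constant-time comparison against a stored signature.
function verifyScan(scanJson, signatureHex, key) {
  const expected = Buffer.from(signScan(scanJson, key), 'hex');
  const given = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}
```

The point of a per-install key is that `key` is a random secret; a value derived from the hostname can be recomputed by anyone who knows the hostname.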
950
|
+
### Scanner correctness
|
|
951
|
+
|
|
952
|
+
- **`SummaryCache` wired** into the taint engine (k=1 monovariant
|
|
953
|
+
return-taint). Was dead code; now the assign-from-call lattice consults
|
|
954
|
+
cached summaries for resolved callees.
|
|
955
|
+
- **Per-flow source attribution** in IR-TAINT (was first-source-globally-
|
|
956
|
+
seen; produced misattributed evidence in findings).
|
|
957
|
+
- **`finding-defaults` backfill** stamps `parser` + `family` on every
|
|
958
|
+
finding before calibration / confidence run. Closes the "0 parser /
|
|
959
|
+
20 family null on a smoke run" silent-no-op.
|
|
960
|
+
- **Tautological Brier removed.** `computeBrierFromHistory` (always
|
|
961
|
+
returned 0) replaced with `computeBrierOnHeldOut(samples)` taking real
|
|
962
|
+
labels. New `posture/holdout-eval.js` evaluator: Brier + ECE + per-family
|
|
963
|
+
TP/FP + Wilson CI.
|
|
964
|
+
- **PoC param-key inference** reads the actual handler file window;
|
|
965
|
+
surfaces `paramKey`, `paramKeyConfidence`, `paramKeyInferred`. Low-
|
|
966
|
+
confidence PoCs trigger `regression-test-gen` to refuse rather than
|
|
967
|
+
ship a fake-passing test.
|
|
968
|
+
- **CVE-replay scoring fixed.** TN branch reachable; pre/post scored
|
|
969
|
+
independently. Per-slice F1 (by CWE, language, source-quality tier).
|
|
970
|
+
Wilson 95% CI on the aggregate TP-rate.
|
|
971
|
+
- **Python parser** switched to a balanced-paren scanner for calls + def
|
|
972
|
+
signatures (was a `[^()]*` regex that rejected `db.execute(sanitize(x))`
|
|
973
|
+
and `def f(x=Foo(1,2))`).
|
|
974
|
+
|
|
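The last entry in this list contrasts a `[^()]*` regex with a balanced-paren scanner. The sketch below shows the general technique, not the project's parser: `readBalancedArgs` is a hypothetical helper, and it does not skip parentheses that appear inside string literals.

```javascript
// Returns the text between the '(' at openIdx and its matching ')',
// or null when the parentheses never balance.
function readBalancedArgs(src, openIdx) {
  if (src[openIdx] !== '(') return null;
  let depth = 0;
  for (let i = openIdx; i < src.length; i++) {
    if (src[i] === '(') depth++;
    else if (src[i] === ')' && --depth === 0) return src.slice(openIdx + 1, i);
  }
  return null;
}

// The regex form cannot match the outer call because its argument
// text contains parentheses.
const flatCall = /^[\w.]+\(([^()]*)\)$/;
```

Here `flatCall.test('db.execute(sanitize(x))')` is `false`, while `readBalancedArgs('db.execute(sanitize(x))', 10)` returns `'sanitize(x)'`.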
975
|
+
### Agent harness
|
|
976
|
+
|
|
977
|
+
- **`security-fixer` writes via MCP, not Edit.** Tool list stripped to
|
|
978
|
+
`Read, Bash, Grep`. The deterministic toolchain (`synthesize_fix` →
|
|
979
|
+
`verify_fix` → `apply_fix`) is the only write path. The LLM is the
|
|
980
|
+
intent layer; the MCP server is the execution layer.
|
|
981
|
+
- **Subagent path-confinement schema** (`agents/_CONFINEMENT.md`) shared
|
|
982
|
+
with the MCP reserved-write list.
|
|
983
|
+
- **`security-fixer` consumes structured `verify_fix.introduced[]`** to
|
|
984
|
+
diagnose template-incomplete vs codebase-prior vs lint-failed outcomes.
|
|
985
|
+
- **PLAN.md decomposition convention** for batched runs:
|
|
986
|
+
`.agentic-security/agent-scratchpad/<agent>/<session>/PLAN.md`. Survives
|
|
987
|
+
context resets; auditable artifact for governance.
|
|
988
|
+
- **AGENTS.md continual learning.** `.agentic-security/AGENTS.md` is the
|
|
989
|
+
append-only narrative file the agent writes to at session end. The
|
|
990
|
+
SessionStart hook reads it; the Stop hook nudges the agent to record an
|
|
991
|
+
entry when work happened.
|
|
992
|
+
- **MCP scratchpad pair** (`append_scratchpad`, `read_scratchpad`)
|
|
993
|
+
confined to `.agentic-security/agent-scratchpad/<agent>/<session>/`.
|
|
994
|
+
Strict path validation; 2 MB / file, 50 MB total caps.
|
|
995
|
+
- **MCP tool-output offloading.** `scan_diff` and `explain_finding`
|
|
996
|
+
results exceeding `OFFLOAD_THRESHOLD` (default 10) write the full payload
|
|
997
|
+
to the scratchpad; the response shrinks to `{ head, tail, total,
|
|
998
|
+
scratchpadPath, pagingHint }`. The agent pages through with
|
|
999
|
+
`read_scratchpad`.
|
|
1000
|
+
- **MCP `lookup_cve`** tool: read-only access to local OSV / KEV / EPSS
|
|
1001
|
+
caches with staleness tiers. Closes the knowledge-cutoff gap for SCA
|
|
1002
|
+
reasoning without triggering a network fetch.
|
|
1003
|
+
- **MCP `append_agents_memory` / `read_agents_memory`** tools wrap the
|
|
1004
|
+
AGENTS.md surface.
|
|
1005
|
+
|
|
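The scratchpad confinement described in this list can be approximated with a resolve-then-prefix check. This is a sketch under assumptions: `resolveScratchpadPath` is a hypothetical name, the identifier pattern for `agent` and `session` is invented for illustration, and symlinks and the size caps are not handled.

```javascript
const path = require('node:path');

// Hypothetical validator: returns an absolute path inside the session
// directory or throws.
function resolveScratchpadPath(projectRoot, agent, session, relPath) {
  const segment = /^[A-Za-z0-9_-]+$/;
  if (!segment.test(agent) || !segment.test(session)) {
    throw new Error('agent and session must be simple identifiers');
  }
  const base = path.resolve(projectRoot, '.agentic-security', 'agent-scratchpad', agent, session);
  const target = path.resolve(base, relPath);
  // Rejects '..' traversal and absolute paths, which resolve outside base.
  if (!target.startsWith(base + path.sep)) {
    throw new Error('path escapes the scratchpad directory');
  }
  return target;
}
```

Resolving first and comparing afterwards is what catches inputs such as `a/../../x`, which look harmless segment by segment.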
1006
|
+
### Evals + benches
|
|
1007
|
+
|
|
1008
|
+
- **CVE-replay corpus tiered** into `regression/` (CI gates here — F1=1.0
|
|
1009
|
+
required) and `capability/` (frontier; failure informational).
|
|
1010
|
+
Graduation policy: 5 consecutive passes → promote.
|
|
1011
|
+
- **`npm run bench:cve-replay:ci`** new CI gate.
|
|
1012
|
+
- **Agent-task corpus** at `bench/agent-tasks/security-fixer/`: end-to-end
|
|
1013
|
+
eval of the deterministic toolchain (synth → verify → apply) against
|
|
1014
|
+
fresh temp copies of fixtures. 7 graders per task; pass@1 reporting.
|
|
1015
|
+
- **`llm-validator` consistency harness** (`scanner/src/llm-validator/
|
|
1016
|
+
consistency.js` + `agentic-security-consistency` bin): pass^k stability
|
|
1017
|
+
measurement across N trials on the same fixture set.
|
|
1018
|
+
- **Human ↔ LLM grader calibration** (`posture/grader-calibration.js`):
|
|
1019
|
+
Cohen's κ between `/triage` human verdicts and validator verdicts on
|
|
1020
|
+
the stableId overlap. Alarm when κ < 0.6 with n ≥ 10.
|
|
1021
|
+
- **`agentic-security-audit` CLI**: `review`, `metrics`, `verify`
|
|
1022
|
+
subcommands for the MCP audit log. `--by-session` aggregation with
|
|
1023
|
+
outlier flagging (default ≥20 calls per tool).
|
|
1024
|
+
- **`audit.js`** stamps `sessionId` on every entry.
|
|
1025
|
+
|
|
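The grader-calibration entry above relies on Cohen's κ. A minimal sketch of the statistic and the stated alarm rule (κ < 0.6 with n ≥ 10) follows; `cohensKappa` and `graderAlarm` are hypothetical names, not the exports of `posture/grader-calibration.js`, and returning 1 when both raters use a single identical label is a convention chosen here.

```javascript
// Cohen's kappa for two raters labelling the same items.
function cohensKappa(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('need equal-length, non-empty label arrays');
  }
  const n = a.length;
  const countA = new Map();
  const countB = new Map();
  let agree = 0;
  for (let i = 0; i < n; i++) {
    if (a[i] === b[i]) agree++;
    countA.set(a[i], (countA.get(a[i]) || 0) + 1);
    countB.set(b[i], (countB.get(b[i]) || 0) + 1);
  }
  const po = agree / n; // observed agreement
  let pe = 0;           // agreement expected by chance
  for (const [label, ca] of countA) pe += (ca / n) * ((countB.get(label) || 0) / n);
  return pe === 1 ? 1 : (po - pe) / (1 - pe);
}

// Alarm rule as stated in the changelog entry.
function graderAlarm(human, validator) {
  return human.length >= 10 && cohensKappa(human, validator) < 0.6;
}
```

κ corrects raw agreement for chance, so two graders who both label almost everything as a true positive do not look well calibrated by default.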
1026
|
+
### Repo structure (Claude-Code-at-scale)
|
|
1027
|
+
|
|
1028
|
+
- **`.claude/settings.json`** with team-committed read-deny list
|
|
1029
|
+
(generated bundle, bench caches, scan-state JSON) to keep noise out of
|
|
1030
|
+
context.
|
|
1031
|
+
- **Subdirectory `CLAUDE.md` files** added: `scanner/`,
|
|
1032
|
+
`scanner/src/{sast,posture,dataflow,mcp}/`. Root `CLAUDE.md` trimmed
|
|
1033
|
+
253 → 115 lines (pointers + gotchas only).
|
|
1034
|
+
- **`npm test` split into scoped scripts**: `test:smoke / sast / posture /
|
|
1035
|
+
dataflow / mcp / report / bench-modules / lifecycle`. Full suite chains
|
|
1036
|
+
them.
|
|
1037
|
+
- **Stop hook (`hooks/session-stop-drift-check.js`)** flags new modules
|
|
1038
|
+
in `scanner/src/{sast,posture,dataflow,mcp}/` not yet indexed in the
|
|
1039
|
+
matching subdir CLAUDE.md, plus prompts for an AGENTS.md entry when
|
|
1040
|
+
the session touched tracked files.
|
|
1041
|
+
- **SessionStart self-check (`hooks/session-start-self-check.js`)**
|
|
1042
|
+
validates every command/agent frontmatter shape; surfaces malformed
|
|
1043
|
+
surfaces.
|
|
1044
|
+
- **`skills/add-scan-rule/SKILL.md`** holds the "add a new SAST rule"
|
|
1045
|
+
workflow as an on-demand skill (was in root CLAUDE.md).
|
|
1046
|
+
- **`docs/POSITIONING.md`** — explicit ICP statement (vibecoder-first;
|
|
1047
|
+
pro follow-on).
|
|
1048
|
+
|
|
1049
|
+
### Slash-command consolidation (LangChain harness-anatomy #5)
|
|
1050
|
+
|
|
1051
|
+
The 77-command surface was the exact "tool proliferation" anti-pattern the
|
|
1052
|
+
post warned about. Always-paid frontmatter (description + argument-hint)
|
|
1053
|
+
trimmed **20.3 KB → 11.3 KB (44% reduction)**.
|
|
1054
|
+
|
|
1055
|
+
- **Description cap of 120 chars** + argument-hint cap of 200 chars,
|
|
1056
|
+
enforced by `scripts/lint-command-descriptions.mjs` in
|
|
1057
|
+
`npm run test:lifecycle`. 76 surfaces trimmed.
|
|
1058
|
+
- **Eleven commands folded into canonical forms**, with deprecated
|
|
1059
|
+
aliases kept one release for muscle memory:
|
|
1060
|
+
|
|
1061
|
+
| Old | New |
|-----|-----|
| `/ci-gate-multi` | `/ci-gate --provider <name>` |
| `/rotate-key-auto` | `/rotate-secret --auto` |
| `/trim-dead-code` | `/trim --what code` |
| `/trim-dependencies` | `/trim --what deps` |
| `/story-explain` | `/explain --narrative` |
| `/security-badge` | `/security-attestation` (default) |
| `/security-onepager` | `/security-attestation --format onepager` |
| `/trust-page` | `/security-attestation --format page` |
| `/dep-pinning` | `/supply-chain-check --show pinning` |
| `/dep-freshness` | `/supply-chain-check --show freshness` |
| `/dep-alternatives` | `/supply-chain-check --show alternatives` |
|
|
1074
|
+
|
|
1075
|
+
- **Skipped on purpose:** `/secure` (vibecoder entry point — kept
|
|
1076
|
+
untouched); the LLM-sec cluster (each command serves a distinct
|
|
1077
|
+
workflow). Tier 3 demote-to-skills also skipped after investigation —
|
|
1078
|
+
Claude Code today loads both commands and skills' descriptions in the
|
|
1079
|
+
always-paid surface, so the move wouldn't actually save context.
|
|
1080
|
+
|
|
1081
|
+
### Tests
|
|
1082
|
+
|
|
1083
|
+
600/600 tests passing. CVE-replay CI gate green (regression F1=1.0 on
|
|
1084
|
+
3 entries). Lint gate green (all 80 surfaces within caps).
|
|
1085
|
+
|
|
1086
|
+
## 0.51.0 — 11 of 16 PRD-missing features (5 research items deferred)
|
|
1087
|
+
|
|
1088
|
+
This release lands all 11 tractable FRs from the v2 PRD audit. The 5
|
|
1089
|
+
research-level FRs (k=2 calling context, narrow symbolic execution, hybrid
|
|
1090
|
+
static+dynamic, eBPF/dtrace live instrumentation, LLM-based intent
|
|
1091
|
+
inference) are deferred to Phase 6+ with their reasons documented in the
|
|
1092
|
+
PRD.
|
|
1093
|
+
|
|
1094
|
+
### Shipped
|
|
1095
|
+
|
|
1096
|
+
- **FR-CHAIN-FILTER** (`posture/cross-lang-meta.js`). Cross-language chain
|
|
1097
|
+
detectors only chain to chain-worthy families (sql-injection,
|
|
1098
|
+
command-injection, xss, ssrf, code-injection, deserialization, xxe,
|
|
1099
|
+
path-traversal, idor, mass-assignment, prototype pollution, and others).
|
|
1100
|
+
Eliminates the "queue chain to CSRF" semantic-noise the polyglot bench
|
|
1101
|
+
surfaced.
|
|
1102
|
+
- **FR-FAMILY-REGISTRY** (`posture/cross-lang-meta.js`). Cross-language
|
|
1103
|
+
chains get canonical family names (xlang-openapi / xlang-grpc /
|
|
1104
|
+
xlang-graphql / xlang-queue / xlang-orm / xlang-iac / xlang-unknown).
|
|
1105
|
+
- **FR-LEARN-7** (`bin/agentic-security reset`). Right-to-delete CLI;
|
|
1106
|
+
wipes accumulated learned state while preserving operator-authored
|
|
1107
|
+
config. `--yes` to actually delete; `--keep <names>` to spare specific
|
|
1108
|
+
items.
|
|
1109
|
+
- **FR-PY-SAST** (`sast/python-sinks.js`). Python sink-side coverage:
|
|
1110
|
+
SQLAlchemy text() with f-string, cursor.execute concat, os.system /
|
|
1111
|
+
subprocess shell=True, pickle.loads, yaml.load, marshal.loads, eval/exec
|
|
1112
|
+
on request data, compile() on user input, flask.send_file with user
|
|
1113
|
+
path, send_from_directory, open() with f-string, requests verify=False,
|
|
1114
|
+
ssl._create_unverified_context, requests/urlopen with user URL, lxml/
|
|
1115
|
+
etree on user input. **Closes G3:** polyglot F1 went from 0.727 → 1.00.
|
|
1116
|
+
- **FR-VER-3** (`posture/regression-test-gen.js`). Per finding with a PoC,
|
|
1117
|
+
emit a framework-idiomatic regression test (Jest for Node, pytest for
|
|
1118
|
+
Python). Surfaced as `f.regression_test = { lang, framework, filename,
|
|
1119
|
+
runHint, code }`.
|
|
1120
|
+
- **FR-LIVE-HARNESS** (`posture/verifier-target.js`). Schema for
|
|
1121
|
+
`.agentic-security/verifier-target.yaml` describing how to bring up the
|
|
1122
|
+
customer's app (docker-compose or command shape). The `verify --live`
|
|
1123
|
+
CLI auto-discovers it. Safety: `command` shape requires a known-good
|
|
1124
|
+
start pattern unless `AGENTIC_SECURITY_VERIFY_TARGET_OK=1`.
|
|
1125
|
+
- **FR-XSAT-7** (`posture/iam-policy.js`). AWS IAM policy auditing.
|
|
1126
|
+
Curated dangerous-actions list (`iam:*`, `s3:*`, `lambda:*`, `ec2:*`, `dynamodb:*`,
|
|
1127
|
+
`rds:*`, `secretsmanager:*`, `kms:*`). Flag Effect=Allow + wildcard resource
|
|
1128
|
+
+ no Condition.
|
|
1129
|
+
- **FR-XSAT-8** (`posture/container-runtime.js`). Dockerfile + k8s
|
|
1130
|
+
manifest + ECS task def. Detects USER root, privileged: true,
|
|
1131
|
+
hostNetwork, hostPID, runAsUser: 0, capabilities ALL/SYS_ADMIN,
|
|
1132
|
+
/var/run/docker.sock bind-mount, ADD with remote URL.
|
|
1133
|
+
- **FR-LOGIC-1 + FR-LOGIC-2 + FR-LOGIC-7** (`posture/business-logic.js`).
|
|
1134
|
+
AuthZ matrix construction (per-resource consistency check + IDOR
|
|
1135
|
+
detection on mutation routes with :id but no ownership/role check),
|
|
1136
|
+
state-machine extraction (catches writes outside the declared status
|
|
1137
|
+
set), and negative-test-gap detection (auth route + happy-path test +
|
|
1138
|
+
no 401/403 assertion = miss).
|
|
1139
|
+
- **FR-LOGIC-6** (`posture/flow-narration.js`). Per high-severity finding,
|
|
1140
|
+
emit a one-paragraph attacker→impact→cost narrative. Template fallback
|
|
1141
|
+
for 10 CWE families; opt-in LLM mode via
|
|
1142
|
+
`AGENTIC_SECURITY_FLOW_NARRATION_LLM=1`.
|
|
1143
|
+
- **FR-LEARN-6** (`posture/rule-synthesis.js`, `agentic-security rule-synth`).
|
|
1144
|
+
Read triage-feedback.json, cluster FP verdicts by family + dir prefix,
|
|
1145
|
+
propose a YAML suppression rule when ≥ 5 verdicts cluster. Proposes —
|
|
1146
|
+
doesn't activate.
|
|
1147
|
+
- **FR-SDLC-5** (`report/index.js::toSTIX`). `--format stix` emits a STIX
|
|
1148
|
+
2.1 bundle with one Vulnerability + Indicator + Relationship SDO per
|
|
1149
|
+
finding. CWE external_references; x_* custom properties for severity,
|
|
1150
|
+
calibrated confidence, exploitability, verifier verdict.
|
|
1151
|
+
- **FR-SDLC-9** (`posture/policy-gate.js`, `--policy <file.rego>`).
|
|
1152
|
+
Policy-as-code gate. External OPA binary preferred; embedded mini-DSL
|
|
1153
|
+
evaluator for the common case. Supports == != > < >= <= comparisons
|
|
1154
|
+
on `finding.<field>` and `sprintf("...", [args])` for messages.
|
|
1155
|
+
|
|
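The FR-XSAT-7 rule in this list (Effect=Allow + wildcard resource + no Condition on a dangerous action) can be sketched over a parsed policy document. `flagStatements` is a hypothetical name, and treating a bare `*` action as dangerous is an assumption added here; only the service-level wildcards come from the changelog.

```javascript
// Service-level wildcards named in the FR-XSAT-7 entry.
const DANGEROUS_ACTIONS = new Set([
  'iam:*', 's3:*', 'lambda:*', 'ec2:*', 'dynamodb:*', 'rds:*', 'secretsmanager:*', 'kms:*',
]);

// IAM allows a single value or an array for Statement, Action and Resource.
const asArray = (v) => (v === undefined ? [] : Array.isArray(v) ? v : [v]);

// Hypothetical check: returns the statements that match all three conditions.
function flagStatements(policy) {
  return asArray(policy.Statement).filter((st) => {
    const dangerous = asArray(st.Action).some(
      (a) => a === '*' || DANGEROUS_ACTIONS.has(a.toLowerCase()),
    );
    const wildcardResource = asArray(st.Resource).includes('*');
    return st.Effect === 'Allow' && dangerous && wildcardResource && !st.Condition;
  });
}
```

Actions are lower-cased before lookup because IAM action names are case-insensitive.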
1156
|
+
### Deferred (Phase 6+ research)
|
|
1157
|
+
|
|
1158
|
+
- FR-SEM-2 k=2 calling-context — requires dataflow engine refactor
|
|
1159
|
+
- FR-SEM-5 narrow symbolic execution — needs KLEE-style backend
|
|
1160
|
+
- FR-SEM-6 hybrid static+dynamic — needs customer app instrumentation
|
|
1161
|
+
- FR-VER-5 eBPF/dtrace live instrumentation — Linux/macOS only, opt-in
|
|
1162
|
+
- FR-LOGIC-5 intent inference — LLM-based; pending prompt-injection-safe design
|
|
1163
|
+
|
|
1164
|
+
### Tests, bench, integrity
|
|
1165
|
+
|
|
1166
|
+
- 295 + 26 + 2 unit tests pass (was 240 before this release).
|
|
1167
|
+
- Synthetic-bench F1 = 100% (baseline updated; new IDOR expected entry added
|
|
1168
|
+
for orm-raw-sql:15 — AuthZ-matrix detector finds a genuine missing
|
|
1169
|
+
ownership check that wasn't previously caught).
|
|
1170
|
+
- Polyglot bench F1 = 100% (was 72.7%; Python SAST coverage closed G3 gap).
|
|
1171
|
+
- No dead exports.
|
|
1172
|
+
|
|
1173
|
+
### Honesty correction
|
|
1174
|
+
|
|
1175
|
+
The PRD v2 said all 16 missing features would ship. This release ships 11; 5 are
|
|
1176
|
+
honestly deferred. The PRD-v3 update (next session) should reflect this
|
|
1177
|
+
delivery state.
|
|
1178
|
+
|
|
1179
|
+
## 0.50.0 — next-gen SAST Phase 1 complete (5 of 5 units)
|
|
1180
|
+
|
|
1181
|
+
Closes Phase 1 of `docs/PRD-next-gen-sast-phase1.md`. The two units queued
|
|
1182
|
+
from v0.49.0 (P1.2 verifier sandbox, P1.4 polyglot bench) are now wired.
|
|
1183
|
+
|
|
1184
|
+
### Shipped & wired
|
|
1185
|
+
|
|
1186
|
+
- **P1.2 — Verifier sandbox loop (FR-VER-3, FR-VER-6, FR-VER-7).** New
|
|
1187
|
+
module `scanner/src/posture/verifier.js`. Consumes the `f.poc` artifacts
|
|
1188
|
+
from P1.1 and assigns a per-finding `verifier_verdict`:
|
|
1189
|
+
- `verified-exploit` — PoC ran against a live target and exited 0
|
|
1190
|
+
- `verified-by-llm` — Layer-3 LLM accepted the finding
|
|
1191
|
+
- `verified-sanitizer-absence` — pattern-based proof that no sanitizer
|
|
1192
|
+
appears in a ±10 line window around the sink (9 vuln families covered)
|
|
1193
|
+
- `unverified-by-design` — CWE family where v1 explicitly doesn't ship a PoC
|
|
1194
|
+
- `cannot-verify` — sandbox error, missing target, PoC validation failed
|
|
1195
|
+
|
|
1196
|
+
PoC static validation refuses destructive shell payloads, hardcoded cloud
|
|
1197
|
+
metadata IPs, runaway-length code, and Node PoCs without a deterministic
|
|
1198
|
+
`process.exit(...)`. Sandbox execution mode (opt-in via
|
|
1199
|
+
`AGENTIC_SECURITY_VERIFY_LIVE=1` + `AGENTIC_SECURITY_VERIFY_TARGET=<url>`)
|
|
1200
|
+
runs each PoC under Docker with `--cap-drop=ALL --memory=256m --read-only
|
|
1201
|
+
--user=nobody`; falls back to subprocess with `ulimit` when Docker isn't
|
|
1202
|
+
available. Fail-closed: any error → `cannot-verify`, never silent drop.
|
|
1203
|
+
New CLI subcommand `agentic-security verify [--finding <id>] [--live
|
|
1204
|
+
--target <url>]` re-runs the verifier loop on `last-scan.json` and
|
|
1205
|
+
persists the verdicts. Smoke on `vulnerable-js` fixture: 7 findings get
|
|
1206
|
+
`verified-sanitizer-absence` static proofs; 2 get `unverified-by-design`;
|
|
1207
|
+
the rest are `cannot-verify` pending live execution.
|
|
1208
|
+
|
|
1209
|
+
- **P1.4 — Cross-language polyglot benchmark (G3).** New `bench/polyglot/`
|
|
1210
|
+
with a tiny dependency-free YAML parser, the runner `runner.mjs`, and 4
|
|
1211
|
+
starter cases:
|
|
1212
|
+
- 01 HTTP→Python SQL (canonical Phase-2 detector gap — Python SAST)
|
|
1213
|
+
- 02 Queue→Python cmd (same gap; queue chain detected; sink not yet)
|
|
1214
|
+
- 03 ORM round-trip (Node-only; mass-assignment + data-exposure TPs)
|
|
1215
|
+
- 04 HTTP→Node SQL (clean end-to-end test of the OpenAPI cross-asset bridge)
|
|
1216
|
+
|
|
1217
|
+
Default mode `recall-only` measures "does the chain fire where it
|
|
1218
|
+
should?" rather than penalizing incidental findings (header-hardening,
|
|
1219
|
+
CSRF on test routes, body-parser DoS warnings). Set `mode: strict` in a
|
|
1220
|
+
manifest for full-precision scoring. Current overall F1 = 72.7%; PRD G3
|
|
1221
|
+
target is 85%; the 27pp gap is Python-side detector coverage (Phase 2).
|
|
1222
|
+
New `npm run bench:polyglot`.
|
|
1223
|
+
|
|
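The fail-closed rule in P1.2 above ("any error → `cannot-verify`, never silent drop") amounts to wrapping the PoC run in a catch-all. The sketch below covers only the live-execution path and leaves out `verified-by-llm` and `verified-sanitizer-absence`. `verdictFor` and `runPoc` are hypothetical names, `runPoc` is assumed to return the PoC's exit code, and mapping a non-zero exit to `cannot-verify` is an assumption.

```javascript
// Hypothetical wrapper around a sandboxed PoC runner.
function verdictFor(finding, runPoc) {
  // Families without a PoC template carry f.poc = null.
  if (!finding.poc) return 'unverified-by-design';
  try {
    // Exit code 0 means the exploit was demonstrated.
    return runPoc(finding.poc) === 0 ? 'verified-exploit' : 'cannot-verify';
  } catch (err) {
    // Fail closed: the finding keeps a verdict even when the sandbox breaks.
    return 'cannot-verify';
  }
}
```

Because every branch returns a verdict string, a finding can never leave the loop without one.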
1224
|
+
### Tests, bench, integrity
|
|
1225
|
+
|
|
1226
|
+
- 19 new tests in `test/verifier.test.js` (validation, sanitizer proofs,
|
|
1227
|
+
verdict assignment, batch annotation, fail-closed defense-in-depth).
|
|
1228
|
+
- All 218 + 26 + 2 unit tests pass.
|
|
1229
|
+
- Synthetic-bench F1 still 100%.
|
|
1230
|
+
- Polyglot bench F1 72.7% (above 30% v1 floor; below 85% G3 target — the
|
|
1231
|
+
gap is documented in `bench/polyglot/README.md`).
|
|
1232
|
+
- No new dead exports.
|
|
1233
|
+
|
|
1234
|
+
### Honesty correction
|
|
1235
|
+
|
|
1236
|
+
The PRD's G2 target ("≥80% of high+/critical findings ship with a verified
|
|
1237
|
+
PoC") is not measured yet — that requires a labeled run-against-target,
|
|
1238
|
+
which the v1 verifier supports via `--live --target` but we haven't built
|
|
1239
|
+
a target harness. v1 ships the framework; the labeled measurement is
|
|
1240
|
+
Phase 5 work.
|
|
1241
|
+
|
|
1242
|
+
## 0.49.0 — next-gen SAST Phase 1 (3 of 5 units)
|
|
1243
|
+
|
|
1244
|
+
Implements 3 of the 5 Phase-1 shippable units from
|
|
1245
|
+
`docs/PRD-next-gen-sast-phase1.md` (parent `docs/PRD-next-gen-sast.md`).
|
|
1246
|
+
The two queued for the next session are noted at the end.
|
|
1247
|
+
|
|
1248
|
+
### Shipped & wired
|
|
1249
|
+
|
|
1250
|
+
- **P1.1 — PoC generator framework (FR-VER-2).** New module
|
|
1251
|
+
`scanner/src/posture/poc-generator.js` ships runnable proof-of-concept
|
|
1252
|
+
files for the top-10 CWE families from the parent PRD: SQL injection,
|
|
1253
|
+
command injection, XSS, path traversal, SSRF, code injection, CSRF, open
|
|
1254
|
+
redirect, XXE, and insecure deserialization. Each PoC is a self-contained
|
|
1255
|
+
Node script with one `fetch()` call, evidence-pattern detection, and a
|
|
1256
|
+
deterministic exit code (0 = exploit demonstrated, 1 = not demonstrated, 2
|
|
1257
|
+
= error). Templates respect a safety policy: no destructive shell commands,
|
|
1258
|
+
no real cloud-metadata IPs, no outbound network beyond localhost. Smoke:
|
|
1259
|
+
scanning `test/fixtures/vulnerable-js` produces 8 PoCs across 6 distinct
|
|
1260
|
+
CWE families. Findings get a new `f.poc = { lang, kind, cwe, family, runHint, code }`
|
|
1261
|
+
field surfaced in normalizeFindings and SARIF. Families without v1 template
|
|
1262
|
+
coverage get `f.poc = null` and a documented entry in
|
|
1263
|
+
`poc-cwe-map.js::NO_POC_FAMILIES`.
|
|
1264
|
+
- **P1.3 — Brier-calibrated confidence (FR-UX-1, FR-UX-2).** New module
|
|
1265
|
+
`scanner/src/posture/calibration.js` turns the ordinal `confidence` score
|
|
1266
|
+
into a calibrated probability with 95% Wilson confidence interval. Per
|
|
1267
|
+
finding: `calibrated_confidence`, `calibrated_confidence_ci`,
|
|
1268
|
+
`calibrated_n`, `calibration_reason` (set when null — "insufficient-samples"
|
|
1269
|
+
/ "no-family" / "no-history"). Seed corpus in
|
|
1270
|
+
`calibration-seed.json` covers 20 vuln families from the OWASP Benchmark +
|
|
1271
|
+
Juliet labeled runs; the customer's `.agentic-security/validator-metrics.json`
|
|
1272
|
+
overrides per-family when sample count is higher. Calibration is honest
|
|
1273
|
+
about uncertainty: `MIN_SAMPLES_FOR_CALIBRATION = 30`. The PRD G1 target
|
|
1274
|
+
(Brier ≤ 0.10 on a held-out labeled set) is queued for Phase 5; this ships
|
|
1275
|
+
the framework, the math, and the seed data.
|
|
1276
|
+
- **P1.5 — Cross-language message queues (FR-XSAT-4).** New module
|
|
1277
|
+
`scanner/src/posture/cross-lang-queues.js` indexes producer and consumer
|
|
1278
|
+
call sites for Kafka (kafkajs, kafka-clients, confluent-kafka), AWS SQS
|
|
1279
|
+
(aws-sdk, boto3), RabbitMQ (amqplib, pika, Spring `RabbitTemplate`), Redis
|
|
1280
|
+
Streams (XADD / XREAD across Node, Python, Go), and Google Pub/Sub. When
|
|
1281
|
+
producer and consumer agree on a topic name and the consumer file has a
|
|
1282
|
+
high+ finding, we emit a `cross_language: true` chain back to the producer
|
|
1283
|
+
(and vice-versa). Severity is demoted one tier so the chain doesn't double-
|
|
1284
|
+
count in severity bucketing. Honest about uncertainty: only literal-string
|
|
1285
|
+
topic matches; constant-folded names left for Phase 2.
|
|
1286
|
+
|
|
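For the P1.3 calibration entry above, the Wilson interval and the minimum-sample guard can be sketched as follows. This is a simplification, not the contents of `calibration.js`: `calibrate` is a hypothetical name, it takes per-family `{ tp, fp }` counts directly, and it skips the mapping from the ordinal `confidence` score.

```javascript
const MIN_SAMPLES_FOR_CALIBRATION = 30; // value quoted in the changelog

// Wilson score interval for a binomial proportion (z = 1.96 for ~95%).
function wilsonInterval(successes, n, z = 1.96) {
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return [Math.max(0, center - half), Math.min(1, center + half)];
}

// Hypothetical annotator: returns null confidence plus a reason when the
// sample is too small to calibrate.
function calibrate(stats) {
  if (!stats) return { calibrated_confidence: null, calibration_reason: 'no-history' };
  const n = stats.tp + stats.fp;
  if (n < MIN_SAMPLES_FOR_CALIBRATION) {
    return { calibrated_confidence: null, calibrated_n: n, calibration_reason: 'insufficient-samples' };
  }
  return {
    calibrated_confidence: stats.tp / n,
    calibrated_confidence_ci: wilsonInterval(stats.tp, n),
    calibrated_n: n,
  };
}
```

Unlike the normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly when the observed rate is near 0 or 1.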
1287
|
+
### Tests, bench, integrity
|
|
1288
|
+
|
|
1289
|
+
- 14 new tests in `test/poc-generator.test.js` (PoC coverage + safety).
|
|
1290
|
+
- 9 new tests in `test/cross-lang-queues.test.js`.
|
|
1291
|
+
- 14 new tests in `test/calibration.test.js` (Wilson + Brier + annotation).
|
|
1292
|
+
- All 199 + 26 + 2 unit tests pass.
|
|
1293
|
+
- Synthetic-bench F1 still 100%.
|
|
1294
|
+
- No new dead exports; `test/no-dead-modules.test.js` both subtests pass.
|
|
1295
|
+
|
|
1296
|
+
### Queued for next session
|
|
1297
|
+
|
|
1298
|
+
- **P1.2 — Verifier sandbox loop (FR-VER-3, FR-VER-6, FR-VER-7).** Needs
|
|
1299
|
+
Docker integration, network isolation, and a sandbox-escape test. The PoC
|
|
1300
|
+
generator already produces files; the verifier executes them in isolation.
|
|
1301
|
+
- **P1.4 — Cross-language polyglot benchmark (G3).** Needs fixture builds
|
|
1302
|
+
across Node → Python → Java → Postgres. Measures the cross-asset claims
|
|
1303
|
+
we've now made for HTTP/gRPC/GraphQL/ORM/IaC/Queues.
|
|
1304
|
+
|
|
1305
|
+
### Honesty correction
|
|
1306
|
+
|
|
1307
|
+
The parent PRD claimed v1.0.0 ships at ~15 months. This release is one
|
|
1308
|
+
session of work; we're at ~v0.49.0 on a path to v0.50.0 (Phase-1 release).
|
|
1309
|
+
The PRD's G1 (Brier ≤ 0.10 on a held-out set) is not yet measured — the
|
|
1310
|
+
shipped calibration is on the SEED corpus, which is by definition not held
|
|
1311
|
+
out. We surface this in the `_caveat` field of `calibration-seed.json`.
|
|
1312
|
+
|
|
1313
|
+
## 0.48.0 — fourth-round premortem + CI bench failure
|
|
1314
|
+
|
|
1315
|
+
### Bench regression fix
|
|
1316
|
+
|
|
1317
|
+
The synthetic-bench CI job started failing at v0.47.0. Two issues:
|
|
1318
|
+
|
|
1319
|
+
- **Root-cause clustering over-merged across detectors.** Two distinct
|
|
1320
|
+
detectors (structural `Open Redirect` and `host-header`) that share CWE-601
|
|
1321
|
+
on the same `res.redirect(...)` line were collapsing into one finding,
|
|
1322
|
+
hiding the host-header bug. `sinkKey` now includes `f.parser` so two
|
|
1323
|
+
detectors never merge. Empty `sinkExpr` keys are skipped (was bucketing all
|
|
1324
|
+
rate-limit findings into one).
|
|
1325
|
+
- **Two expected entries pointed at the same post-clustered line.** Cleaned
|
|
1326
|
+
up `expected.json` for `orm-raw-sql` and added six new `csrf` family
|
|
1327
|
+
expected entries for fixtures that legitimately lack CSRF protection.
|
|
1328
|
+
Baseline refreshed.
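
The clustering fix can be illustrated with a small sketch. The finding shape (`parser`, `cwe`, `file`, `line`, `sinkExpr`) and the key layout are assumptions made for illustration; only the two rules (include `f.parser` in the key, skip empty `sinkExpr`) come from the notes above.

```javascript
// Build the cluster key; returns null when the finding must not be clustered.
function sinkKey(f) {
  // Empty sink expressions are skipped so unrelated findings never share a bucket.
  if (!f.sinkExpr) return null;
  // Including f.parser keeps two detectors on the same line apart.
  return [f.parser, f.cwe, f.file, f.line, f.sinkExpr].join('|');
}

// Group findings by key; unkeyed findings each stay in their own cluster.
function clusterByRootCause(findings) {
  const clusters = new Map();
  const unclustered = [];
  for (const f of findings) {
    const key = sinkKey(f);
    if (key === null) {
      unclustered.push([f]);
      continue;
    }
    if (!clusters.has(key)) clusters.set(key, []);
    clusters.get(key).push(f);
  }
  return [...clusters.values(), ...unclustered];
}
```

With this key, a structural `Open Redirect` finding and a `host-header` finding on the same `res.redirect(...)` line land in separate clusters even though both map to CWE-601.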
|
|
1329
|
+
|
|
1330
|
+
### Node 20 deprecation
|
|
1331
|
+
|
|
1332
|
+
Bumped `actions/{checkout,setup-node,upload-artifact}` to v5 and
|
|
1333
|
+
`actions/github-script` to v8 (Node 24 native). Dropped the
|
|
1334
|
+
`FORCE_JAVASCRIPT_ACTIONS_TO_NODE24` workaround env.
|
|
1335
|
+
|
|
1336
|
+
### Fourth-round premortem — 15 findings closed
|
|
1337
|
+
|
|
1338
|
+
- **4R-1**: rule-pack signing is fail-closed in CI. When `CI=true` (and the
|
|
1339
|
+
common variants) and no signing keys are configured, pass-through mode
|
|
1340
|
+
refuses rather than silently accepting. Re-enable pass-through in CI via
|
|
1341
|
+
`AGENTIC_SECURITY_ALLOW_PASSTHROUGH_IN_CI=1`.
|
|
1342
|
+
- **4R-2**: `scanner/dist/agentic-security.mjs` is now correctly tracked in
|
|
1343
|
+
`.gitignore`. The previous "Not committed" comment was wrong: the bundle was
|
|
1344
|
+
always committed. Now `dist/*` is ignored except
|
|
1345
|
+
`agentic-security.mjs` and `agentic-security.mjs.sha256`.
|
|
1346
|
+
- **4R-3**: `scan.yml` downloads the bundle with checksum verification. New
|
|
1347
|
+
`scanner-ref` workflow input lets callers pin to a release tag or commit SHA
|
|
1348
|
+
for supply-chain hardening. `scanner/dist/agentic-security.mjs.sha256` is
|
|
1349
|
+
generated by `npm run build` and committed.
|
|
1350
|
+
- **4R-4**: catalog `filterByProvenance` memoizes per (entries, mode) so the
|
|
1351
|
+
taint hot path no longer allocates a fresh array per match.
|
|
1352
|
+
- **4R-5**: LSP `_depCache` is granularly invalidated on manifest save — only
|
|
1353
|
+
the saved file's entry is refreshed, not the whole project tree.
|
|
1354
|
+
- **4R-6**: `no-dead-modules.test.js` has a sister "allowlist decay" check.
|
|
1355
|
+
Stale ALLOWLIST entries (25 of them, from v0.47.0) were removed.
|
|
1356
|
+
- **4R-7**: `version.js` warns to stderr when `package.json` can't be read
|
|
1357
|
+
instead of silently falling back to `'unknown'`.
|
|
1358
|
+
- **4R-8**: `applyFix` accepts `stableId` from the caller (`bin/` and `mcp/`)
|
|
1359
|
+
rather than re-deriving via `findingId`, which rotates on line-shift.
|
|
1360
|
+
- **4R-9**: fix-history stale-lock reap is PID-aware. Only unlinks when the
|
|
1361
|
+
PID is dead OR the file is old AND the PID is unkillable. Atomic re-read of
|
|
1362
|
+
the lockfile before unlink avoids racing a fresh acquirer.
|
|
1363
|
+
- **4R-10**: SARIF emits a tri-state `signatureStatus: 'verified' | 'unsigned'
|
|
1364
|
+
| 'pass-through'` field. The legacy `_unsigned` / `_passThroughSigning`
|
|
1365
|
+
flags are emitted alongside for one release of grace.
|
|
1366
|
+
- **4R-11**: CLI and Markdown reports now render `validator_verdict` so SCA
|
|
1367
|
+
findings tagged `not-applicable` aren't invisible to the reader.
|
|
1368
|
+
- **4R-12**: custom-rules deadline is per-scanRoot, accumulating across calls
|
|
1369
|
+
within a process. New `resetCustomRulesBudget(scanRoot)` for long-lived LSP
|
|
1370
|
+
scans; wired into the LSP server.
|
|
1371
|
+
- **4R-13**: `prepublishOnly` refuses to overwrite a locally-edited
|
|
1372
|
+
`scanner/CHANGELOG.md` that differs from the canonical `../CHANGELOG.md`.
|
|
1373
|
+
- **4R-14**: new `scripts/nist-compliance/test_regex_redos.py` asserts every
|
|
1374
|
+
import regex runs in linear time on pathological input — guards against
|
|
1375
|
+
re-introducing the `(?:[^)]|\n)+?` ReDoS fixed in `e0c669b`.
|
|
1376
|
+
- **4R-15**: `PROMPT_VERSION` is now a public export of `llm-validator/index.js`.
|
|
1377
|
+
The `validator-cache gc` subcommand no longer reaches through the
|
|
1378
|
+
underscore-prefixed `_internal` private API and fails loudly if the version
|
|
1379
|
+
can't be read.
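
As an illustration of 4R-1 and 4R-10, here is a minimal sketch of a fail-closed mode decision and the tri-state status mapping. The environment variable name and the three status strings come from the entries above; the function names, the `'verify'` / `'refuse'` return values, the set of accepted `CI` values, and the flag precedence are assumptions.

```javascript
// Decide how rule-pack signing behaves for this run.
function resolveSigningMode(env, hasKeys) {
  if (hasKeys) return 'verify';
  // Assumption: 'true' and '1' stand in for the "common variants" of CI.
  const inCi = env.CI === 'true' || env.CI === '1';
  const allowed = env.AGENTIC_SECURITY_ALLOW_PASSTHROUGH_IN_CI === '1';
  // Fail closed: no keys in CI means refuse unless explicitly opted in.
  if (inCi && !allowed) return 'refuse';
  return 'pass-through';
}

// Collapse the two legacy boolean flags into one tri-state value.
function signatureStatus(finding) {
  if (finding._passThroughSigning) return 'pass-through';
  if (finding._unsigned) return 'unsigned';
  return 'verified';
}
```

Passing `env` as a parameter (for example `resolveSigningMode(process.env, keys.length > 0)`) keeps the decision easy to test without mutating the real environment.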
|
|
1380
|
+
|
|
1381
|
+
### Honesty note
|
|
1382
|
+
|
|
1383
|
+
All 15 fourth-round findings are closed without dead code (verified by the
|
|
1384
|
+
no-dead-modules test). The bench failure was a real regression introduced
|
|
1385
|
+
in v0.47.0 (clustering by CWE alone) — caught by CI, fixed by adding
|
|
1386
|
+
`f.parser` to the cluster key.
|
|
1387
|
+
|
|
1388
|
+
## 0.47.0 — third-round premortem remediation
|
|
1389
|
+
|
|
1390
|
+
Third adversarial premortem identified 17 findings against the v0.46.0
|
|
1391
|
+
remediation. All 17 are now closed. Highlights:
|
|
1392
|
+
|
|
1393
|
+
- **3R-1: integration test for dead exports** — new `test/no-dead-modules.test.js`
|
|
1394
|
+
walks `scanner/src/{posture,llm-validator,dataflow,lsp,ir,mcp}` and asserts
|
|
1395
|
+
every exported symbol has at least one external call site (`.js` files and
|
|
1396
|
+
`commands/*.md`). Allowlist for legitimate library-style exports. Closes the
|
|
1397
|
+
recurring "wired in code review, dead in code" failure mode.
|
|
1398
|
+
|
|
1399
|
+
- **3R-2 / 3R-3: single-sourced version** — `scanner/src/posture/version.js`
|
|
1400
|
+
reads `scanner/package.json#version` at module load; SARIF `tool.driver.version`
|
|
1401
|
+
and `CURRENT_RULESET_VERSION` now derive from it instead of independently
|
|
1402
|
+
hardcoded constants that diverged on every release.
|
|
1403
|
+
|
|
1404
|
+
- **3R-4: signing graceful degradation** — `rule-pack-signing.js` operates in a
|
|
1405
|
+
pass-through mode when both bundled and project keys are absent. One audit
|
|
1406
|
+
warning per session; findings carry `_passThroughSigning:true`. Set
|
|
1407
|
+
`AGENTIC_SECURITY_STRICT_SIGNING=1` to disable pass-through.
|
|
1408
|
+
|
|
1409
|
+
- **3R-5: CLI keygen safety rails** — `agentic-security-rule keygen` refuses
|
|
1410
|
+
`--out` paths under `.agentic-security/`; warns on non-TTY stdout without
|
|
1411
|
+
`--out`; writes private-key files mode 0600. `--i-understand-private-keys`
|
|
1412
|
+
to override.
|
|
1413
|
+
|
|
1414
|
+
- **3R-6: provenance surfaced in reports** — `normalizeFindings` carries
|
|
1415
|
+
`_unsigned` and `_passThroughSigning` through; SARIF `result.properties`
|
|
1416
|
+
emits `unsigned:true` / `passThroughSigning:true`; SARIF
|
|
1417
|
+
`invocations[].properties` now includes `rulesetVersion`, `rulesetVersionSource`,
|
|
1418
|
+
and `rulesetVersionMismatch` for trend attribution.
|
|
1419
|
+
|
|
1420
|
+
- **3R-7: requiresReAudit is now load-bearing** — `bench-realworld.js` reads
|
|
1421
|
+
curated expected JSONs' `requiresReAudit:true`, emits a stderr warning per
|
|
1422
|
+
affected corpus, and tags the corpus result with
|
|
1423
|
+
`requiresReAudit:true` so consumers know its F1 is informational.
|
|
1424
|
+
|
|
1425
|
+
- **3R-8: global deadline for custom rules** — `applyCustomRules()` now caps
|
|
1426
|
+
the total scan time across all files and all rules at 30s (overridable via
|
|
1427
|
+
`AGENTIC_SECURITY_CUSTOM_RULES_BUDGET_MS`), guarding against ReDoS sprees
|
|
1428
|
+
across many files even when each individual regex respects its 200ms budget.
|
|
1429
|
+
|
|
1430
|
+
- **3R-9: LSP dep-cache invalidation on manifest save** — saving any
|
|
1431
|
+
`package.json`/`pyproject.toml`/`Cargo.toml`/etc. now invalidates the cached
|
|
1432
|
+
dep snapshot before re-scanning, so freshly added vulnerable packages and
|
|
1433
|
+
removed ones are reflected immediately in editor diagnostics.
|
|
1434
|
+
|
|
1435
|
+
- **3R-10: catalog OFFICIAL_ONLY is per-match** — `AGENTIC_SECURITY_CATALOG_OFFICIAL_ONLY=1`
|
|
1436
|
+
is now read per source/sink match instead of once at module load, so CI lanes
|
|
1437
|
+
that toggle strict mode just before invocation are actually honored.
|
|
1438
|
+
|
|
1439
|
+
- **3R-11: validator preflight handles SCA locators** — findings with
|
|
1440
|
+
`parser:'SCA'` or `pkg`/`component`/`purl` set are tagged
|
|
1441
|
+
`validator_verdict:'not-applicable'` rather than `'unvalidated'`, which
|
|
1442
|
+
was misleading for findings that an LLM cannot meaningfully judge.
|
|
1443
|
+
|
|
1444
|
+
- **3R-12: applyFix recover() cross-checks against last-scan.json** — the
|
|
1445
|
+
fix-history log entry records the matching finding's stableId at apply
|
|
1446
|
+
time; `recover()` after a crash now tags promoted entries as
|
|
1447
|
+
`applied-stale` when the finding has vanished from last-scan.json.
|
|
1448
|
+
|
|
1449
|
+
- **3R-13: file lock around log writes** — concurrent `applyFix`, `recover`,
|
|
1450
|
+
and `undo` invocations no longer race the `log.json` write; serialization
|
|
1451
|
+
via `log.lock` with 30s stale-lock reaping and 5s contention timeout.
|
|
1452
|
+
|
|
1453
|
+
- **3R-14: validator-cache GC subcommand** — `agentic-security validator-cache
|
|
1454
|
+
stats|gc [--older-than N] [--dry-run]` prunes `.agentic-security/llm-cache/`
|
|
1455
|
+
by age and prompt-version mismatch.
|
|
1456
|
+
|
|
1457
|
+
- **3R-15: tier cutoffs stable under 2-decimal rounding** — confidence tier
|
|
1458
|
+
(`high|medium|low|very-low`) is now derived from the 2-decimal display value,
|
|
1459
|
+
so a finding reported as "0.75" never lands in two tiers depending on the
|
|
1460
|
+
viewer's rounding.
|
|
1461
|
+
|
|
1462
|
+
- **3R-16: CHANGELOG ships with npm package** — `prepublishOnly` copies
|
|
1463
|
+
CHANGELOG.md into `scanner/`; added to `package.json#files`. The repo-root
|
|
1464
|
+
copy remains canonical; the in-package copy is gitignored.
|
|
1465
|
+
|
|
1466
|
+
- **3R-17: fix-history log compaction** — `agentic-security undo --compact
|
|
1467
|
+
[--retain-days N] [--prune-backups]` archives terminal entries (reverted,
|
|
1468
|
+
failed, applied-stale) older than the retention window into
|
|
1469
|
+
`log-archive-YYYY-MM.json`, optionally pruning their `.bak` files.
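
The rounding rule in 3R-15 can be sketched as below. The tier names come from the entry; the numeric cutoffs and the function name are hypothetical, since the changelog does not state the real thresholds.

```javascript
// Hypothetical cutoffs, checked from highest to lowest.
const TIER_CUTOFFS = [
  { min: 0.75, tier: 'high' },
  { min: 0.5, tier: 'medium' },
  { min: 0.25, tier: 'low' },
];

// Round to the 2-decimal display value first, then pick the tier from that
// rounded value, so the tier and the displayed number always agree.
function confidenceTier(rawConfidence) {
  const display = Math.round(rawConfidence * 100) / 100;
  const hit = TIER_CUTOFFS.find((c) => display >= c.min);
  return { display: display.toFixed(2), tier: hit ? hit.tier : 'very-low' };
}
```

Deriving the tier from the raw value instead would let a score such as 0.7451 display as "0.75" while still being classified below the 0.75 cutoff.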
|
|
1470
|
+
|
|
1471
|
+
### Honesty correction
|
|
1472
|
+
|
|
1473
|
+
No claims in this release exceeded what shipped. v0.47.0 closes the 17
|
|
1474
|
+
third-round premortem findings against v0.46.0 cleanly; the round-4 premortem
|
|
1475
|
+
will surely find more, and that is fine.
|
|
1476
|
+
|
|
1477
|
+
## 0.46.0 — second-round premortem remediation + honesty correction
|
|
1478
|
+
|
|
1479
|
+
### Honesty correction for v0.45.0
|
|
1480
|
+
|
|
1481
|
+
The v0.45.0 commit message (`3acca6b fix(security): premortem remediation —
|
|
1482
|
+
all 15 findings`) claimed all 15 first-round premortem findings were
|
|
1483
|
+
remediated. A second-round adversarial premortem identified five of those
|
|
1484
|
+
"closures" as dead code or wire-up regressions:
|
|
1485
|
+
|
|
1486
|
+
- `posture/fix-history.js::recover()` was exported but never called from
|
|
1487
|
+
any startup path → pending entries from a crashed `applyFix` accumulated
|
|
1488
|
+
forever. **Now fixed**: wired into `runScan.js` at top of every scan.
|
|
1489
|
+
|
|
1490
|
+
- `posture/ruleset-version.js::stampScan()` / `effectiveVersion()` were
|
|
1491
|
+
exported but never imported → ruleset-pinning was documentation only.
|
|
1492
|
+
**Now fixed**: wired into `runScan.js` to stamp every scan result.
|
|
1493
|
+
|
|
1494
|
+
- `posture/validator-metrics.js::recordTriage()` was exported but the
|
|
1495
|
+
`/triage` slash command did not invoke it → per-CWE production metrics
|
|
1496
|
+
never accumulated. **Now fixed**: `/triage` now calls `recordTriage` on
|
|
1497
|
+
every verdict (subject to the new symmetric learn gate).
|
|
1498
|
+
|
|
1499
|
+
- The custom-rules pipeline tagged unsigned RULES with `_unsigned: true`
|
|
1500
|
+
but the per-finding emitter (`toFinding`) did not copy the marker →
|
|
1501
|
+
the audit chain promised by the warning log did not exist in the data.
|
|
1502
|
+
**Now fixed**: findings now carry `_unsigned: true` when their rule does.
|
|
1503
|
+
|
|
1504
|
+
- `engine.js:6941` called the LLM validator with `concurrency: 4`,
|
|
1505
|
+
overriding the validator's `concurrency: 1` determinism default →
|
|
1506
|
+
cache-cold runs produced non-deterministic SARIF in the same commit
|
|
1507
|
+
that promised determinism. **Now fixed**: respects `AGENTIC_SECURITY_LLM_CONCURRENCY` env (default 1).
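
A minimal sketch of the concurrency fix in the last item, assuming a hypothetical helper name and validation policy; only the variable name `AGENTIC_SECURITY_LLM_CONCURRENCY` and the default of 1 are taken from the entry.

```javascript
// Read the validator concurrency from the environment, defaulting to 1 so
// cache-cold runs stay deterministic unless the caller opts in to more.
function llmConcurrency(env) {
  const raw = env.AGENTIC_SECURITY_LLM_CONCURRENCY;
  const n = Number.parseInt(raw ?? '', 10);
  // Anything missing, non-numeric, or below 1 falls back to the default.
  return Number.isInteger(n) && n >= 1 ? n : 1;
}
```

A call site would then pass `{ concurrency: llmConcurrency(process.env) }` instead of a hardcoded value.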
|
|
1508
|
+
|
|
1509
|
+
### Other second-round fixes
|
|
1510
|
+
|
|
1511
|
+
- **String-aware JSON parser** in the LLM validator. Previous
|
|
1512
|
+
`parseLastJsonObject` ignored string-state and could be fooled by braces
|
|
1513
|
+
inside JSON string literals. Rewritten to walk forward with full string-
|
|
1514
|
+
and escape-state tracking, then return the LAST valid candidate.
|
|
1515
|
+
|
|
1516
|
+
- **Empty file/line pre-flight** in `validateOne`. A validator response of
|
|
1517
|
+
`{"file":"","line":0,...}` trivially satisfied the cross-check on findings
|
|
1518
|
+
without precise location. Now refused with `unvalidated`.
|
|
1519
|
+
|
|
1520
|
+
- **Protected signing trust root**: trusted keys come from a built-in
|
|
1521
|
+
constant (`BUNDLED_OFFICIAL_KEYS`); project-local `.agentic-security/trusted-keys.json`
|
|
1522
|
+
is refused unless `AGENTIC_SECURITY_ALLOW_PROJECT_KEYS=1` is set
|
|
1523
|
+
(audit-logged). A PR contributor can no longer bootstrap a key into trust.
|
|
1524
|
+
|
|
1525
|
+
- **Key revocation**: trusted-keys.json `crl[]` honored (signature-hash
|
|
1526
|
+
blacklist); `revokedAt` field on each key honored (signatures dated after
|
|
1527
|
+
revocation refused).
|
|
1528
|
+
|
|
1529
|
+
- **`agentic-security-rule` CLI** for `keygen` / `sign` / `verify` with a
|
|
1530
|
+
first-time setup walkthrough and explicit private-key-handling warnings.
|
|
1531
|
+
|
|
1532
|
+
- **Symmetric `AGENTIC_SECURITY_LEARN` gate**: `/triage` no longer writes
|
|
1533
|
+
verdicts to `triage-feedback.json` without explicit opt-in. Prevents an
|
|
1534
|
+
attacker from poisoning the file in advance of someone flipping the
|
|
1535
|
+
read-side flag.
|
|
1536
|
+
|
|
1537
|
+
- **Worklist deadline check**: deep-mode taint engine honors `deadlineMs`
|
|
1538
|
+
inside `analyzeFunction`'s worklist (every 128 iterations). Pathological
|
|
1539
|
+
CFGs can no longer run past the global timeout.
|
|
1540
|
+
|
|
1541
|
+
- **LSP loads dep-manifest files**: per-save scan in `lsp/server.js` now
|
|
1542
|
+
pre-walks the project tree once for `package.json` / `pom.xml` / `.proto`
|
|
1543
|
+
/ `.graphql` / `.tf` so SCA + cross-language passes have their inputs.
|
|
1544
|
+
|
|
1545
|
+
- **SARIF notifications for caveats**: `tool.driver.notifications` and
|
|
1546
|
+
`invocations.toolExecutionNotifications` now carry the load-bearing
|
|
1547
|
+
warnings (priority scores are ordinal, OWASP Benchmark numbers are
|
|
1548
|
+
benchmark-tuned). Customer CI ingesters see them without reading docs.
|
|
1549
|
+
|
|
1550
|
+
- **Re-sanitization on cache read**: validator reasoning passes through
|
|
1551
|
+
`sanitizeReasoning` again on cache hit (defense in depth against any
|
|
1552
|
+
future write-path regression).
|
|
1553
|
+
|
|
1554
|
+
- **Provenance + requiresReAudit fields** added to all 25 bootstrapped GT
|
|
1555
|
+
files under `bench/.../expected/`. Machine-readable signal that the
|
|
1556
|
+
bootstrap origin is self-referential.
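
To illustrate the string-aware parser item, here is one way such a scan could look. It reuses the name `parseLastJsonObject` from the entry for readability, but it is a sketch written from the description, not the validator's actual code; `matchObjectEnd` is a hypothetical helper.

```javascript
// Return the index of the '}' that closes the object opened at text[start],
// ignoring braces inside JSON string literals. Returns -1 if unbalanced.
function matchObjectEnd(text, start) {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}' && --depth === 0) return i;
  }
  return -1;
}

// Walk forward and keep the LAST top-level candidate that parses as JSON.
function parseLastJsonObject(text) {
  let last;
  let i = 0;
  while (i < text.length) {
    if (text[i] !== '{') {
      i++;
      continue;
    }
    const end = matchObjectEnd(text, i);
    if (end !== -1) {
      try {
        last = JSON.parse(text.slice(i, end + 1));
        i = end + 1; // skip past the accepted object
        continue;
      } catch {
        // Balanced but not valid JSON; try the next brace instead.
      }
    }
    i++;
  }
  return last;
}
```

The function returns `undefined` when no valid object is found, which lets a caller treat that case as `unvalidated` rather than guessing.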
|
|
1557
|
+
|
|
1558
|
+
### What this commit honestly does NOT close
|
|
1559
|
+
|
|
1560
|
+
- `BUNDLED_OFFICIAL_KEYS` is empty: a production deployment needs the
|
|
1561
|
+
maintainers to generate a real keypair, distribute the private key
|
|
1562
|
+
offline, and ship the public key. Today's effective behavior is "no
|
|
1563
|
+
official keys, project keys via opt-in."
|
|
1564
|
+
- The CVE-replay corpus is still 1 starter entry (G1 second half remains
|
|
1565
|
+
not delivered).
|
|
1566
|
+
- Real-world Java F1 generalization is still unmeasured.
|
|
1567
|
+
|
|
1568
|
+
## 0.45.0 — first-round premortem remediation
|
|
1569
|
+
|
|
1570
|
+
(See commit 3acca6b. Some closures were dead code; see honesty correction
|
|
1571
|
+
above.)
|
|
1572
|
+
|
|
1573
|
+
## 0.44.0 — multi-session items: gRPC/GraphQL/ORM cross-lang, IDE plugins
|
|
1574
|
+
|
|
1575
|
+
## 0.43.0 — small engineering items: MCP verify_fix/synthesize_fix,
|
|
1576
|
+
SentQL path predicates, conversation-context hook, fix-plan,
|
|
1577
|
+
per-CWE metrics
|
|
1578
|
+
|
|
1579
|
+
## 0.42.0 — Layer 1 IR + Layer 2 interprocedural taint, F1=0.907 on
|
|
1580
|
+
OWASP Bench v1.2 (blind, strict)
|