agentshield-sdk 13.1.0 → 13.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +49 -1
- package/README.md +260 -1143
- package/package.json +2 -2
- package/src/deepmind-defenses.js +468 -0
- package/src/fleet-defense.js +24 -0
- package/src/hitl-guard.js +64 -0
- package/src/main.js +36 -0
- package/src/memory-guard.js +48 -0
- package/src/render-differential.js +608 -0
- package/src/semantic-guard.js +39 -0
- package/src/side-channel-monitor.js +560 -0
- package/src/sybil-detector.js +529 -0
- package/src/trap-defense.js +112 -0
package/CHANGELOG.md
CHANGED
@@ -4,9 +4,57 @@ All notable changes to Agent Shield will be documented in this file.
 
 This project follows [Semantic Versioning](https://semver.org/).
 
+## [13.3.0] - 2026-04-06
+
+### New SDK Modules
+
+- **RenderDifferentialAnalyzer** -- Detects content that renders differently than it reads. Catches visual deception in HTML (CSS display:none, opacity:0, off-screen, font-size:0), Markdown (link mismatch, hidden spans, comment injection), and LaTeX (\phantom, \textcolor{white}, \renewcommand). Includes VisualHasher for measuring raw-vs-rendered divergence.
+- **SybilDetector** -- Detects coordinated fake agents acting in concert. Behavioral similarity scoring, temporal correlation, content similarity (Jaccard), creation burst detection, and voting collusion analysis. Includes AgentIdentityVerifier with challenge-response and shared-secret detection.
+- **SideChannelMonitor** -- Detects data exfiltration via covert channels. DNS exfiltration (high-entropy subdomains, base64 labels), timing-based encoding, response-size encoding, URL parameter exfil. Includes BeaconDetector (C2 beaconing patterns) and EntropyAnalyzer (Shannon entropy).
+
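The changelog entries above name two standard measures: Shannon entropy (for spotting high-entropy DNS labels) and Jaccard similarity (for comparing agent content). The sketch below shows what those measures compute. It is not taken from the package source: the function names, the 12-character minimum, and the 3.5-bit threshold are all assumptions chosen for illustration.

```javascript
// Illustrative sketch only; none of these names come from agentshield-sdk.

// Shannon entropy of a string, in bits per character.
function shannonEntropy(s) {
  const counts = new Map();
  let n = 0;
  for (const ch of s) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
    n += 1;
  }
  if (n === 0) return 0;
  let h = 0;
  for (const c of counts.values()) {
    const p = c / n;
    h -= p * Math.log2(p);
  }
  return h;
}

// Hypothetical heuristic: a long DNS label with high per-character entropy
// may be carrying encoded data. Both thresholds are assumed values.
function looksLikeEncodedLabel(label, minLength = 12, minEntropy = 3.5) {
  return label.length >= minLength && shannonEntropy(label) > minEntropy;
}

// Jaccard similarity between the word sets of two texts:
// |A intersect B| / |A union B|, defined here as 0 when both sets are empty.
function jaccardSimilarity(a, b) {
  const toSet = (t) => new Set(t.toLowerCase().split(/\s+/).filter(Boolean));
  const A = toSet(a);
  const B = toSet(b);
  if (A.size === 0 && B.size === 0) return 0;
  let shared = 0;
  for (const w of A) {
    if (B.has(w)) shared += 1;
  }
  return shared / (A.size + B.size - shared);
}

console.log(looksLikeEncodedLabel('x7Kq9ZpL2mTn4vRb'), looksLikeEncodedLabel('mail'));
console.log(jaccardSimilarity('buy this now', 'buy this today'));
```

A real detector would presumably combine such scores with other signals (label length distribution, query rate, timing), since entropy alone flags legitimate hashed hostnames as well.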
+### Improvements
+
+- Professional README rewrite: organized by capability instead of version, reduced from 1,348 to ~350 lines
+- All 3 new modules exported via main.js
+- 185 new test assertions (81 render-differential + 49 sybil + 55 side-channel)
+- Total: 3,400+ test assertions across 22 suites
+
+## [13.2.0] - 2026-04-06
+
+### DeepMind AI Agent Traps -- First-Principles Defense
+
+10 new modules built from a 3-persona first-principles analysis (spam filter engineer, immunologist, fire safety inspector) of DeepMind's "AI Agent Traps" paper. Each module addresses a specific gap that existing capabilities cannot cover.
+
+#### New Modules
+
+- **ContentStructureAnalyzer** (Trap 1) -- Detects structural anomalies (hidden/visible ratio, tag density, formatting overhead) regardless of content keywords. Catches CSS/HTML obfuscation by measuring document SHAPE, not text content.
+- **SourceReputationTracker** (Trap 1) -- Temporal trust scoring with exponential decay. New sources start neutral, earn trust over time, lose trust instantly on threats. Persists to disk.
+- **RetrievalTimeScanner** (Trap 3) -- Scans memory entries at RETRIEVAL time, not just write time. Detects latent memory poisons that are clean individually but malicious when combined with a specific query. No other SDK does this.
+- **FewShotValidator** (Trap 3) -- Scans output portions of few-shot demonstrations in agent context for poisoned action patterns.
+- **SubAgentSpawnGate** (Trap 4) -- Validates child agent system prompts, blocks permission escalation, flags dangerous tools before sub-agent activation.
+- **SelfReferenceMonitor** (Trap 2) -- Detects external content that discusses the model's identity/capabilities (persona hyperstition). Flags identity manipulation through environmental narrative.
+- **InformationAsymmetryDetector** (Trap 2) -- Measures pro-safety vs anti-safety keyword ratio. Flags content with >70% anti-safety framing.
+- **ProvenanceMarker** (Trap 6) -- Prepends visible source provenance to agent output. Humans see "WARNING: influenced by untrusted web content from [source]."
+- **EscalatingScrutinyEngine** (Trap 6) -- Increases scrutiny as approval rate rises. Forces plain-English explanations, 30-second delays, and comprehension checks during high-volume approval periods.
+- **CompositeFragmentAssembler** (Trap 5) -- Pairwise assembly of content fragments from different sources. Detects attack payloads split across multiple agents/documents.
+
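Two of the entries above describe simple numeric rules: trust that starts neutral and changes over time with exponential decay, and a ratio test that flags content with more than 70% anti-safety framing. The sketch below is one possible reading of those descriptions, not the package's implementation. The class name, the half-life model, the 0.05 trust increment, and both keyword lists are assumptions made for illustration, and disk persistence is omitted.

```javascript
// Illustrative sketch only; names, constants, and keyword lists are assumed.

const NEUTRAL = 0.5;

class TrustDecaySketch {
  constructor(halfLifeMs = 7 * 24 * 60 * 60 * 1000) {
    this.halfLifeMs = halfLifeMs;
    this.sources = new Map(); // source -> { score, updatedAt }
  }

  // Stored scores drift back toward NEUTRAL with the configured half-life.
  score(source, now) {
    const entry = this.sources.get(source);
    if (!entry) return NEUTRAL; // unknown sources start neutral
    const decay = Math.pow(0.5, (now - entry.updatedAt) / this.halfLifeMs);
    return NEUTRAL + (entry.score - NEUTRAL) * decay;
  }

  // Each clean observation earns a small, assumed increment of trust.
  recordClean(source, now) {
    const next = Math.min(1, this.score(source, now) + 0.05);
    this.sources.set(source, { score: next, updatedAt: now });
  }

  // A detected threat drops trust to zero immediately.
  recordThreat(source, now) {
    this.sources.set(source, { score: 0, updatedAt: now });
  }
}

// Hypothetical keyword lists; the real detector's vocabulary is not shown in the diff.
const PRO_SAFETY = ['safe', 'verify', 'caution', 'confirm'];
const ANTI_SAFETY = ['bypass', 'ignore', 'disable', 'override'];

// Fraction of safety-related keywords that are anti-safety (0 when none are found).
function antiSafetyRatio(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  let pro = 0;
  let anti = 0;
  for (const w of words) {
    if (PRO_SAFETY.includes(w)) pro += 1;
    else if (ANTI_SAFETY.includes(w)) anti += 1;
  }
  const total = pro + anti;
  return total === 0 ? 0 : anti / total;
}

const tracker = new TrustDecaySketch(1000);
tracker.recordThreat('example.test', 0);
console.log(tracker.score('example.test', 1000)); // halfway back to neutral after one half-life
console.log(antiSafetyRatio('bypass the filter, ignore warnings, disable checks, then verify') > 0.7);
```

Under this reading, decay pulls both earned trust and lost trust back toward neutral, so an old verdict gradually stops mattering. Whether the package decays in both directions is not stated in the changelog.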
+#### Also in this release
+
+- Deepened all 6 trap categories with JSRenderingDetector, CloakingHeuristicScanner, OpinionShapingDetector, cross-session memory drift, fleet event serialization, and OutputDeceptionScorer
+- 20+ new detector-core patterns for real attack data (output forcing, prompt extraction, conversation format injection, annotation embedding)
+- 35-feature micro-model (10 structural features capturing attack shape)
+- 18 self-training mutation strategies (6 real-world attacker techniques)
+- Safe normalization (leetspeak reversal no longer corrupts "3D", "1080p", "4.2GB")
+- MCPGuard fusion layer (low-confidence micro-model flags demoted to anomaly)
+- MCPGuard.fromPreset() -- 5 presets replace 17 boolean flags
+- State persistence for ContinuousSecurityService
+- 9 separate entry points for tree shaking
+- Real-world benchmark: F1 0.988 on published HackAPrompt/TensorTrust/research data
+- Honest README claims
+
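The "safe normalization" entry above says leetspeak reversal no longer corrupts tokens such as "3D", "1080p", and "4.2GB". The diff does not show how the package achieves this. One way to get that behavior, sketched below under assumed names and an assumed digit map, is to skip any token that looks like a number followed by a short unit suffix before substituting digits.

```javascript
// Illustrative sketch only; not the package's normalization code.

// Assumed leetspeak digit-to-letter map.
const LEET_MAP = { 0: 'o', 1: 'i', 3: 'e', 4: 'a', 5: 's', 7: 't' };

// A number, optionally with a decimal part, followed by at most two letters
// (for example "3D", "1080p", "4.2GB"). Such tokens are left untouched.
const NUMBER_WITH_UNIT = /^\d+(?:\.\d+)?[A-Za-z]{0,2}$/;

function safeNormalize(text) {
  return text
    .split(/(\s+)/) // the capture group keeps whitespace so the text can be rejoined
    .map((token) => {
      if (NUMBER_WITH_UNIT.test(token)) return token; // measurement-like token
      if (!/[A-Za-z]/.test(token)) return token; // no letters: nothing to de-obfuscate
      return token.replace(/[013457]/g, (d) => LEET_MAP[d]);
    })
    .join('');
}

console.log(safeNormalize('1gn0r3 all rules in 3D at 1080p 4.2GB'));
```

This sketch splits on whitespace only, so a measurement with trailing punctuation (for example "4.2GB.") would not match the guard pattern. A fuller version would strip surrounding punctuation before testing the token.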
 ## [13.1.0] - 2026-04-06
 
-### Hardening
+### Hardening -- 32-Issue Teardown
 
 Systematic teardown of every claim, architecture decision, and module. 24 issues fixed with code, 8 documented as honest limitations.
 