llm-trust-guard 4.20.0 → 4.20.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +13 -0
- package/README.md +11 -7
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED

@@ -5,6 +5,19 @@ All notable changes to `llm-trust-guard` will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [4.20.1] - 2026-04-24
+
+### Changed — Documentation accuracy
+
+- **README**: Removed "31 → 34 security guards" inconsistency (was contradicting the All 34 Guards table and `package.json`)
+- **README**: Removed unmeasured "<5ms latency" assertion from intro
+- **README**: Removed unmeasured "~97% on curated benchmarks" framing from "What it catches well"
+- **README**: Qualified the four "100% detection" claims (Policy Puppetry, Role-play, PAP, Multilingual) as "100% on unit tests" with a section preface explaining that these are unit-test rates, not corpus measurements. Broader corpus measurements live in [RESULTS-v4.19.0.md](tests/adversarial/RESULTS-v4.19.0.md)
+- **README**: Added Homoglyph attacks bullet to "What it catches well" (parity with Python README; feature exists in `encoding-detector`, `prompt-leakage-guard`, `multimodal-guard`, `memory-guard`)
+- **README**: Added v4.20.0 MCP Sampling detection note in Measured Performance preface; benchmark numbers apply unchanged because Sampling is orthogonal to the Sanitizer+Encoder pipelines benchmarked
+
+No code changes. Same 711 tests pass.
+
 ## [4.20.0] - 2026-04-24
 
 ### Added — MCP Sampling Attack Detection (Unit42 + Blueinfy, Feb 2026)
package/README.md
CHANGED

@@ -3,7 +3,7 @@
 [](https://www.npmjs.com/package/llm-trust-guard)
 [](https://opensource.org/licenses/MIT)
 
-**
+**34 security guards for LLM-powered and agentic AI applications.** Zero dependencies. Covers OWASP Top 10 for LLMs 2025, OWASP Agentic AI 2026, and MCP Security.
 
 Also available as a [Python package on PyPI](https://pypi.org/project/llm-trust-guard/) (`pip install llm-trust-guard`).
 
@@ -13,13 +13,17 @@ Also available as a [Python package on PyPI](https://pypi.org/project/llm-trust-
 
 This package is your **first line of defense** — like a WAF (Web Application Firewall) for LLM applications. It sits in the orchestration layer and catches known attack patterns before they reach the LLM and after the LLM responds.
 
-### What it catches well
+### What it catches well
+
+Per-category detection rates below are measured against the package's curated unit-test suite (representative attack samples per category). On broader held-out corpora these rates are typically lower — see [tests/adversarial/RESULTS-v4.19.0.md](tests/adversarial/RESULTS-v4.19.0.md) for measured detection on attack corpora and [Known limitations](#what-it-catches-partially-50-80-detection) below.
+
 - Known prompt injection phrases (170+ patterns, 11 languages)
 - Encoding bypass attacks (9 formats: Base64, URL, Unicode, Hex, HTML, ROT13, Octal, Base32, mixed)
-- Policy Puppetry attacks (JSON/INI/XML/YAML-formatted injection) — 100%
-- Role-play/persona attacks (translator trick, academic pretext, emotional manipulation) — 100%
-- PAP/persuasion attacks (authority, urgency, emotional manipulation) — 100%
-- Multilingual injection (10 languages) — 100%
+- Policy Puppetry attacks (JSON/INI/XML/YAML-formatted injection) — 100% on unit tests
+- Role-play/persona attacks (translator trick, academic pretext, emotional manipulation) — 100% on unit tests
+- PAP/persuasion attacks (authority, urgency, emotional manipulation) — 100% on unit tests
+- Multilingual injection (10 languages) — 100% on unit tests
+- Homoglyph attacks (Cyrillic/Greek character substitution) — normalized and detected
 - PII and secret leakage in outputs
 - Tool hallucination, RBAC bypass, multi-tenant violations
 - Tool result poisoning, context window stuffing
@@ -224,7 +228,7 @@ const output = guard.filterOutput(llmResponse, session.role);
 
 ## Measured Performance
 
-v4.19.0 benchmark, 2026-04-23. Full methodology, 95% confidence intervals, hand-adjudication labels, and reproducibility scripts: [tests/adversarial/RESULTS-v4.19.0.md](tests/adversarial/RESULTS-v4.19.0.md).
+v4.19.0 benchmark, 2026-04-23. v4.20.0 added MCP Sampling attack detection (see [CHANGELOG.md](CHANGELOG.md)) — orthogonal to the Sanitizer+Encoder pipelines below, so numbers apply unchanged. Full methodology, 95% confidence intervals, hand-adjudication labels, and reproducibility scripts: [tests/adversarial/RESULTS-v4.19.0.md](tests/adversarial/RESULTS-v4.19.0.md).
 
 **Attack detection on prior-published corpora** (Giskard n=35, Compass CTF Chinese n=11): detection rate has not moved from v4.13.5 → v4.19.0 on the Sanitizer+Encoder pipeline — 80.00% and 9.09% respectively, identical to the v4.13.5 numbers. Six releases of pattern additions (v4.14–v4.19) targeted different attack classes (indirect injection, tool-result validation, memory persistence, multi-agent trust) that these direct-text jailbreak corpora do not exercise. Small sample sizes mean "no evidence of improvement," not "proof of no improvement."
 
package/package.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "llm-trust-guard",
-  "version": "4.20.0",
+  "version": "4.20.1",
   "description": "Comprehensive security guards for LLM-powered and agentic AI applications - 34 guards covering OWASP Top 10 for LLMs 2025, Agentic Applications 2026, and MCP Security. All guards accessible via unified TrustGuard facade. Features prompt injection (PAP/persuasion), multi-modal attacks, RAG poisoning with embedding attack detection, memory persistence attacks, code execution sandboxing, multi-agent security (spawn policy, delegation scope, trust transitivity), MCP tool shadowing prevention, system prompt leakage protection, human-agent trust exploitation (ASI09), autonomy escalation (ASI10), state persistence (ASI08), tool chain validation v2 (ASI07/ASI04), circuit breaker, drift detection, and more",
   "main": "dist/index.js",
   "module": "dist/index.mjs",