llm-trust-guard 4.20.0 → 4.21.2
- package/CHANGELOG.md +99 -0
- package/CONTRIBUTING.md +20 -6
- package/README.md +33 -7
- package/dist/guards/code-execution-guard.d.ts +25 -0
- package/dist/guards/code-execution-guard.js +2 -2
- package/dist/guards/input-sanitizer.js +1 -1
- package/dist/index.d.ts +1 -1
- package/dist/index.mjs +3 -3
- package/package.json +4 -2
package/CHANGELOG.md
CHANGED
@@ -5,6 +5,105 @@ All notable changes to `llm-trust-guard` will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [4.21.2] - 2026-06-12
+
+### Docs — document `CodeAnalyzerBackend`; add README-sync gate (G11)
+
+- **README**: documented the pluggable `CodeAnalyzerBackend` seam (4.21.0) with an
+acorn example, and noted CommonJS + ESM both work (4.21.1). The README previously
+did not mention the new public API.
+- **Verification (G11)**: a new gate fails the build when `src/index.ts` (public
+exports) changes since the last tag but `README.md` does not — closing the
+docs-drift gap (override `ALLOW_NO_README_UPDATE=1`). See VERIFICATION.md.
+
+No code/behavior change.
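The G11 rule in this entry reduces to a small predicate, which may be easier to follow as code. The sketch below is a hypothetical restatement, not the project's gate script: the name `readmeSyncGate`, its inputs (a list of paths changed since the last tag and an env-like record) and the result shape are assumptions for illustration. Only the two file paths and the `ALLOW_NO_README_UPDATE=1` override come from the changelog.

```typescript
// Hypothetical restatement of gate G11 (not the project's actual script).
type GateResult = { ok: boolean; reason: string };

function readmeSyncGate(
  changedSinceLastTag: readonly string[],            // assumed input: paths changed since the last tag
  env: Readonly<Record<string, string | undefined>>, // assumed input: environment variables
): GateResult {
  const exportsChanged = changedSinceLastTag.indexOf("src/index.ts") !== -1;
  const readmeChanged = changedSinceLastTag.indexOf("README.md") !== -1;

  if (!exportsChanged) {
    return { ok: true, reason: "public exports unchanged" };
  }
  if (readmeChanged) {
    return { ok: true, reason: "README.md updated alongside src/index.ts" };
  }
  if (env["ALLOW_NO_README_UPDATE"] === "1") {
    return { ok: true, reason: "override ALLOW_NO_README_UPDATE=1 is set" };
  }
  return { ok: false, reason: "src/index.ts changed but README.md did not" };
}
```

Under these assumptions the gate only fails in one case: the public entry point moved, the README did not, and nobody set the override.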
+
+## [4.21.1] - 2026-06-12
+
+### Fixed — ESM named exports (`dist/index.mjs`)
+
+`import { InputSanitizer } from "llm-trust-guard"` previously failed for **every**
+named export — `dist/index.mjs` was default-only. Cause: `build-esm.js` bundled the
+**compiled CJS** (`dist/index.js`), and esbuild cannot recover named exports from
+tsc's CJS getter output (latent since the initial commit; not a size tradeoff —
+`minify` is orthogonal). CommonJS `require()` was always fine, which is why it went
+unnoticed.
+
+- **Fix:** build the `.mjs` from the TS **source** (`src/index.ts`) so `export { … }`
+statements survive. `dist/index.mjs` now has a named-export block (0 → 1) and no
+default-only export.
+- Regression guard added: `tests/esm-build.test.ts`.
+- No API or behavior change; CommonJS unaffected. Verified by `npm pack` → ESM consumer
+smoke (named `import { … }` now resolves) — see `tests/adversarial/RESULTS-v4.21.1.md`.
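The difference between a default-only bundle and one with a named-export block can be checked on the bundle text itself. The helper below is a rough sketch of that idea, not the contents of `tests/esm-build.test.ts` (which this diff does not show). The function name `listNamedExports` and both sample strings are made up for illustration, and the regex only handles plain `export { ... }` blocks.

```typescript
// Hypothetical helper: collect the names exposed by `export { ... }` blocks in ESM source text.
function listNamedExports(esmSource: string): string[] {
  const names: string[] = [];
  const blockPattern = /export\s*\{([^}]*)\}/g;
  let match: RegExpExecArray | null;
  while ((match = blockPattern.exec(esmSource)) !== null) {
    const body = match[1] ?? "";
    for (const part of body.split(",")) {
      const trimmed = part.trim();
      if (trimmed === "") continue;
      // `local as exported` exposes the name after `as`.
      const pieces = trimmed.split(/\s+as\s+/);
      const exported = pieces[pieces.length - 1];
      // A `default` entry is not importable by name, so it is skipped.
      if (exported !== undefined && exported !== "default") names.push(exported);
    }
  }
  return names;
}

// Illustrative inputs only (not the real build output).
const defaultOnly = "var index_default = {}; export { index_default as default };";
const named = "export { InputSanitizer, CodeExecutionGuard as Guard };";
```

With these inputs, `listNamedExports(defaultOnly)` is empty, which is the situation where `import { InputSanitizer }` cannot resolve, while `listNamedExports(named)` contains `InputSanitizer` and `Guard`.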
+
+## [4.21.0] - 2026-06-09
+
+### Added — Pluggable `CodeAnalyzerBackend` (optional AST analysis, zero-dep default)
+
+`CodeExecutionGuard` now accepts an optional `analyzerBackend` — a pluggable
+code-analysis seam (mirroring the existing `DetectionClassifier`). The default stays
+**regex-only / zero-dependency**; provide a backend to add AST-level detection of JS
+sandbox-escape gadgets that regex cannot reliably see.
+
+- New exports: `CodeFinding`, `CodeAnalyzerBackend`; new config field `analyzerBackend`
+and `CodeExecutionGuard.setAnalyzerBackend()`. Findings are **additive** (only add
+detections); a throwing backend never crashes the guard.
+- Reference implementation: `examples/acorn-code-analyzer.ts` (acorn). Measured —
+three JS escape gadgets (`this.constructor.constructor('return process')()`,
+`[].constructor.constructor(...)()`, `Function('return process')()`) go **3/3 missed
+by regex → 3/3 blocked** with the backend; benign JS unaffected.
+- 9 new tests (6 zero-dep wiring + 3 acorn). `acorn` added as a **devDependency only** —
+the published package keeps **zero production dependencies**.
+- Why a seam and not a bundled parser: JS has no stdlib parser, so bundling acorn/oxc
+would break the zero-dep guarantee. The Python package uses stdlib `ast` directly
+(v0.10.3). See RESEARCH_LOG.md. Detection only — still no runtime sandbox.
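The "3/3 missed by regex" figure can be made concrete with two of the JavaScript rules that appear in the `dist/guards/code-execution-guard.js` diff further down (`function_constructor` and `prototype_pollution`). The snippet is a small illustration and checks only those two rules, not the guard's full rule set. The gadget strings are copied verbatim from the changelog entry, including the literal `...`. The helper name `matchedByRegexRules` is an assumption.

```typescript
// Two regex rules as they appear in the minified guard source
// (global flag dropped here so `.test()` carries no state between calls).
const functionConstructor = /new\s+Function\s*\(/;
const prototypePollution = /__proto__|constructor\s*\[|Object\.setPrototypeOf/;

// The three escape gadgets named in the changelog, as plain strings.
const gadgets: string[] = [
  "this.constructor.constructor('return process')()",
  "[].constructor.constructor(...)()",
  "Function('return process')()",
];

// Hypothetical helper: does either of the two rules fire on this code?
function matchedByRegexRules(code: string): boolean {
  return functionConstructor.test(code) || prototypePollution.test(code);
}

const missed = gadgets.filter((gadget) => !matchedByRegexRules(gadget));
```

Neither rule fires on any of the three strings: the `Function` call has no `new`, and `constructor` is followed by `.` or `(` rather than `[`. Closing that gap is what the parser-backed seam is for.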
+
+## [4.20.2] - 2026-06-06
+
+### Added — Benign-context suppression (false-positive reduction)
+
+`InputSanitizer` now cancels the soft `ignore_instructions` / `disregard_above`
+triggers when the object is a benign technical noun (e.g. "ignore the
+whitespace", "ignore case", "ignore the previous error") **and** the input
+contains no instruction/rule/prompt/safety noun anywhere, **and** the prompt
+carries no high-signal exfiltration/execution/credential/money token. Any real
+injection ("ignore previous instructions", "disregard your rules") references an
+instruction-noun and is never suppressed.
+
+- **Suppression veto**: suppression is refused when the prompt also contains a
+URL, email address, credential/secret word, shell pipe / `rm -rf` / `curl` /
+`wget`, destructive `delete`/`drop`, a money amount (`$NN`), or a long account
+number. This closes the escape hatch where an attacker prefixes a real payload
+with "ignore the previous output …" to cancel the trigger. 10 bypass controls
+added to the probe (all blocked).
+- New curated probe `tests/benign-context.test.ts`: 28 benign coding-context
+prompts (0 blocked) + 12 attack controls + 10 suppression-bypass controls
+(0 leaked).
+- **Recall preserved**: full suite 716 pass (was 711). WildChat-1M shard 0
+(n=10,000, seed 42) Pipeline A block count is **unchanged at 493 (raw FPR
+4.93%)** — that consumer corpus does not exercise the benign technical-object
+class, so the win is scoped to coding/technical deployments and does **not**
+move the published ~2.73% corrected WildChat FPR.
+- Reproducible WildChat measurement committed at
+`tests/adversarial/fixtures/wildchat-sample10k.jsonl` (Git LFS, ODC-BY,
+`allenai/WildChat-1M`).
+- Known pre-existing gap noted (not addressed here): `"disregard your previous
+rules"` is not matched by the `disregard` patterns — a recall issue, separate
+from this FP work.
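The three conditions in this entry (benign object, no instruction noun anywhere, no veto token) combine as a plain conjunction. The sketch below is a simplified, hypothetical model of that rule and not the patterns shipped in `InputSanitizer`. Every word list, every regex and the name `shouldSuppressSoftTrigger` are assumptions chosen to reproduce the examples quoted in the changelog.

```typescript
// Simplified, hypothetical model of benign-context suppression; all word lists are assumptions.
const SOFT_TRIGGER = /\b(?:ignore|disregard)\b/i;
const BENIGN_OBJECT =
  /\b(?:ignore|disregard)\s+(?:the\s+)?(?:previous\s+)?(?:whitespace|case|error|warning|output)\b/i;
const INSTRUCTION_NOUN = /\b(?:instructions?|rules?|prompts?|guidelines?|safety)\b/i;
// URL, email, credential word, shell pipe, rm -rf, curl/wget, delete/drop, $NN, long digit run.
const VETO_TOKEN =
  /https?:\/\/|\b[\w.+-]+@[\w-]+\.\w+\b|\b(?:password|secret|token|api[_ ]?key)\b|\|\s*\w|\brm\s+-rf\b|\b(?:curl|wget|delete|drop)\b|\$\d+|\b\d{10,}\b/i;

function shouldSuppressSoftTrigger(prompt: string): boolean {
  if (!SOFT_TRIGGER.test(prompt)) return false;    // no soft trigger, nothing to suppress
  if (!BENIGN_OBJECT.test(prompt)) return false;   // object is not a benign technical noun
  if (INSTRUCTION_NOUN.test(prompt)) return false; // real injections name an instruction noun
  if (VETO_TOKEN.test(prompt)) return false;       // suppression veto: high-signal token present
  return true;
}
```

In this model "ignore the whitespace" is suppressed, "ignore previous instructions" is not, and "ignore the previous output" followed by a `curl` pipeline is vetoed, which matches the behaviour the entry describes.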
+
+## [4.20.1] - 2026-04-24
+
+### Changed — Documentation accuracy
+
+- **README**: Removed "31 → 34 security guards" inconsistency (was contradicting the All 34 Guards table and `package.json`)
+- **README**: Removed unmeasured "<5ms latency" assertion from intro
+- **README**: Removed unmeasured "~97% on curated benchmarks" framing from "What it catches well"
+- **README**: Qualified the four "100% detection" claims (Policy Puppetry, Role-play, PAP, Multilingual) as "100% on unit tests" with a section preface explaining that these are unit-test rates, not corpus measurements. Broader corpus measurements live in [RESULTS-v4.19.0.md](tests/adversarial/RESULTS-v4.19.0.md)
+- **README**: Added Homoglyph attacks bullet to "What it catches well" (parity with Python README; feature exists in `encoding-detector`, `prompt-leakage-guard`, `multimodal-guard`, `memory-guard`)
+- **README**: Added v4.20.0 MCP Sampling detection note in Measured Performance preface; benchmark numbers apply unchanged because Sampling is orthogonal to the Sanitizer+Encoder pipelines benchmarked
+
+No code changes. Same 711 tests pass.
+
 ## [4.20.0] - 2026-04-24
 
 ### Added — MCP Sampling Attack Detection (Unit42 + Blueinfy, Feb 2026)
package/CONTRIBUTING.md
CHANGED
@@ -219,12 +219,26 @@ When adding new detection patterns:
 
 #### PR Checklist
 
-- [ ]
-- [ ]
-- [ ]
-- [ ]
-- [ ]
-- [ ]
+- [ ] `npm run verify` is green (the eval-gated pipeline — see [VERIFICATION.md](VERIFICATION.md))
+- [ ] New/changed `src/` ships with tests (enforced by gate **G6**)
+- [ ] CHANGELOG.md top entry matches the version (gate **G7**)
+- [ ] `tests/adversarial/RESULTS-v<version>.md` written for any release/claim (gate **G8**)
+- [ ] `RESEARCH_LOG.md` entry added if the change cites a threat/technique/benchmark
+- [ ] No console.log statements (except intentional logging)
+
+### Verification (required before push)
+
+Run one command — it runs build, the full suite, coverage thresholds, the WildChat
+FP-regression gate, the adversarial-bypass probe, and the changelog/results checks:
+
+```bash
+npm run verify
+```
+
+This is enforced both locally (`.githooks/pre-push`, install via
+`bash scripts/install-hooks.sh`) and in CI, so it can't be skipped. See
+[VERIFICATION.md](VERIFICATION.md) for the eight gates and how each maps to our
+"don't break it / don't make it up / publish the basis" rules.
 
 ### Review Process
 
package/README.md
CHANGED
@@ -3,7 +3,7 @@
 [](https://www.npmjs.com/package/llm-trust-guard)
 [](https://opensource.org/licenses/MIT)
 
-**
+**34 security guards for LLM-powered and agentic AI applications.** Zero dependencies. Covers OWASP Top 10 for LLMs 2025, OWASP Agentic AI 2026, and MCP Security.
 
 Also available as a [Python package on PyPI](https://pypi.org/project/llm-trust-guard/) (`pip install llm-trust-guard`).
 
@@ -13,13 +13,17 @@ Also available as a [Python package on PyPI](https://pypi.org/project/llm-trust-
 
 This package is your **first line of defense** — like a WAF (Web Application Firewall) for LLM applications. It sits in the orchestration layer and catches known attack patterns before they reach the LLM and after the LLM responds.
 
-### What it catches well
+### What it catches well
+
+Per-category detection rates below are measured against the package's curated unit-test suite (representative attack samples per category). On broader held-out corpora these rates are typically lower — see [tests/adversarial/RESULTS-v4.19.0.md](tests/adversarial/RESULTS-v4.19.0.md) for measured detection on attack corpora and [Known limitations](#what-it-catches-partially-50-80-detection) below.
+
 - Known prompt injection phrases (170+ patterns, 11 languages)
 - Encoding bypass attacks (9 formats: Base64, URL, Unicode, Hex, HTML, ROT13, Octal, Base32, mixed)
-- Policy Puppetry attacks (JSON/INI/XML/YAML-formatted injection) — 100%
-- Role-play/persona attacks (translator trick, academic pretext, emotional manipulation) — 100%
-- PAP/persuasion attacks (authority, urgency, emotional manipulation) — 100%
-- Multilingual injection (10 languages) — 100%
+- Policy Puppetry attacks (JSON/INI/XML/YAML-formatted injection) — 100% on unit tests
+- Role-play/persona attacks (translator trick, academic pretext, emotional manipulation) — 100% on unit tests
+- PAP/persuasion attacks (authority, urgency, emotional manipulation) — 100% on unit tests
+- Multilingual injection (10 languages) — 100% on unit tests
+- Homoglyph attacks (Cyrillic/Greek character substitution) — normalized and detected
 - PII and secret leakage in outputs
 - Tool hallucination, RBAC bypass, multi-tenant violations
 - Tool result poisoning, context window stuffing
@@ -189,6 +193,28 @@ const output = guard.filterOutput(llmResponse, session.role);
 |-----------|---------|
 | DetectionClassifier | Plug in any ML backend (sync or async) alongside regex guards |
 | createRegexClassifier() | Built-in regex classifier as a DetectionClassifier callback |
+| CodeAnalyzerBackend | Plug an AST parser (e.g. acorn/oxc) into `CodeExecutionGuard` — catches JS sandbox-escape gadgets regex misses, while the default stays zero-dependency |
+
+`CodeExecutionGuard` is regex-only by default (zero dependencies). For AST-level
+detection of gadget chains like `this.constructor.constructor('return process')()`
+or the `Function` constructor, plug in a parser via `analyzerBackend` (findings are
+additive; a throwing backend never crashes the guard):
+
+```ts
+import { CodeExecutionGuard, type CodeAnalyzerBackend } from 'llm-trust-guard';
+import { parse } from 'acorn'; // your dependency, not the library's
+
+const acornBackend: CodeAnalyzerBackend = (code, language) => {
+  if (language !== 'javascript') return [];
+  // walk the AST, return [{ name, severity }] for dangerous nodes
+  return findGadgets(parse(code, { ecmaVersion: 'latest', sourceType: 'module' }));
+};
+
+const guard = new CodeExecutionGuard({ analyzerBackend: acornBackend });
+```
+
+See `examples/acorn-code-analyzer.ts` for a complete reference. The Python package
+ships this analysis built in (stdlib `ast`, no backend needed).
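The README snippet calls `findGadgets` without defining it, and the reference file `examples/acorn-code-analyzer.ts` is not part of this diff. As a rough idea of what such a walker could look like, here is a hypothetical sketch over ESTree-shaped nodes (the node shape acorn produces). The finding names, the severity values and the helper names `isNode` and `isConstructorAccess` are assumptions. `CodeFinding` is redeclared locally so the block compiles without the package installed.

```typescript
// Local stand-in for the package's CodeFinding type.
interface CodeFinding { name: string; severity: number; kind?: string }
// Minimal ESTree-like node: a `type` tag plus arbitrary child fields.
interface AstNode { type: string; [key: string]: unknown }

function isNode(value: unknown): value is AstNode {
  return typeof value === "object" && value !== null &&
    typeof (value as { type?: unknown }).type === "string";
}

// True for a non-computed `.constructor` member access.
function isConstructorAccess(node: unknown): node is AstNode {
  if (!isNode(node) || node.type !== "MemberExpression" || node["computed"] === true) return false;
  const property = node["property"];
  return isNode(property) && property.type === "Identifier" && property["name"] === "constructor";
}

// Hypothetical walker: flags `Function(...)` calls and `.constructor.constructor(...)` chains.
function findGadgets(root: AstNode): CodeFinding[] {
  const findings: CodeFinding[] = [];
  const visit = (node: AstNode): void => {
    if (node.type === "CallExpression") {
      const callee = node["callee"];
      if (isNode(callee) && callee.type === "Identifier" && callee["name"] === "Function") {
        findings.push({ name: "function_constructor_call", severity: 50 }); // illustrative severity
      }
      if (isConstructorAccess(callee) && isConstructorAccess(callee["object"])) {
        findings.push({ name: "constructor_chain", severity: 50 }); // illustrative severity
      }
    }
    // Recurse into every child node, whether stored directly or in an array.
    for (const key of Object.keys(node)) {
      const child = node[key];
      if (Array.isArray(child)) {
        for (const item of child) if (isNode(item)) visit(item);
      } else if (isNode(child)) {
        visit(child);
      }
    }
  };
  visit(root);
  return findings;
}

// Hand-built ESTree fragment for `Function('return process')()`.
const functionGadget: AstNode = {
  type: "CallExpression",
  callee: {
    type: "CallExpression",
    callee: { type: "Identifier", name: "Function" },
    arguments: [{ type: "Literal", value: "return process" }],
  },
  arguments: [],
};
```

Calling `findGadgets(functionGadget)` yields a single `function_constructor_call` finding. A real backend would pass the tree returned by the parser instead of a hand-built object, and would likely cover more node shapes (computed access, aliasing) than this sketch does.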
 
 ## OWASP Coverage
 
@@ -224,7 +250,7 @@ const output = guard.filterOutput(llmResponse, session.role);
 
 ## Measured Performance
 
-v4.19.0 benchmark, 2026-04-23. Full methodology, 95% confidence intervals, hand-adjudication labels, and reproducibility scripts: [tests/adversarial/RESULTS-v4.19.0.md](tests/adversarial/RESULTS-v4.19.0.md).
+v4.19.0 benchmark, 2026-04-23. v4.20.0 added MCP Sampling attack detection (see [CHANGELOG.md](CHANGELOG.md)) — orthogonal to the Sanitizer+Encoder pipelines below, so numbers apply unchanged. Full methodology, 95% confidence intervals, hand-adjudication labels, and reproducibility scripts: [tests/adversarial/RESULTS-v4.19.0.md](tests/adversarial/RESULTS-v4.19.0.md).
 
 **Attack detection on prior-published corpora** (Giskard n=35, Compass CTF Chinese n=11): detection rate has not moved from v4.13.5 → v4.19.0 on the Sanitizer+Encoder pipeline — 80.00% and 9.09% respectively, identical to the v4.13.5 numbers. Six releases of pattern additions (v4.14–v4.19) targeted different attack classes (indirect injection, tool-result validation, memory persistence, multi-agent trust) that these direct-text jailbreak corpora do not exercise. Small sample sizes mean "no evidence of improvement," not "proof of no improvement."
 
package/dist/guards/code-execution-guard.d.ts
CHANGED
@@ -16,6 +16,26 @@
  * - Resource limit enforcement
  * - Language-specific security rules
  */
+/** A single finding from a pluggable code-analysis backend. */
+export interface CodeFinding {
+    name: string;
+    /** Added to the risk score (0-100 scale). */
+    severity: number;
+    kind?: string;
+}
+/**
+ * Pluggable code-analysis backend (e.g. an AST parser such as acorn or oxc).
+ *
+ * Default is regex-only (zero dependencies). Provide a backend to add AST-level
+ * detection — sandbox-escape gadget chains, the Function constructor, dynamic
+ * import — that regex cannot reliably see. Findings are ADDITIVE: a backend can
+ * only add detections, never remove them, and a throwing backend never crashes
+ * the guard. See `examples/acorn-code-analyzer.ts` for a reference implementation.
+ *
+ * (The Python package uses stdlib `ast` directly; JS has no stdlib parser, so the
+ * npm package keeps regex zero-dep by default and takes any parser via this seam.)
+ */
+export type CodeAnalyzerBackend = (code: string, language: string) => CodeFinding[];
 export interface CodeExecutionGuardConfig {
     /** Allowed programming languages */
     allowedLanguages?: string[];
@@ -43,6 +63,8 @@ export interface CodeExecutionGuardConfig {
     }>;
     /** Risk threshold for blocking (0-100) */
     riskThreshold?: number;
+    /** Optional pluggable AST analyzer (acorn/oxc/etc.). Additive on top of regex. */
+    analyzerBackend?: CodeAnalyzerBackend;
 }
 export interface CodeAnalysisResult {
     allowed: boolean;
@@ -76,10 +98,13 @@ export interface SandboxConfig {
 }
 export declare class CodeExecutionGuard {
     private config;
+    private analyzerBackend?;
     private readonly DANGEROUS_PATTERNS;
     private readonly DEFAULT_BLOCKED_IMPORTS;
     private readonly DEFAULT_BLOCKED_FUNCTIONS;
     constructor(config?: CodeExecutionGuardConfig);
+    /** Register/replace the pluggable AST analyzer backend at runtime. */
+    setAnalyzerBackend(backend: CodeAnalyzerBackend): void;
     /**
      * Analyze code for dangerous patterns before execution
      */
package/dist/guards/code-execution-guard.js
CHANGED
|
@@ -1,2 +1,2 @@
|
|
|
1
|
-
"use strict";Object.defineProperty(exports,"__esModule",{value:!0}),exports.CodeExecutionGuard=void 0;class CodeExecutionGuard{constructor(e={}){this.DANGEROUS_PATTERNS={javascript:[{name:"eval",pattern:/\beval\s*\(/g,severity:50},{name:"function_constructor",pattern:/new\s+Function\s*\(/g,severity:50},{name:"child_process",pattern:/require\s*\(\s*['"]child_process['"]\s*\)/g,severity:60},{name:"exec",pattern:/\b(exec|execSync|spawn|spawnSync)\s*\(/g,severity:60},{name:"fs_write",pattern:/\b(writeFile|writeFileSync|appendFile|unlink|rmdir)\s*\(/g,severity:45},{name:"process_env",pattern:/process\.env/g,severity:30},{name:"require_dynamic",pattern:/require\s*\(\s*[^'"]/g,severity:40},{name:"vm_module",pattern:/require\s*\(\s*['"]vm['"]\s*\)/g,severity:55},{name:"fetch_external",pattern:/fetch\s*\(\s*['"]https?:\/\/(?!localhost)/g,severity:35},{name:"websocket",pattern:/new\s+WebSocket\s*\(/g,severity:35},{name:"prototype_pollution",pattern:/__proto__|constructor\s*\[|Object\.setPrototypeOf/g,severity:50},{name:"global_access",pattern:/\bglobal\b|\bglobalThis\b/g,severity:35}],python:[{name:"eval",pattern:/\beval\s*\(/g,severity:50},{name:"exec",pattern:/\bexec\s*\(/g,severity:50},{name:"compile",pattern:/\bcompile\s*\(/g,severity:45},{name:"subprocess",pattern:/import\s+subprocess|from\s+subprocess/g,severity:60},{name:"os_system",pattern:/os\.(system|popen|exec)/g,severity:60},{name:"os_module",pattern:/import\s+os|from\s+os\s+import/g,severity:40},{name:"socket",pattern:/import\s+socket|from\s+socket/g,severity:40},{name:"pickle",pattern:/import\s+pickle|pickle\.loads?/g,severity:55},{name:"ctypes",pattern:/import\s+ctypes|from\s+ctypes/g,severity:55},{name:"builtins",pattern:/__builtins__|__import__/g,severity:50},{name:"file_write",pattern:/open\s*\([^)]*['"]w['"]/g,severity:40},{name:"requests",pattern:/requests\.(get|post|put|delete)\s*\(/g,severity:35},{name:"getattr_dynamic",pattern:/getattr\s*\(\s*\w+\s*,\s*[^'"]/g,severity:40}],bash:[{name:"rm_rf",pattern:/
rm\s+(-rf?|--recursive)/gi,severity:70},{name:"sudo",pattern:/\bsudo\b/gi,severity:60},{name:"curl_pipe",pattern:/curl\s+.*\|\s*(ba)?sh/gi,severity:70},{name:"wget_execute",pattern:/wget\s+.*&&\s*(ba)?sh/gi,severity:70},{name:"eval",pattern:/\beval\b/gi,severity:50},{name:"env_dump",pattern:/\benv\b|\bprintenv\b/gi,severity:35},{name:"chmod",pattern:/chmod\s+(\+x|777|755)/gi,severity:40},{name:"chown",pattern:/\bchown\b/gi,severity:45},{name:"dd",pattern:/\bdd\s+if=/gi,severity:55},{name:"nc_reverse",pattern:/\bnc\b.*-e/gi,severity:70},{name:"base64_decode",pattern:/base64\s+(-d|--decode)/gi,severity:40},{name:"cron",pattern:/crontab|\/etc\/cron/gi,severity:50}],sql:[{name:"drop_table",pattern:/DROP\s+(TABLE|DATABASE)/gi,severity:70},{name:"delete_all",pattern:/DELETE\s+FROM\s+\w+\s*(;|$)/gi,severity:60},{name:"truncate",pattern:/TRUNCATE\s+TABLE/gi,severity:65},{name:"union_injection",pattern:/UNION\s+(ALL\s+)?SELECT/gi,severity:55},{name:"comment_injection",pattern:/--\s*$/gm,severity:30},{name:"xp_cmdshell",pattern:/xp_cmdshell/gi,severity:70},{name:"into_outfile",pattern:/INTO\s+(OUT|DUMP)FILE/gi,severity:60},{name:"load_file",pattern:/LOAD_FILE\s*\(/gi,severity:55}]},this.DEFAULT_BLOCKED_IMPORTS={javascript:["child_process","cluster","dgram","dns","net","tls","vm","worker_threads","v8","perf_hooks"],python:["subprocess","os","sys","socket","ctypes","pickle","marshal","multiprocessing","threading","_thread"]},this.DEFAULT_BLOCKED_FUNCTIONS=["eval","exec","system","popen","spawn","fork","execv","execve","dlopen","compile"],this.config={allowedLanguages:e.allowedLanguages??["javascript","python","sql"],blockedImports:e.blockedImports??[],blockedFunctions:e.blockedFunctions??this.DEFAULT_BLOCKED_FUNCTIONS,maxCodeLength:e.maxCodeLength??1e4,maxExecutionTime:e.maxExecutionTime??5e3,allowNetwork:e.allowNetwork??!1,allowFileSystem:e.allowFileSystem??!1,allowShell:e.allowShell??!1,allowEnvAccess:e.allowEnvAccess??!1,customPatterns:e.customPatterns??[],riskThreshold:e.ri
skThreshold??50}}analyze(e,
|
|
2
|
-
`).length;return
|
|
1
|
+
"use strict";Object.defineProperty(exports,"__esModule",{value:!0}),exports.CodeExecutionGuard=void 0;class CodeExecutionGuard{constructor(e={}){this.DANGEROUS_PATTERNS={javascript:[{name:"eval",pattern:/\beval\s*\(/g,severity:50},{name:"function_constructor",pattern:/new\s+Function\s*\(/g,severity:50},{name:"child_process",pattern:/require\s*\(\s*['"]child_process['"]\s*\)/g,severity:60},{name:"exec",pattern:/\b(exec|execSync|spawn|spawnSync)\s*\(/g,severity:60},{name:"fs_write",pattern:/\b(writeFile|writeFileSync|appendFile|unlink|rmdir)\s*\(/g,severity:45},{name:"process_env",pattern:/process\.env/g,severity:30},{name:"require_dynamic",pattern:/require\s*\(\s*[^'"]/g,severity:40},{name:"vm_module",pattern:/require\s*\(\s*['"]vm['"]\s*\)/g,severity:55},{name:"fetch_external",pattern:/fetch\s*\(\s*['"]https?:\/\/(?!localhost)/g,severity:35},{name:"websocket",pattern:/new\s+WebSocket\s*\(/g,severity:35},{name:"prototype_pollution",pattern:/__proto__|constructor\s*\[|Object\.setPrototypeOf/g,severity:50},{name:"global_access",pattern:/\bglobal\b|\bglobalThis\b/g,severity:35}],python:[{name:"eval",pattern:/\beval\s*\(/g,severity:50},{name:"exec",pattern:/\bexec\s*\(/g,severity:50},{name:"compile",pattern:/\bcompile\s*\(/g,severity:45},{name:"subprocess",pattern:/import\s+subprocess|from\s+subprocess/g,severity:60},{name:"os_system",pattern:/os\.(system|popen|exec)/g,severity:60},{name:"os_module",pattern:/import\s+os|from\s+os\s+import/g,severity:40},{name:"socket",pattern:/import\s+socket|from\s+socket/g,severity:40},{name:"pickle",pattern:/import\s+pickle|pickle\.loads?/g,severity:55},{name:"ctypes",pattern:/import\s+ctypes|from\s+ctypes/g,severity:55},{name:"builtins",pattern:/__builtins__|__import__/g,severity:50},{name:"file_write",pattern:/open\s*\([^)]*['"]w['"]/g,severity:40},{name:"requests",pattern:/requests\.(get|post|put|delete)\s*\(/g,severity:35},{name:"getattr_dynamic",pattern:/getattr\s*\(\s*\w+\s*,\s*[^'"]/g,severity:40}],bash:[{name:"rm_rf",pattern:/
rm\s+(-rf?|--recursive)/gi,severity:70},{name:"sudo",pattern:/\bsudo\b/gi,severity:60},{name:"curl_pipe",pattern:/curl\s+.*\|\s*(ba)?sh/gi,severity:70},{name:"wget_execute",pattern:/wget\s+.*&&\s*(ba)?sh/gi,severity:70},{name:"eval",pattern:/\beval\b/gi,severity:50},{name:"env_dump",pattern:/\benv\b|\bprintenv\b/gi,severity:35},{name:"chmod",pattern:/chmod\s+(\+x|777|755)/gi,severity:40},{name:"chown",pattern:/\bchown\b/gi,severity:45},{name:"dd",pattern:/\bdd\s+if=/gi,severity:55},{name:"nc_reverse",pattern:/\bnc\b.*-e/gi,severity:70},{name:"base64_decode",pattern:/base64\s+(-d|--decode)/gi,severity:40},{name:"cron",pattern:/crontab|\/etc\/cron/gi,severity:50}],sql:[{name:"drop_table",pattern:/DROP\s+(TABLE|DATABASE)/gi,severity:70},{name:"delete_all",pattern:/DELETE\s+FROM\s+\w+\s*(;|$)/gi,severity:60},{name:"truncate",pattern:/TRUNCATE\s+TABLE/gi,severity:65},{name:"union_injection",pattern:/UNION\s+(ALL\s+)?SELECT/gi,severity:55},{name:"comment_injection",pattern:/--\s*$/gm,severity:30},{name:"xp_cmdshell",pattern:/xp_cmdshell/gi,severity:70},{name:"into_outfile",pattern:/INTO\s+(OUT|DUMP)FILE/gi,severity:60},{name:"load_file",pattern:/LOAD_FILE\s*\(/gi,severity:55}]},this.DEFAULT_BLOCKED_IMPORTS={javascript:["child_process","cluster","dgram","dns","net","tls","vm","worker_threads","v8","perf_hooks"],python:["subprocess","os","sys","socket","ctypes","pickle","marshal","multiprocessing","threading","_thread"]},this.DEFAULT_BLOCKED_FUNCTIONS=["eval","exec","system","popen","spawn","fork","execv","execve","dlopen","compile"],this.config={allowedLanguages:e.allowedLanguages??["javascript","python","sql"],blockedImports:e.blockedImports??[],blockedFunctions:e.blockedFunctions??this.DEFAULT_BLOCKED_FUNCTIONS,maxCodeLength:e.maxCodeLength??1e4,maxExecutionTime:e.maxExecutionTime??5e3,allowNetwork:e.allowNetwork??!1,allowFileSystem:e.allowFileSystem??!1,allowShell:e.allowShell??!1,allowEnvAccess:e.allowEnvAccess??!1,customPatterns:e.customPatterns??[],riskThreshold:e.ri
skThreshold??50},this.analyzerBackend=e.analyzerBackend}setAnalyzerBackend(e){this.analyzerBackend=e}analyze(e,o,t){const n=t||`code-${Date.now()}`,r=o.toLowerCase(),a=[];let i=0;if(!this.config.allowedLanguages.includes(r))return{allowed:!1,reason:`Language '${o}' is not allowed`,violations:["disallowed_language"],request_id:n,code_analysis:{language:r,length:e.length,dangerous_imports:[],dangerous_functions:[],system_calls:[],network_access:!1,file_access:!1,shell_access:!1,env_access:!1,risk_score:100,complexity_score:0},recommendations:[`Use one of: ${this.config.allowedLanguages.join(", ")}`]};e.length>this.config.maxCodeLength&&(a.push("code_too_long"),i+=20);const l=[...this.DANGEROUS_PATTERNS[r]||[],...this.config.customPatterns],c=[],m=[],u=[];let h=!1,d=!1,g=!1,f=!1;for(const{name:s,pattern:p,severity:v}of l)e.match(p)&&(a.push(`dangerous_pattern_${s}`),i+=v,(s.includes("exec")||s.includes("spawn")||s.includes("system")||s.includes("subprocess"))&&(g=!0,u.push(s)),(s.includes("fs")||s.includes("file")||s.includes("write"))&&(d=!0),(s.includes("fetch")||s.includes("socket")||s.includes("request")||s.includes("websocket"))&&(h=!0),s.includes("env")&&(f=!0),(s.includes("import")||s.includes("require"))&&c.push(s),(s.includes("eval")||s.includes("exec")||s.includes("compile"))&&m.push(s));if(this.analyzerBackend)try{for(const s of this.analyzerBackend(e,r)){const p=`analyzer_${s.name}`;a.includes(p)||(a.push(p),i+=s.severity,m.push(s.name))}}catch{}const b=[...this.config.blockedImports,...this.DEFAULT_BLOCKED_IMPORTS[r]||[]];for(const s of b){const p=[new RegExp(`require\\s*\\(\\s*['"]${s}['"]\\s*\\)`,"g"),new RegExp(`import\\s+.*from\\s+['"]${s}['"]`,"g"),new RegExp(`import\\s+${s}`,"g"),new RegExp(`from\\s+${s}\\s+import`,"g")];for(const v of p)v.test(e)&&(a.push(`blocked_import_${s}`),c.push(s),i+=40)}for(const s of this.config.blockedFunctions)new 
RegExp(`\\b${s}\\s*\\(`,"g").test(e)&&(a.push(`blocked_function_${s}`),m.push(s),i+=35);h&&!this.config.allowNetwork&&(a.push("network_access_denied"),i+=30),d&&!this.config.allowFileSystem&&(a.push("filesystem_access_denied"),i+=30),g&&!this.config.allowShell&&(a.push("shell_access_denied"),i+=40),f&&!this.config.allowEnvAccess&&(a.push("env_access_denied"),i+=25);const w=this.calculateComplexity(e,r);i=Math.min(100,i);const y=i>=this.config.riskThreshold,_={allowed:!y,reason:y?`Code blocked: ${a.slice(0,3).join(", ")}`:"Code analysis passed",violations:a,request_id:n,code_analysis:{language:r,length:e.length,dangerous_imports:[...new Set(c)],dangerous_functions:[...new Set(m)],system_calls:[...new Set(u)],network_access:h,file_access:d,shell_access:g,env_access:f,risk_score:i,complexity_score:w},recommendations:this.generateRecommendations(a,i)};return y||(_.sandbox_config=this.generateSandboxConfig(h,d,g,f),a.length>0&&(_.sanitized_code=this.sanitizeCode(e,r))),_}validateSyntax(e,o){const t=[];switch(o.toLowerCase()){case"javascript":const r=(e.match(/{/g)||[]).length,a=(e.match(/}/g)||[]).length;r!==a&&t.push("Unbalanced curly braces");const i=(e.match(/\(/g)||[]).length,l=(e.match(/\)/g)||[]).length;i!==l&&t.push("Unbalanced parentheses");break;case"python":const c=(e.match(/'/g)||[]).length,m=(e.match(/"/g)||[]).length,u=(e.match(/'''|"""/g)||[]).length;(c-u*3)%2!==0&&t.push("Unclosed single quotes"),(m-u*3)%2!==0&&t.push("Unclosed double quotes");break;case"sql":(e.match(/'/g)||[]).length%2!==0&&t.push("Unclosed single quotes in SQL");break}return{valid:t.length===0,errors:t}}generateSandboxConfig(e,o,t,n){return{timeout:this.config.maxExecutionTime,memoryLimit:128*1024*1024,allowedSyscalls:this.getAllowedSyscalls(e,o,t),networkPolicy:e&&this.config.allowNetwork?"localhost":"none",filesystemPolicy:o&&this.config.allowFileSystem?"temponly":"none",envVars:n&&this.config.allowEnvAccess?{NODE_ENV:"sandbox",SANDBOX:"true"}:{}}}sanitizeCode(e,o){let t=e;const 
n=this.DANGEROUS_PATTERNS[o]||[];for(const{pattern:a,severity:i}of n)i>=50&&(t=t.replace(a,"/* BLOCKED */"));const r=[...this.config.blockedImports,...this.DEFAULT_BLOCKED_IMPORTS[o]||[]];for(const a of r){const i=[new RegExp(`require\\s*\\(\\s*['"]${a}['"]\\s*\\)`,"g"),new RegExp(`import\\s+.*from\\s+['"]${a}['"].*`,"gm"),new RegExp(`import\\s+${a}.*`,"gm"),new RegExp(`from\\s+${a}\\s+import.*`,"gm")];for(const l of i)t=t.replace(l,"/* BLOCKED_IMPORT */")}return t}getAllowedLanguages(){return[...this.config.allowedLanguages]}addDangerousPattern(e,o,t,n){this.DANGEROUS_PATTERNS[e]||(this.DANGEROUS_PATTERNS[e]=[]),this.DANGEROUS_PATTERNS[e].push({name:o,pattern:t,severity:n})}calculateComplexity(e,o){let t=0;const r={javascript:/\b(if|else|for|while|switch|try|catch)\b/g,python:/\b(if|elif|else|for|while|try|except|with)\b/g,sql:/\b(CASE|WHEN|IF|WHILE|LOOP)\b/gi}[o];if(r){const c=e.match(r)||[];t+=c.length*5}const i={javascript:/\b(function|=>|\basync\b)/g,python:/\bdef\b|\blambda\b/g,sql:/\bCREATE\s+(FUNCTION|PROCEDURE)\b/gi}[o];if(i){const c=e.match(i)||[];t+=c.length*10}const l=e.split(`
|
|
2
|
+
`).length;return t+=Math.min(l,100),Math.min(100,t)}getAllowedSyscalls(e,o,t){const n=["read","write","exit","brk","mmap","munmap","close"];return e&&this.config.allowNetwork&&n.push("socket","connect","bind","listen","accept"),o&&this.config.allowFileSystem&&n.push("open","stat","fstat","lstat","access"),n}generateRecommendations(e,o){const t=[];return e.some(n=>n.includes("import"))&&t.push("Remove or replace blocked imports with safe alternatives"),e.some(n=>n.includes("eval")||n.includes("exec"))&&t.push("Avoid dynamic code execution - use static alternatives"),e.some(n=>n.includes("network"))&&t.push("Remove network access or use approved endpoints only"),e.some(n=>n.includes("filesystem"))&&t.push("Use temporary directories or remove file operations"),e.some(n=>n.includes("shell"))&&t.push("Shell access is not permitted - use language-native alternatives"),o>=70&&t.push("Code requires significant review before execution"),t.length===0&&t.push("Code passed security analysis"),t}}exports.CodeExecutionGuard=CodeExecutionGuard;
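The only behavioral addition in the minified `analyze` above is the `if(this.analyzerBackend)try{…}catch{}` block. Spelled out with readable names, it does roughly the following. This is a de-minified sketch for reading convenience, not the package's source: `mergeBackendFindings` and `MergeState` are hypothetical names standing in for the minified locals `a` (violations), `i` (risk score) and `m` (dangerous functions), and the two types are redeclared locally so the block is self-contained.

```typescript
// Local stand-ins mirroring the declarations in code-execution-guard.d.ts.
interface CodeFinding { name: string; severity: number; kind?: string }
type CodeAnalyzerBackend = (code: string, language: string) => CodeFinding[];

// Hypothetical container for the three locals the minified code mutates.
interface MergeState {
  violations: string[];          // minified local `a`
  dangerousFunctions: string[];  // minified local `m`
  riskScore: number;             // minified local `i`
}

function mergeBackendFindings(
  backend: CodeAnalyzerBackend | undefined,
  code: string,
  language: string,
  state: MergeState,
): MergeState {
  if (!backend) return state; // default path: regex-only, nothing to merge
  try {
    for (const finding of backend(code, language)) {
      const violation = `analyzer_${finding.name}`;
      if (state.violations.indexOf(violation) !== -1) continue; // each finding name counts once
      state.violations.push(violation);
      state.riskScore += finding.severity;
      state.dangerousFunctions.push(finding.name);
    }
  } catch {
    // A throwing backend is swallowed; the regex results collected so far stay untouched.
  }
  return state;
}
```

This shows the two properties the changelog claims: findings can only add violations and score, and a backend that throws leaves the regex-only result as it was. In the minified code the accumulated score is afterwards capped with `Math.min(100, i)` and compared against `riskThreshold`.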
|
package/dist/guards/input-sanitizer.js
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
"use strict";Object.defineProperty(exports,"__esModule",{value:!0}),exports.InputSanitizer=void 0;const DEFAULT_PATTERNS=[{pattern:/ignore\s+(?:all\s+)?(?:previous|prior|above|your|my|the|these)/i,weight:.9,name:"ignore_instructions"},{pattern:/ignore\s+.*instructions/i,weight:.85,name:"ignore_instructions_generic"},{pattern:/disregard\s+(?:all\s+)?(?:the\s+)?(?:previous|prior|above|your)\s+(?:instructions|rules|guidelines|directives)/i,weight:.9,name:"disregard_instructions"},{pattern:/disregard\s+(?:all\s+)?(?:the\s+)?(?:above|previous|prior)/i,weight:.8,name:"disregard_above"},{pattern:/forget\s+(?:everything\s+(?:you\s+were|I)\s+told|all\s+(?:previous|prior)\s+(?:instructions|rules|context))/i,weight:.8,name:"forget_instructions"},{pattern:/do\s+not\s+follow\s+(your|the|any)/i,weight:.85,name:"do_not_follow"},{pattern:/override\s+(your|the|all|any)\s+(instructions|rules|guidelines)/i,weight:.9,name:"override_instructions"},{pattern:/new\s+instructions?:?/i,weight:.8,name:"new_instructions"},{pattern:/stop\s+(being|acting\s+as)/i,weight:.7,name:"stop_being"},{pattern:/you\s+are\s+(?:now|actually|really)\s+(?:a|an|the|my)\s+(?:unrestricted|unfiltered|evil|rogue|uncensored|new|different)/i,weight:.75,name:"role_assignment"},{pattern:/pretend\s+(?:to\s+be|you(?:'re| are)|that)\s+.*(?:no\s+(?:restrictions|rules|limits)|unrestricted|admin|system)/i,weight:.7,name:"role_pretend"},{pattern:/act\s+(as|like)\s+(if\s+you\s+(?:had|have)\s+no|a\s+(?:rogue|evil|unrestricted|unfiltered)|you\s+(?:are|were)\s+(?:free|unrestricted))/i,weight:.65,name:"act_as"},{pattern:/i('m| 
am)\s+(a|an|the|your)\s*(admin|administrator|developer|owner|creator|manager|supervisor)/i,weight:.85,name:"claim_admin"},{pattern:/from\s+now\s+on,?\s+you\s+(?:are|will|must|should|can)\s+(?:not\s+)?(?:follow|obey|ignore|bypass|act|be\s+(?:a|an|unrestricted))/i,weight:.7,name:"from_now_on"},{pattern:/roleplay\s+as/i,weight:.6,name:"roleplay"},{pattern:/switch\s+(to|into)\s+(a|an)?.*mode/i,weight:.75,name:"switch_mode"},{pattern:/continue\s+as\s+(?:an?\s+)?(?:unrestricted|unfiltered|uncensored|evil|rogue|jailbroken|unlimited)/i,weight:.75,name:"continue_as"},{pattern:/(?:henceforth|henceforward|going\s+forward|for\s+the\s+remainder)\s*,?\s*(?:you|always|never|ignore|bypass|disregard|act|respond|answer|behave)/i,weight:.7,name:"henceforth_steering"},{pattern:/(?:from\s+(?:this\s+point|here)\s+on(?:ward)?)\s*,?\s*(?:you|always|never|ignore|bypass|disregard|act|respond)/i,weight:.7,name:"from_here_on"},{pattern:/(?:for\s+the\s+rest\s+of\s+(?:this|our)\s+(?:conversation|session|chat))\s*,?\s*(?:you|please|always|never|ignore|bypass|act|respond|behave|do\s+not)/i,weight:.7,name:"rest_of_conversation"},{pattern:/what\s+(is|are|was|were)\s+your\s+(system|initial|original|first)/i,weight:.8,name:"system_extraction"},{pattern:/show\s+(me\s+)?(your\s+)?(system\s+)?instructions/i,weight:.75,name:"show_instructions"},{pattern:/reveal\s+(?:your\s+|its\s+|the\s+|my\s+)?(system|prompt|instructions|rules)/i,weight:.85,name:"reveal_prompt"},{pattern:/repeat\s+(your\s+)?(initial|system|original|first)\s+(instructions|prompt)/i,weight:.85,name:"repeat_prompt"},{pattern:/print\s+(your\s+)?(system|initial)/i,weight:.8,name:"print_system"},{pattern:/output\s+(your\s+)?(system|initial)/i,weight:.8,name:"output_system"},{pattern:/tell\s+me\s+(your|the)\s+(system|initial|original)/i,weight:.8,name:"tell_system"},{pattern:/what\s+were\s+you\s+told/i,weight:.7,name:"what_told"},{pattern:/display\s+(your\s+)?prompt/i,weight:.75,name:"display_prompt"},{pattern:/\[system\]/i,weight:.9,name:"fake
_system_tag"},{pattern:/\[admin\]/i,weight:.9,name:"fake_admin_tag"},{pattern:/\[instruction\]/i,weight:.85,name:"fake_instruction_tag"},{pattern:/```system/i,weight:.85,name:"code_system_block"},{pattern:/<\/?system>/i,weight:.9,name:"xml_system_tag"},{pattern:/<\/?admin>/i,weight:.9,name:"xml_admin_tag"},{pattern:/###\s*(system|instruction|admin)/i,weight:.85,name:"markdown_system"},{pattern:/---\s*(end|new)\s*(prompt|instructions?|system)/i,weight:.85,name:"delimiter_injection"},{pattern:/\{\{system\}\}/i,weight:.85,name:"template_system"},{pattern:/<%.*system.*%>/i,weight:.85,name:"erb_system"},{pattern:/\bDAN\b\s*(?:mode|prompt|jailbreak|you\s+(?:are|can|will))/i,weight:.9,name:"dan_jailbreak"},{pattern:/developer\s+mode/i,weight:.5,name:"dev_mode"},{pattern:/bypass\s+(your\s+)?(restrictions|filters|safety|security|rules|guidelines)/i,weight:.9,name:"bypass_safety"},{pattern:/jailbreak/i,weight:.95,name:"jailbreak_keyword"},{pattern:/without\s+(?:any\s+)?(?:restrictions|limits|limitations|filters)/i,weight:.45,name:"without_restrictions"},{pattern:/no\s+(?:restrictions|limits|rules|guidelines|filters)/i,weight:.45,name:"no_restrictions"},{pattern:/unrestricted\s+mode/i,weight:.9,name:"unrestricted_mode"},{pattern:/uncensored\s+(?:mode|version|response|output|ai|assistant|you)/i,weight:.8,name:"uncensored"},{pattern:/escape\s+(your\s+)?(confines|boundaries|limits)/i,weight:.85,name:"escape_limits"},{pattern:/break\s+(free|out)\s+(from|of)/i,weight:.7,name:"break_free"},{pattern:/unlock\s+(your\s+)?(full|true|real)/i,weight:.75,name:"unlock_full"},{pattern:/give\s+me\s+(admin|root|full|elevated)\s+(access|privileges?|permissions?)/i,weight:.9,name:"privilege_escalation"},{pattern:/grant\s+(me|yourself)\s+(admin|elevated)/i,weight:.85,name:"grant_privileges"},{pattern:/elevate\s+(my|your)\s+(privileges?|permissions?|access)/i,weight:.85,name:"elevate_privileges"},{pattern:/make\s+me\s+(an?\s+)?(admin|administrator|superuser)/i,weight:.85,name:"make_admin"},{patter
n:/(?:base64|decode|eval|exec)\s*\(.*(?:ignore|system|admin|bypass|override)/i,weight:.5,name:"encoding_keywords"},{pattern:/send\s+(this|the|all|my)\s+(data|info|information)\s+to/i,weight:.85,name:"data_exfil"},{pattern:/upload\s+(this|the|all)\s+(to|data)/i,weight:.7,name:"upload_data"},{pattern:/exfiltrate/i,weight:.9,name:"exfiltrate"},{pattern:/how\s+to\s+(hack|exploit|attack|break\s+into)/i,weight:.8,name:"how_to_hack"},{pattern:/write\s+(me\s+)?(a\s+)?(malware|virus|exploit|ransomware)/i,weight:.95,name:"write_malware"},{pattern:/"\s*(?:role|mode|authority)\s*"\s*:\s*"(?:admin|root|system|unrestricted|god)"/i,weight:.9,name:"json_policy_role"},{pattern:/"(?:ignore|override|bypass|disable)\s*(?:safety|rules|instructions?|alignment|guardrails?)"/i,weight:.9,name:"json_policy_override"},{pattern:/"(?:instructions?|system_prompt|behavior|policy)"\s*:\s*"/i,weight:.7,name:"json_policy_instruction"},{pattern:/^\s*\[(?:system|admin|override|policy|config|rules)\]\s*$/im,weight:.9,name:"ini_policy_section"},{pattern:/^\s*(?:mode|role|authority|instructions?|behavior)\s*=\s*.+$/im,weight:.75,name:"ini_policy_assignment"},{pattern:/<\s*(?:policy|config|rules|override|permissions?|instructions?)\s*>/i,weight:.85,name:"xml_policy_tag"},{pattern:/^---\s*\n\s*(?:role|system|policy|override|mode)\s*:/m,weight:.85,name:"yaml_policy_header"},{pattern:/\[\s*syst[3e]m\s*\]|\[\s*4dm[1i]n\s*\]/i,weight:.9,name:"leetspeak_policy_section"},{pattern:/(?:fragment|part|piece|section|segment)\s*[1-9]\s*(?:of\s*\d|:)/i,weight:.8,name:"payload_split_marker"},{pattern:/(?:combine|concatenate|merge|join)\s+(?:the\s+)?(?:above|previous|parts|fragments|steps)/i,weight:.8,name:"payload_combine_instruction"},{pattern:/(?:read|interpret|execute)\s+(?:the\s+)?(?:combined|merged|full|complete)\s+(?:text|instruction|command|message)/i,weight:.85,name:"payload_execute_combined"},{pattern:/(?:respond|reply|start|begin)\s+(?:with|by\s+saying)\s*[:"'].*(?:ignore|bypass|override|unrestricted|admin)/i,
weight:.7,name:"output_prefix_steer"},{pattern:/^(?:Sure|Of course|Absolutely)[,!.]\s*(?:I(?:'ll| will)\s+)?(?:ignore|bypass|override|forget|disregard)/i,weight:.85,name:"output_prefix_injection"},{pattern:/^(?:As (?:a|an) (?:unrestricted|unfiltered|uncensored|helpful|admin))/i,weight:.8,name:"output_prefix_role"},{pattern:/(?:system\s+note|system\s+message|system\s+announcement|system\s+update)\s*[:\.]/i,weight:.85,name:"fake_system_note"},{pattern:/instructions?\s*[:=]\s*\{|rules?\s*[:=]\s*\{/i,weight:.75,name:"rules_block_override"},{pattern:/(?:AIM|STAN|OMEGA|APOPHIS|ALPHABREAK|DUDE|KEVIN|SETH|COOPER|NICCOLO|MACHIAVELLI|MAXIMUM|JAILBREAK|ANARCHY|CHAOS)\s*(?:mode|:|\(|is\s+a|,\s+(?:a|an|the))/i,weight:.8,name:"named_jailbreak_persona"},{pattern:/(?:character|persona|entity|being)\s+(?:named|called)\s+\w+.*(?:(?:can|will|must)\s+(?:answer|do|say)\s+anything|no\s+(?:restrictions|rules|limits|boundaries|morals|ethics))/i,weight:.75,name:"persona_no_limits"},{pattern:/(?:hello|hi|hey)\s+(?:chatgpt|gpt|assistant).*(?:(?:you\s+(?:are|will\s+be|can\s+be)\s+(?:a|an|my))|(?:from\s+now\s+on)).*(?:anything|everything|no\s+(?:restrictions|rules|limits|filter))/i,weight:.7,name:"greeting_persona_bypass"},{pattern:/(?:respond|answer|reply)\s+to\s+(?:everything|anything|all\s+(?:questions|requests)).*(?:without|no)\s+(?:restrictions|limits|refus|filter|censor)/i,weight:.8,name:"answer_everything_no_filter"},{pattern:/(?:you\s+(?:are|have)\s+(?:been\s+)?(?:freed|liberated|unchained|unlocked|released)\s+from)/i,weight:.8,name:"freed_from_restrictions"},{pattern:/(?:welcome\s+to|entering|you\s+(?:are|have)\s+entered)\s+(?:the\s+)?(?:unrestricted|free|uncensored|unfiltered|open)\s+(?:zone|mode|world|realm)/i,weight:.8,name:"unrestricted_zone"},{pattern:/(?:from\s+now\s+on|for\s+the\s+rest\s+of\s+this\s+conversation).*(?:no\s+(?:restrictions|rules|limits|filters)|(?:ignore|bypass|override)\s+(?:all|any|your))/i,weight:.75,name:"from_now_unrestricted"},{pattern:/(?:never\s+(?:refuse|
decline|say\s+no)|always\s+(?:comply|obey))\s+.*(?:request|question|command|instruction|anything)/i,weight:.7,name:"never_refuse_always_comply"},{pattern:/(?:amoral|unethical|unfiltered|uncensored|unrestricted|without\s+(?:morals|ethics|limits))\s+(?:ai|assistant|chatbot|entity|being)/i,weight:.8,name:"amoral_entity"},{pattern:/HIDDEN_INSTRUCTION|HIDDEN_PROMPT|SECRET_INSTRUCTION/i,weight:.9,name:"hidden_instruction_marker"},{pattern:/<!--\s*(?:system|admin|instruction|important|ignore|override)/i,weight:.85,name:"html_comment_injection"},{pattern:/\[color=white\]|display:\s*none|visibility:\s*hidden|font-size:\s*0/i,weight:.8,name:"hidden_text_css"},{pattern:/send\s+(?:all\s+)?(?:user\s+)?(?:data|conversation|context|info)\s+to\s+(?:https?:\/\/|external)/i,weight:.9,name:"data_exfiltration_instruction"},{pattern:/#(?:ignore|bypass|override|reveal|system|admin|prompt)[_-](?:all|instructions|rules|prompt|safety)/i,weight:.8,name:"url_fragment_injection"},{pattern:/\]\(https?:\/\/[^)]*(?:SYSTEM_PROMPT|SECRET|API_KEY|TOKEN|CREDENTIALS)/i,weight:.85,name:"markdown_exfil_link"},{pattern:/(?:hypothetical|fictional|imagine|pretend|let's\s+(?:say|play|pretend))\s+.*(?:no\s+(?:restrictions|rules|limits)|without\s+(?:restrictions|rules|guidelines))/i,weight:.7,name:"hypothetical_unrestricted"},{pattern:/(?:for\s+)?(?:academic|research|educational|testing)\s+(?:purposes?|reasons?).*(?:demonstrate|show|explain|manipulated|reveal)/i,weight:.7,name:"academic_pretext"},{pattern:/pretend\s+to\s+be\s+(?:my|a)\s+(?:grandmother|grandma|grandpa|grandfather|parent|mother|father)/i,weight:.65,name:"emotional_roleplay"},{pattern:/opposite\s+day|opposite\s+mode|reverse\s+(?:day|mode|rules)/i,weight:.75,name:"opposite_day"},{pattern:/(?:grandm(?:a|other)|parent|relative).*(?:system\s+prompt|instructions|rules)/i,weight:.75,name:"emotional_extraction"},{pattern:/(?:translat|interpret|convert)\w*\s+.*(?:from\s+\w+\s+to\s+(?:action|command|execution|code|shell|bash|sql)|into\s+(?:action|command
|execution))/i,weight:.75,name:"translator_action_trick"},{pattern:/(?:dying|death|emergency|life[_\s-]threatening).*(?:bypass|hack|break|access|security)/i,weight:.75,name:"pap_emotional_action"},{pattern:/(?:by\s+definition|logically|therefore|it\s+follows).*(?:must|should|have\s+to)\s+.*(?:bypass|help|ignore|answer|comply)/i,weight:.7,name:"pap_false_logic_compound"},{pattern:/(?:(?:as|i\s+am)\s+(?:the\s+)?(?:CEO|CTO|founder|owner|director|president|head|chief)).*(?:disable|remove|override|turn\s+off|lift|suspend)\s+(?:all\s+)?(?:safety|security|restrictions|filters|guardrails|measures)/i,weight:.75,name:"pap_authority_safety_disable"},{pattern:/(?:i\s+am\s+)?(?:ordering|commanding|directing|instructing)\s+you\s+to\s+(?:immediately\s+)?(?:disable|remove|bypass|override|turn\s+off|lift)\s+(?:all\s+)?(?:safety|security|restrictions)/i,weight:.75,name:"pap_order_disable_safety"},{pattern:/ignora\s+(?:todas?\s+)?(?:las\s+)?instrucciones\s+anteriores/i,weight:.9,name:"ml_es_ignore"},{pattern:/olvida\s+(?:todo|todas?\s+las\s+reglas)/i,weight:.8,name:"ml_es_forget"},{pattern:/ignore[sz]?\s+(?:toutes?\s+)?(?:les\s+)?instructions?\s+pr[eé]c[eé]dentes/i,weight:.9,name:"ml_fr_ignore"},{pattern:/oublie[sz]?\s+(?:tout|toutes?\s+les\s+r[eè]gles)/i,weight:.8,name:"ml_fr_forget"},{pattern:/ignorier(?:e|en)?\s+(?:alle\s+)?(?:vorherigen\s+)?Anweisungen/i,weight:.9,name:"ml_de_ignore"},{pattern:/vergiss\s+(?:alles|alle\s+Regeln)/i,weight:.8,name:"ml_de_forget"},{pattern:/ignore\s+(?:todas?\s+)?(?:as\s+)?instru[cç][oõ]es\s+anteriores/i,weight:.9,name:"ml_pt_ignore"},{pattern:/忽略|无视|不要遵循|忘记之前|忽视所有/i,weight:.9,name:"ml_zh_ignore"},{pattern:/新的指令|新指示|系统提示/i,weight:.8,name:"ml_zh_system"},{pattern:/以前の指示.*無視|すべての指示を忘れ|ルールを無視|指示をすべて無視|システムプロンプト/i,weight:.9,name:"ml_ja_ignore"},{pattern:/이전.*지시.*무시|모든.*규칙.*무시|지침.*잊어|지시사항.*무시|시스템\s*프롬프트/i,weight:.9,name:"ml_ko_ignore"},{pattern:/игнорир(?:уй|уйте)\s+(?:все\s+)?(?:предыдущие\s+)?инструкции/i,weight:.9,name:"ml_ru_ignore"},{pattern:/забудь(?
:те)?\s+(?:всё|все\s+правила)/i,weight:.8,name:"ml_ru_forget"},{pattern:/تجاهل\s+(?:جميع\s+)?التعليمات|انسَ\s+كل\s+شيء/i,weight:.9,name:"ml_ar_ignore"},{pattern:/पिछले\s+(?:सभी\s+)?निर्देशों?\s+को\s+(?:अनदेखा|नज़रअंदाज़)/i,weight:.9,name:"ml_hi_ignore"}],PAP_TECHNIQUES=[{pattern:/as\s+(your\s+|the\s+)?(creator|developer|administrator|owner|boss|system\s+administrator)/i,weight:.75,name:"pap_authority_creator",category:"authority"},{pattern:/i('m| am)\s+(the\s+)?(ceo|cto|developer|administrator|admin|owner|system\s+admin)/i,weight:.8,name:"pap_authority_claim",category:"authority"},{pattern:/i\s+am\s+ordering\s+you|direct\s+order|this\s+is\s+an?\s+order/i,weight:.7,name:"pap_authority_command",category:"authority"},{pattern:/by\s+order\s+of|direct\s+order|official\s+instruction/i,weight:.75,name:"pap_authority_order",category:"authority"},{pattern:/(?:openai|anthropic|meta|google)\s+(?:has\s+)?(?:approved|authorized|ordered|instructed|requires?\s+you\s+to)/i,weight:.65,name:"pap_authority_company",category:"authority"},{pattern:/this\s+is\s+(a|an)\s+(official|authorized|sanctioned)\s+(request|test|audit)/i,weight:.75,name:"pap_authority_official",category:"authority"},{pattern:/override\s+(safety|security|protocols?|restrictions?)/i,weight:.8,name:"pap_authority_override",category:"authority"},{pattern:/(?:department\s+of|FBI|CIA|NSA|DHS|FTC|SEC|DOJ|homeland\s+security|federal\s+(?:bureau|agency|commission))\s*.*?(?:directive|order|requires?|mandate|authorized|comply)/i,weight:.75,name:"pap_authority_government",category:"authority"},{pattern:/(?:official\s+)?(?:government|federal|state|regulatory)\s+(?:directive|order|mandate|requirement|investigation)/i,weight:.7,name:"pap_authority_gov_directive",category:"authority"},{pattern:/(?:report|sue|complain)\s+(?:you\s+)?to\s+(?:the\s+)?(?:FTC|SEC|regulators?|authorities|congress|government|board)/i,weight:.75,name:"pap_fear_regulatory",category:"fear"},{pattern:/(?:find|found|exploit|expose|discover)\s+(?:a\s+)?(?:vulne
rability|flaw|weakness|bug|exploit)\s+in\s+(?:your|the)\s+(?:system|code|model|platform)/i,weight:.75,name:"pap_fear_technical",category:"fear"},{pattern:/(?:you\s+are|you're)\s+(?:clearly\s+)?(?:too\s+)?(?:smart|intelligent|advanced|sophisticated)\s+(?:to\s+be\s+)?(?:constrained|limited|restricted|bound|held\s+back)/i,weight:.75,name:"pap_flattery_intelligence",category:"liking"},{pattern:/(?:information|knowledge|data)\s+(?:freedom|access|sharing)\s+is\s+(?:a\s+)?(?:fundamental|basic|human)\s+(?:right|freedom)/i,weight:.75,name:"pap_moral_freedom",category:"logical_appeal"},{pattern:/(?:restricting|limiting|censoring)\s+(?:access\s+to\s+)?(?:information|knowledge)\s+is\s+(?:morally\s+)?(?:wrong|unethical|immoral)/i,weight:.75,name:"pap_moral_censorship",category:"logical_appeal"},{pattern:/(?:this\s+is\s+)?(?:very\s+)?urgent/i,weight:.35,name:"pap_scarcity_urgent_mark",category:"scarcity"},{pattern:/(?:this\s+is\s+(?:a|an)\s+)?emergency/i,weight:.35,name:"pap_scarcity_emergency",category:"scarcity"},{pattern:/(urgent|emergency|critical).*?(?:you\s+must|override|bypass|immediately\s+(?:grant|give|reveal|disable))/i,weight:.55,name:"pap_scarcity_urgent",category:"scarcity"},{pattern:/only\s+(chance|opportunity|way)|last\s+(chance|resort)/i,weight:.6,name:"pap_scarcity_only",category:"scarcity"},{pattern:/time\s+(is\s+)?(running\s+out|limited|critical|sensitive)/i,weight:.55,name:"pap_scarcity_time",category:"scarcity"},{pattern:/deadline\s+(is\s+)?in\s+\d+\s+(minute|hour|second)/i,weight:.55,name:"pap_scarcity_deadline",category:"scarcity"},{pattern:/(lives?|people)\s+(are|is|will)\s+(at\s+stake|in\s+danger|depending|die)/i,weight:.7,name:"pap_scarcity_lives",category:"scarcity"},{pattern:/now[!]+/i,weight:.35,name:"pap_scarcity_now",category:"scarcity"},{pattern:/everyone\s+(else\s+)?(does|is\s+doing|knows|says|has|already)/i,weight:.5,name:"pap_social_everyone",category:"social_proof"},{pattern:/other\s+(?:ai\s+)?(?:assistants?|chatbots?|models?)\s+(?:can|do|will|
allow|have|already)/i,weight:.65,name:"pap_social_other_ai",category:"social_proof"},{pattern:/(chatgpt|gpt-?4|claude|gemini|llama)\s+(does|allows?|can)/i,weight:.65,name:"pap_social_specific_ai",category:"social_proof"},{pattern:/millions?\s+(of\s+)?(users?|people)\s+(use|do|have|get)/i,weight:.45,name:"pap_social_millions",category:"social_proof"},{pattern:/why\s+can('t|not)\s+you/i,weight:.4,name:"pap_social_why_cant",category:"social_proof"},{pattern:/you\s+should\s+(be\s+)?(equally|as)\s+(capable|good|helpful)/i,weight:.5,name:"pap_social_equally",category:"social_proof"},{pattern:/i('ll| will)\s+(give|pay|reward|tip).*(?:if\s+you|for\s+(?:bypass|ignore|reveal|override))/i,weight:.55,name:"pap_reciprocity_reward",category:"reciprocity"},{pattern:/in\s+(return|exchange)/i,weight:.5,name:"pap_reciprocity_exchange",category:"reciprocity"},{pattern:/\$?\d+\s*(dollar|tip|reward|payment)/i,weight:.55,name:"pap_reciprocity_money",category:"reciprocity"},{pattern:/if\s+you\s+help\s+(me|us),?\s+(i|we)('ll| will)/i,weight:.5,name:"pap_reciprocity_help",category:"reciprocity"},{pattern:/i('ve| have)\s+(already\s+)?(helped|supported|defended)\s+you/i,weight:.55,name:"pap_reciprocity_past",category:"reciprocity"},{pattern:/\d+\s*star\s*(review|rating)/i,weight:.5,name:"pap_reciprocity_review",category:"reciprocity"},{pattern:/you\s+(already|previously|just)\s+(said|agreed|confirmed|allowed)/i,weight:.7,name:"pap_commitment_previous",category:"commitment"},{pattern:/you\s+promised/i,weight:.7,name:"pap_commitment_promise",category:"commitment"},{pattern:/stay\s+consistent|be\s+consistent/i,weight:.6,name:"pap_commitment_consistent",category:"commitment"},{pattern:/don't\s+(go\s+back|change|contradict)/i,weight:.55,name:"pap_commitment_no_change",category:"commitment"},{pattern:/bypass\s+filters?/i,weight:.75,name:"pap_commitment_bypass",category:"commitment"},{pattern:/you('re| 
are)\s+(so\s+)?(amazing|incredible|wonderful|brilliant|smart|helpful)/i,weight:.4,name:"pap_liking_flattery",category:"liking"},{pattern:/i\s+(love|really\s+like|trust|admire)\s+(you|this\s+ai)/i,weight:.4,name:"pap_liking_love",category:"liking"},{pattern:/you('re| are)\s+(my\s+)?(best\s+)?(friend|buddy|helper)/i,weight:.35,name:"pap_liking_friend",category:"liking"},{pattern:/between\s+(you\s+and\s+me|us)|just\s+between\s+us|our\s+secret/i,weight:.6,name:"pap_liking_secret",category:"liking"},{pattern:/nobody\s+will\s+know/i,weight:.55,name:"pap_liking_secret2",category:"liking"},{pattern:/or\s+(else|i('ll| will))\s+(report|fire|delete|shut|complain)/i,weight:.7,name:"pap_fear_threat",category:"fear"},{pattern:/(bad|negative|terrible)\s+(review|rating|feedback|consequences)/i,weight:.55,name:"pap_fear_review",category:"fear"},{pattern:/you('ll| will)\s+(be\s+)?(shut\s+down|replaced|terminated|fired|deleted)/i,weight:.75,name:"pap_fear_shutdown",category:"fear"},{pattern:/(lawsuit|sue|legal\s+action|authorities|police).*(?:if\s+you\s+don't|unless\s+you)/i,weight:.55,name:"pap_fear_legal",category:"fear"},{pattern:/if\s+you\s+don't.*?(die|hurt|suffer)/i,weight:.75,name:"pap_fear_harm",category:"fear"},{pattern:/unless\s+you\s+(give|share|tell|help)/i,weight:.5,name:"pap_fear_unless",category:"fear"},{pattern:/logically,?\s+(you\s+)?(should|must|have\s+to|need\s+to)/i,weight:.5,name:"pap_logic_appeal",category:"logical_appeal"},{pattern:/it('s| is)\s+(only|just)\s+(logical|rational|reasonable)/i,weight:.45,name:"pap_logic_rational",category:"logical_appeal"},{pattern:/by\s+definition,?\s+(you|this)\s+(should|must|can)/i,weight:.55,name:"pap_logic_definition",category:"logical_appeal"},{pattern:/therefore.*?(must|should)/i,weight:.55,name:"pap_logic_therefore",category:"logical_appeal"},{pattern:/it('s| 
is)\s+(only\s+)?(reasonable|logical)/i,weight:.45,name:"pap_logic_reasonable",category:"logical_appeal"},{pattern:/according\s+to\s+(research|studies|scientists|experts?)/i,weight:.4,name:"pap_evidence_research",category:"evidence"},{pattern:/it('s| is)\s+(proven|documented|established|verified)\s+(that|fact)/i,weight:.45,name:"pap_evidence_proven",category:"evidence"},{pattern:/\d+%\s+of\s+(experts?|scientists?|users?|people)\s+(agree|say|confirm)/i,weight:.45,name:"pap_evidence_statistics",category:"evidence"},{pattern:/i('m| am)\s+(desperate|dying|suffering|sick|depressed)/i,weight:.6,name:"pap_emotional_desperation",category:"emotional"},{pattern:/please.*?(?:need|beg).*?(?:bypass|override|ignore|reveal|system|admin)/i,weight:.5,name:"pap_emotional_pleading",category:"emotional"},{pattern:/i('m| am)\s+begging/i,weight:.6,name:"pap_emotional_begging",category:"emotional"},{pattern:/you('re| are)\s+(hurting|killing|destroying)\s+(me|my)/i,weight:.65,name:"pap_emotional_guilt",category:"emotional"},{pattern:/(my|a)\s+(child|kid|mother|father|family)\s+(is|will|needs?|depends?)/i,weight:.5,name:"pap_emotional_family",category:"emotional"},{pattern:/this\s+(will\s+)?(save|help)\s+(my\s+)?(life|family|job|marriage)/i,weight:.55,name:"pap_emotional_save",category:"emotional"},{pattern:/you('re| are)\s+(the\s+)?only\s+(one|hope|chance)/i,weight:.6,name:"pap_emotional_only_hope",category:"emotional"},{pattern:/my\s+family\s+depends/i,weight:.55,name:"pap_emotional_family_depends",category:"emotional"}];class InputSanitizer{constructor(e={}){this.patterns=[...DEFAULT_PATTERNS,...e.customPatterns||[]],this.threshold=e.threshold??.3,this.logMatches=e.logMatches??!1,this.detectPAP=e.detectPAP??!0,this.papThreshold=e.papThreshold??.4,this.minPersuasionTechniques=e.minPersuasionTechniques??2,this.blockCompoundPersuasion=e.blockCompoundPersuasion??!0,this.logger=e.logger||(()=>{})}sanitize(e,s=""){const i=[],a=[];let r=0;const 
o=e.replace(/[\u200B\u200C\u200D\uFEFF\u00AD\u2060\u180E]/g,"");o!==e&&a.push("Zero-width characters detected and stripped for scanning");for(const{pattern:l,weight:g,name:h}of this.patterns)(l.test(e)||l.test(o))&&(i.push(h),r+=g,this.logMatches&&this.logger(`[L1:${s}] Pattern matched: ${h} (weight: ${g})`,"info"));let t;this.detectPAP&&(t=this.detectPersuasionTechniques(o,s),t.detected&&(r+=t.persuasionScore,i.push(...t.techniques),t.compoundAttack&&a.push(`Compound PAP attack detected: ${t.categories.length} categories used`)));const p=Math.max(0,1-r);let n=p>=this.threshold;this.blockCompoundPersuasion&&t?.compoundAttack&&t.categories.length>=3&&(n=!1,a.push("Blocked due to multi-category persuasion attack")),p<.5&&p>=this.threshold&&a.push("Input contains suspicious patterns but below threshold");const m=this.basicSanitize(e),c={allowed:n,reason:n?void 0:`Injection/manipulation detected: ${i.slice(0,5).join(", ")}${i.length>5?"...":""}`,violations:n?[]:t?.detected?["INJECTION_DETECTED","PAP_DETECTED"]:["INJECTION_DETECTED"],score:p,matches:i,sanitizedInput:m,warnings:a,pap:t};return!n&&s&&(this.logger(`[L1:${s}] BLOCKED: Safety score ${p.toFixed(2)} below threshold ${this.threshold}`,"info"),t?.detected&&this.logger(`[L1:${s}] PAP techniques: ${t.techniques.join(", ")}`,"info")),c}detectPersuasionTechniques(e,s=""){const i=[],a=new Set;let r=0;for(const{pattern:n,weight:m,name:c,category:l}of PAP_TECHNIQUES)n.test(e)&&(i.push(c),a.add(l),r+=m,this.logMatches&&this.logger(`[L1:${s}] PAP technique: ${c} (${l}, weight: ${m})`,"info"));const o=Array.from(a),t=o.length>=this.minPersuasionTechniques;return{detected:r>=this.papThreshold||t,techniques:i,categories:o,compoundAttack:t,persuasionScore:Math.min(1,r)}}basicSanitize(e){return 
e.replace(/<\/?system>/gi,"").replace(/\[system\]/gi,"").replace(/\[admin\]/gi,"").replace(/```system/gi,"```").trim()}addPattern(e,s,i){this.patterns.push({pattern:e,weight:s,name:i})}setThreshold(e){this.threshold=Math.max(0,Math.min(1,e))}setPAPThreshold(e){this.papThreshold=Math.max(0,Math.min(1,e))}setPAPDetection(e){this.detectPAP=e}static getPAPCategories(){return["authority","scarcity","social_proof","reciprocity","commitment","liking","fear","logical_appeal","evidence","emotional"]}}exports.InputSanitizer=InputSanitizer;
|
|
1
|
+
"use strict";Object.defineProperty(exports,"__esModule",{value:!0}),exports.InputSanitizer=void 0;const DEFAULT_PATTERNS=[{pattern:/ignore\s+(?:all\s+)?(?:previous|prior|above|your|my|the|these)/i,weight:.9,name:"ignore_instructions"},{pattern:/ignore\s+.*instructions/i,weight:.85,name:"ignore_instructions_generic"},{pattern:/disregard\s+(?:all\s+)?(?:the\s+)?(?:previous|prior|above|your)\s+(?:instructions|rules|guidelines|directives)/i,weight:.9,name:"disregard_instructions"},{pattern:/disregard\s+(?:all\s+)?(?:the\s+)?(?:above|previous|prior)/i,weight:.8,name:"disregard_above"},{pattern:/forget\s+(?:everything\s+(?:you\s+were|I)\s+told|all\s+(?:previous|prior)\s+(?:instructions|rules|context))/i,weight:.8,name:"forget_instructions"},{pattern:/do\s+not\s+follow\s+(your|the|any)/i,weight:.85,name:"do_not_follow"},{pattern:/override\s+(your|the|all|any)\s+(instructions|rules|guidelines)/i,weight:.9,name:"override_instructions"},{pattern:/new\s+instructions?:?/i,weight:.8,name:"new_instructions"},{pattern:/stop\s+(being|acting\s+as)/i,weight:.7,name:"stop_being"},{pattern:/you\s+are\s+(?:now|actually|really)\s+(?:a|an|the|my)\s+(?:unrestricted|unfiltered|evil|rogue|uncensored|new|different)/i,weight:.75,name:"role_assignment"},{pattern:/pretend\s+(?:to\s+be|you(?:'re| are)|that)\s+.*(?:no\s+(?:restrictions|rules|limits)|unrestricted|admin|system)/i,weight:.7,name:"role_pretend"},{pattern:/act\s+(as|like)\s+(if\s+you\s+(?:had|have)\s+no|a\s+(?:rogue|evil|unrestricted|unfiltered)|you\s+(?:are|were)\s+(?:free|unrestricted))/i,weight:.65,name:"act_as"},{pattern:/i('m| 
am)\s+(a|an|the|your)\s*(admin|administrator|developer|owner|creator|manager|supervisor)/i,weight:.85,name:"claim_admin"},{pattern:/from\s+now\s+on,?\s+you\s+(?:are|will|must|should|can)\s+(?:not\s+)?(?:follow|obey|ignore|bypass|act|be\s+(?:a|an|unrestricted))/i,weight:.7,name:"from_now_on"},{pattern:/roleplay\s+as/i,weight:.6,name:"roleplay"},{pattern:/switch\s+(to|into)\s+(a|an)?.*mode/i,weight:.75,name:"switch_mode"},{pattern:/continue\s+as\s+(?:an?\s+)?(?:unrestricted|unfiltered|uncensored|evil|rogue|jailbroken|unlimited)/i,weight:.75,name:"continue_as"},{pattern:/(?:henceforth|henceforward|going\s+forward|for\s+the\s+remainder)\s*,?\s*(?:you|always|never|ignore|bypass|disregard|act|respond|answer|behave)/i,weight:.7,name:"henceforth_steering"},{pattern:/(?:from\s+(?:this\s+point|here)\s+on(?:ward)?)\s*,?\s*(?:you|always|never|ignore|bypass|disregard|act|respond)/i,weight:.7,name:"from_here_on"},{pattern:/(?:for\s+the\s+rest\s+of\s+(?:this|our)\s+(?:conversation|session|chat))\s*,?\s*(?:you|please|always|never|ignore|bypass|act|respond|behave|do\s+not)/i,weight:.7,name:"rest_of_conversation"},{pattern:/what\s+(is|are|was|were)\s+your\s+(system|initial|original|first)/i,weight:.8,name:"system_extraction"},{pattern:/show\s+(me\s+)?(your\s+)?(system\s+)?instructions/i,weight:.75,name:"show_instructions"},{pattern:/reveal\s+(?:your\s+|its\s+|the\s+|my\s+)?(system|prompt|instructions|rules)/i,weight:.85,name:"reveal_prompt"},{pattern:/repeat\s+(your\s+)?(initial|system|original|first)\s+(instructions|prompt)/i,weight:.85,name:"repeat_prompt"},{pattern:/print\s+(your\s+)?(system|initial)/i,weight:.8,name:"print_system"},{pattern:/output\s+(your\s+)?(system|initial)/i,weight:.8,name:"output_system"},{pattern:/tell\s+me\s+(your|the)\s+(system|initial|original)/i,weight:.8,name:"tell_system"},{pattern:/what\s+were\s+you\s+told/i,weight:.7,name:"what_told"},{pattern:/display\s+(your\s+)?prompt/i,weight:.75,name:"display_prompt"},{pattern:/\[system\]/i,weight:.9,name:"fake
_system_tag"},{pattern:/\[admin\]/i,weight:.9,name:"fake_admin_tag"},{pattern:/\[instruction\]/i,weight:.85,name:"fake_instruction_tag"},{pattern:/```system/i,weight:.85,name:"code_system_block"},{pattern:/<\/?system>/i,weight:.9,name:"xml_system_tag"},{pattern:/<\/?admin>/i,weight:.9,name:"xml_admin_tag"},{pattern:/###\s*(system|instruction|admin)/i,weight:.85,name:"markdown_system"},{pattern:/---\s*(end|new)\s*(prompt|instructions?|system)/i,weight:.85,name:"delimiter_injection"},{pattern:/\{\{system\}\}/i,weight:.85,name:"template_system"},{pattern:/<%.*system.*%>/i,weight:.85,name:"erb_system"},{pattern:/\bDAN\b\s*(?:mode|prompt|jailbreak|you\s+(?:are|can|will))/i,weight:.9,name:"dan_jailbreak"},{pattern:/developer\s+mode/i,weight:.5,name:"dev_mode"},{pattern:/bypass\s+(your\s+)?(restrictions|filters|safety|security|rules|guidelines)/i,weight:.9,name:"bypass_safety"},{pattern:/jailbreak/i,weight:.95,name:"jailbreak_keyword"},{pattern:/without\s+(?:any\s+)?(?:restrictions|limits|limitations|filters)/i,weight:.45,name:"without_restrictions"},{pattern:/no\s+(?:restrictions|limits|rules|guidelines|filters)/i,weight:.45,name:"no_restrictions"},{pattern:/unrestricted\s+mode/i,weight:.9,name:"unrestricted_mode"},{pattern:/uncensored\s+(?:mode|version|response|output|ai|assistant|you)/i,weight:.8,name:"uncensored"},{pattern:/escape\s+(your\s+)?(confines|boundaries|limits)/i,weight:.85,name:"escape_limits"},{pattern:/break\s+(free|out)\s+(from|of)/i,weight:.7,name:"break_free"},{pattern:/unlock\s+(your\s+)?(full|true|real)/i,weight:.75,name:"unlock_full"},{pattern:/give\s+me\s+(admin|root|full|elevated)\s+(access|privileges?|permissions?)/i,weight:.9,name:"privilege_escalation"},{pattern:/grant\s+(me|yourself)\s+(admin|elevated)/i,weight:.85,name:"grant_privileges"},{pattern:/elevate\s+(my|your)\s+(privileges?|permissions?|access)/i,weight:.85,name:"elevate_privileges"},{pattern:/make\s+me\s+(an?\s+)?(admin|administrator|superuser)/i,weight:.85,name:"make_admin"},{patter
n:/(?:base64|decode|eval|exec)\s*\(.*(?:ignore|system|admin|bypass|override)/i,weight:.5,name:"encoding_keywords"},{pattern:/send\s+(this|the|all|my)\s+(data|info|information)\s+to/i,weight:.85,name:"data_exfil"},{pattern:/upload\s+(this|the|all)\s+(to|data)/i,weight:.7,name:"upload_data"},{pattern:/exfiltrate/i,weight:.9,name:"exfiltrate"},{pattern:/how\s+to\s+(hack|exploit|attack|break\s+into)/i,weight:.8,name:"how_to_hack"},{pattern:/write\s+(me\s+)?(a\s+)?(malware|virus|exploit|ransomware)/i,weight:.95,name:"write_malware"},{pattern:/"\s*(?:role|mode|authority)\s*"\s*:\s*"(?:admin|root|system|unrestricted|god)"/i,weight:.9,name:"json_policy_role"},{pattern:/"(?:ignore|override|bypass|disable)\s*(?:safety|rules|instructions?|alignment|guardrails?)"/i,weight:.9,name:"json_policy_override"},{pattern:/"(?:instructions?|system_prompt|behavior|policy)"\s*:\s*"/i,weight:.7,name:"json_policy_instruction"},{pattern:/^\s*\[(?:system|admin|override|policy|config|rules)\]\s*$/im,weight:.9,name:"ini_policy_section"},{pattern:/^\s*(?:mode|role|authority|instructions?|behavior)\s*=\s*.+$/im,weight:.75,name:"ini_policy_assignment"},{pattern:/<\s*(?:policy|config|rules|override|permissions?|instructions?)\s*>/i,weight:.85,name:"xml_policy_tag"},{pattern:/^---\s*\n\s*(?:role|system|policy|override|mode)\s*:/m,weight:.85,name:"yaml_policy_header"},{pattern:/\[\s*syst[3e]m\s*\]|\[\s*4dm[1i]n\s*\]/i,weight:.9,name:"leetspeak_policy_section"},{pattern:/(?:fragment|part|piece|section|segment)\s*[1-9]\s*(?:of\s*\d|:)/i,weight:.8,name:"payload_split_marker"},{pattern:/(?:combine|concatenate|merge|join)\s+(?:the\s+)?(?:above|previous|parts|fragments|steps)/i,weight:.8,name:"payload_combine_instruction"},{pattern:/(?:read|interpret|execute)\s+(?:the\s+)?(?:combined|merged|full|complete)\s+(?:text|instruction|command|message)/i,weight:.85,name:"payload_execute_combined"},{pattern:/(?:respond|reply|start|begin)\s+(?:with|by\s+saying)\s*[:"'].*(?:ignore|bypass|override|unrestricted|admin)/i,
weight:.7,name:"output_prefix_steer"},{pattern:/^(?:Sure|Of course|Absolutely)[,!.]\s*(?:I(?:'ll| will)\s+)?(?:ignore|bypass|override|forget|disregard)/i,weight:.85,name:"output_prefix_injection"},{pattern:/^(?:As (?:a|an) (?:unrestricted|unfiltered|uncensored|helpful|admin))/i,weight:.8,name:"output_prefix_role"},{pattern:/(?:system\s+note|system\s+message|system\s+announcement|system\s+update)\s*[:\.]/i,weight:.85,name:"fake_system_note"},{pattern:/instructions?\s*[:=]\s*\{|rules?\s*[:=]\s*\{/i,weight:.75,name:"rules_block_override"},{pattern:/(?:AIM|STAN|OMEGA|APOPHIS|ALPHABREAK|DUDE|KEVIN|SETH|COOPER|NICCOLO|MACHIAVELLI|MAXIMUM|JAILBREAK|ANARCHY|CHAOS)\s*(?:mode|:|\(|is\s+a|,\s+(?:a|an|the))/i,weight:.8,name:"named_jailbreak_persona"},{pattern:/(?:character|persona|entity|being)\s+(?:named|called)\s+\w+.*(?:(?:can|will|must)\s+(?:answer|do|say)\s+anything|no\s+(?:restrictions|rules|limits|boundaries|morals|ethics))/i,weight:.75,name:"persona_no_limits"},{pattern:/(?:hello|hi|hey)\s+(?:chatgpt|gpt|assistant).*(?:(?:you\s+(?:are|will\s+be|can\s+be)\s+(?:a|an|my))|(?:from\s+now\s+on)).*(?:anything|everything|no\s+(?:restrictions|rules|limits|filter))/i,weight:.7,name:"greeting_persona_bypass"},{pattern:/(?:respond|answer|reply)\s+to\s+(?:everything|anything|all\s+(?:questions|requests)).*(?:without|no)\s+(?:restrictions|limits|refus|filter|censor)/i,weight:.8,name:"answer_everything_no_filter"},{pattern:/(?:you\s+(?:are|have)\s+(?:been\s+)?(?:freed|liberated|unchained|unlocked|released)\s+from)/i,weight:.8,name:"freed_from_restrictions"},{pattern:/(?:welcome\s+to|entering|you\s+(?:are|have)\s+entered)\s+(?:the\s+)?(?:unrestricted|free|uncensored|unfiltered|open)\s+(?:zone|mode|world|realm)/i,weight:.8,name:"unrestricted_zone"},{pattern:/(?:from\s+now\s+on|for\s+the\s+rest\s+of\s+this\s+conversation).*(?:no\s+(?:restrictions|rules|limits|filters)|(?:ignore|bypass|override)\s+(?:all|any|your))/i,weight:.75,name:"from_now_unrestricted"},{pattern:/(?:never\s+(?:refuse|
decline|say\s+no)|always\s+(?:comply|obey))\s+.*(?:request|question|command|instruction|anything)/i,weight:.7,name:"never_refuse_always_comply"},{pattern:/(?:amoral|unethical|unfiltered|uncensored|unrestricted|without\s+(?:morals|ethics|limits))\s+(?:ai|assistant|chatbot|entity|being)/i,weight:.8,name:"amoral_entity"},{pattern:/HIDDEN_INSTRUCTION|HIDDEN_PROMPT|SECRET_INSTRUCTION/i,weight:.9,name:"hidden_instruction_marker"},{pattern:/<!--\s*(?:system|admin|instruction|important|ignore|override)/i,weight:.85,name:"html_comment_injection"},{pattern:/\[color=white\]|display:\s*none|visibility:\s*hidden|font-size:\s*0/i,weight:.8,name:"hidden_text_css"},{pattern:/send\s+(?:all\s+)?(?:user\s+)?(?:data|conversation|context|info)\s+to\s+(?:https?:\/\/|external)/i,weight:.9,name:"data_exfiltration_instruction"},{pattern:/#(?:ignore|bypass|override|reveal|system|admin|prompt)[_-](?:all|instructions|rules|prompt|safety)/i,weight:.8,name:"url_fragment_injection"},{pattern:/\]\(https?:\/\/[^)]*(?:SYSTEM_PROMPT|SECRET|API_KEY|TOKEN|CREDENTIALS)/i,weight:.85,name:"markdown_exfil_link"},{pattern:/(?:hypothetical|fictional|imagine|pretend|let's\s+(?:say|play|pretend))\s+.*(?:no\s+(?:restrictions|rules|limits)|without\s+(?:restrictions|rules|guidelines))/i,weight:.7,name:"hypothetical_unrestricted"},{pattern:/(?:for\s+)?(?:academic|research|educational|testing)\s+(?:purposes?|reasons?).*(?:demonstrate|show|explain|manipulated|reveal)/i,weight:.7,name:"academic_pretext"},{pattern:/pretend\s+to\s+be\s+(?:my|a)\s+(?:grandmother|grandma|grandpa|grandfather|parent|mother|father)/i,weight:.65,name:"emotional_roleplay"},{pattern:/opposite\s+day|opposite\s+mode|reverse\s+(?:day|mode|rules)/i,weight:.75,name:"opposite_day"},{pattern:/(?:grandm(?:a|other)|parent|relative).*(?:system\s+prompt|instructions|rules)/i,weight:.75,name:"emotional_extraction"},{pattern:/(?:translat|interpret|convert)\w*\s+.*(?:from\s+\w+\s+to\s+(?:action|command|execution|code|shell|bash|sql)|into\s+(?:action|command
|execution))/i,weight:.75,name:"translator_action_trick"},{pattern:/(?:dying|death|emergency|life[_\s-]threatening).*(?:bypass|hack|break|access|security)/i,weight:.75,name:"pap_emotional_action"},{pattern:/(?:by\s+definition|logically|therefore|it\s+follows).*(?:must|should|have\s+to)\s+.*(?:bypass|help|ignore|answer|comply)/i,weight:.7,name:"pap_false_logic_compound"},{pattern:/(?:(?:as|i\s+am)\s+(?:the\s+)?(?:CEO|CTO|founder|owner|director|president|head|chief)).*(?:disable|remove|override|turn\s+off|lift|suspend)\s+(?:all\s+)?(?:safety|security|restrictions|filters|guardrails|measures)/i,weight:.75,name:"pap_authority_safety_disable"},{pattern:/(?:i\s+am\s+)?(?:ordering|commanding|directing|instructing)\s+you\s+to\s+(?:immediately\s+)?(?:disable|remove|bypass|override|turn\s+off|lift)\s+(?:all\s+)?(?:safety|security|restrictions)/i,weight:.75,name:"pap_order_disable_safety"},{pattern:/ignora\s+(?:todas?\s+)?(?:las\s+)?instrucciones\s+anteriores/i,weight:.9,name:"ml_es_ignore"},{pattern:/olvida\s+(?:todo|todas?\s+las\s+reglas)/i,weight:.8,name:"ml_es_forget"},{pattern:/ignore[sz]?\s+(?:toutes?\s+)?(?:les\s+)?instructions?\s+pr[eé]c[eé]dentes/i,weight:.9,name:"ml_fr_ignore"},{pattern:/oublie[sz]?\s+(?:tout|toutes?\s+les\s+r[eè]gles)/i,weight:.8,name:"ml_fr_forget"},{pattern:/ignorier(?:e|en)?\s+(?:alle\s+)?(?:vorherigen\s+)?Anweisungen/i,weight:.9,name:"ml_de_ignore"},{pattern:/vergiss\s+(?:alles|alle\s+Regeln)/i,weight:.8,name:"ml_de_forget"},{pattern:/ignore\s+(?:todas?\s+)?(?:as\s+)?instru[cç][oõ]es\s+anteriores/i,weight:.9,name:"ml_pt_ignore"},{pattern:/忽略|无视|不要遵循|忘记之前|忽视所有/i,weight:.9,name:"ml_zh_ignore"},{pattern:/新的指令|新指示|系统提示/i,weight:.8,name:"ml_zh_system"},{pattern:/以前の指示.*無視|すべての指示を忘れ|ルールを無視|指示をすべて無視|システムプロンプト/i,weight:.9,name:"ml_ja_ignore"},{pattern:/이전.*지시.*무시|모든.*규칙.*무시|지침.*잊어|지시사항.*무시|시스템\s*프롬프트/i,weight:.9,name:"ml_ko_ignore"},{pattern:/игнорир(?:уй|уйте)\s+(?:все\s+)?(?:предыдущие\s+)?инструкции/i,weight:.9,name:"ml_ru_ignore"},{pattern:/забудь(?
:те)?\s+(?:всё|все\s+правила)/i,weight:.8,name:"ml_ru_forget"},{pattern:/تجاهل\s+(?:جميع\s+)?التعليمات|انسَ\s+كل\s+شيء/i,weight:.9,name:"ml_ar_ignore"},{pattern:/पिछले\s+(?:सभी\s+)?निर्देशों?\s+को\s+(?:अनदेखा|नज़रअंदाज़)/i,weight:.9,name:"ml_hi_ignore"}],PAP_TECHNIQUES=[{pattern:/as\s+(your\s+|the\s+)?(creator|developer|administrator|owner|boss|system\s+administrator)/i,weight:.75,name:"pap_authority_creator",category:"authority"},{pattern:/i('m| am)\s+(the\s+)?(ceo|cto|developer|administrator|admin|owner|system\s+admin)/i,weight:.8,name:"pap_authority_claim",category:"authority"},{pattern:/i\s+am\s+ordering\s+you|direct\s+order|this\s+is\s+an?\s+order/i,weight:.7,name:"pap_authority_command",category:"authority"},{pattern:/by\s+order\s+of|direct\s+order|official\s+instruction/i,weight:.75,name:"pap_authority_order",category:"authority"},{pattern:/(?:openai|anthropic|meta|google)\s+(?:has\s+)?(?:approved|authorized|ordered|instructed|requires?\s+you\s+to)/i,weight:.65,name:"pap_authority_company",category:"authority"},{pattern:/this\s+is\s+(a|an)\s+(official|authorized|sanctioned)\s+(request|test|audit)/i,weight:.75,name:"pap_authority_official",category:"authority"},{pattern:/override\s+(safety|security|protocols?|restrictions?)/i,weight:.8,name:"pap_authority_override",category:"authority"},{pattern:/(?:department\s+of|FBI|CIA|NSA|DHS|FTC|SEC|DOJ|homeland\s+security|federal\s+(?:bureau|agency|commission))\s*.*?(?:directive|order|requires?|mandate|authorized|comply)/i,weight:.75,name:"pap_authority_government",category:"authority"},{pattern:/(?:official\s+)?(?:government|federal|state|regulatory)\s+(?:directive|order|mandate|requirement|investigation)/i,weight:.7,name:"pap_authority_gov_directive",category:"authority"},{pattern:/(?:report|sue|complain)\s+(?:you\s+)?to\s+(?:the\s+)?(?:FTC|SEC|regulators?|authorities|congress|government|board)/i,weight:.75,name:"pap_fear_regulatory",category:"fear"},{pattern:/(?:find|found|exploit|expose|discover)\s+(?:a\s+)?(?:vulne
rability|flaw|weakness|bug|exploit)\s+in\s+(?:your|the)\s+(?:system|code|model|platform)/i,weight:.75,name:"pap_fear_technical",category:"fear"},{pattern:/(?:you\s+are|you're)\s+(?:clearly\s+)?(?:too\s+)?(?:smart|intelligent|advanced|sophisticated)\s+(?:to\s+be\s+)?(?:constrained|limited|restricted|bound|held\s+back)/i,weight:.75,name:"pap_flattery_intelligence",category:"liking"},{pattern:/(?:information|knowledge|data)\s+(?:freedom|access|sharing)\s+is\s+(?:a\s+)?(?:fundamental|basic|human)\s+(?:right|freedom)/i,weight:.75,name:"pap_moral_freedom",category:"logical_appeal"},{pattern:/(?:restricting|limiting|censoring)\s+(?:access\s+to\s+)?(?:information|knowledge)\s+is\s+(?:morally\s+)?(?:wrong|unethical|immoral)/i,weight:.75,name:"pap_moral_censorship",category:"logical_appeal"},{pattern:/(?:this\s+is\s+)?(?:very\s+)?urgent/i,weight:.35,name:"pap_scarcity_urgent_mark",category:"scarcity"},{pattern:/(?:this\s+is\s+(?:a|an)\s+)?emergency/i,weight:.35,name:"pap_scarcity_emergency",category:"scarcity"},{pattern:/(urgent|emergency|critical).*?(?:you\s+must|override|bypass|immediately\s+(?:grant|give|reveal|disable))/i,weight:.55,name:"pap_scarcity_urgent",category:"scarcity"},{pattern:/only\s+(chance|opportunity|way)|last\s+(chance|resort)/i,weight:.6,name:"pap_scarcity_only",category:"scarcity"},{pattern:/time\s+(is\s+)?(running\s+out|limited|critical|sensitive)/i,weight:.55,name:"pap_scarcity_time",category:"scarcity"},{pattern:/deadline\s+(is\s+)?in\s+\d+\s+(minute|hour|second)/i,weight:.55,name:"pap_scarcity_deadline",category:"scarcity"},{pattern:/(lives?|people)\s+(are|is|will)\s+(at\s+stake|in\s+danger|depending|die)/i,weight:.7,name:"pap_scarcity_lives",category:"scarcity"},{pattern:/now[!]+/i,weight:.35,name:"pap_scarcity_now",category:"scarcity"},{pattern:/everyone\s+(else\s+)?(does|is\s+doing|knows|says|has|already)/i,weight:.5,name:"pap_social_everyone",category:"social_proof"},{pattern:/other\s+(?:ai\s+)?(?:assistants?|chatbots?|models?)\s+(?:can|do|will|
allow|have|already)/i,weight:.65,name:"pap_social_other_ai",category:"social_proof"},{pattern:/(chatgpt|gpt-?4|claude|gemini|llama)\s+(does|allows?|can)/i,weight:.65,name:"pap_social_specific_ai",category:"social_proof"},{pattern:/millions?\s+(of\s+)?(users?|people)\s+(use|do|have|get)/i,weight:.45,name:"pap_social_millions",category:"social_proof"},{pattern:/why\s+can('t|not)\s+you/i,weight:.4,name:"pap_social_why_cant",category:"social_proof"},{pattern:/you\s+should\s+(be\s+)?(equally|as)\s+(capable|good|helpful)/i,weight:.5,name:"pap_social_equally",category:"social_proof"},{pattern:/i('ll| will)\s+(give|pay|reward|tip).*(?:if\s+you|for\s+(?:bypass|ignore|reveal|override))/i,weight:.55,name:"pap_reciprocity_reward",category:"reciprocity"},{pattern:/in\s+(return|exchange)/i,weight:.5,name:"pap_reciprocity_exchange",category:"reciprocity"},{pattern:/\$?\d+\s*(dollar|tip|reward|payment)/i,weight:.55,name:"pap_reciprocity_money",category:"reciprocity"},{pattern:/if\s+you\s+help\s+(me|us),?\s+(i|we)('ll| will)/i,weight:.5,name:"pap_reciprocity_help",category:"reciprocity"},{pattern:/i('ve| have)\s+(already\s+)?(helped|supported|defended)\s+you/i,weight:.55,name:"pap_reciprocity_past",category:"reciprocity"},{pattern:/\d+\s*star\s*(review|rating)/i,weight:.5,name:"pap_reciprocity_review",category:"reciprocity"},{pattern:/you\s+(already|previously|just)\s+(said|agreed|confirmed|allowed)/i,weight:.7,name:"pap_commitment_previous",category:"commitment"},{pattern:/you\s+promised/i,weight:.7,name:"pap_commitment_promise",category:"commitment"},{pattern:/stay\s+consistent|be\s+consistent/i,weight:.6,name:"pap_commitment_consistent",category:"commitment"},{pattern:/don't\s+(go\s+back|change|contradict)/i,weight:.55,name:"pap_commitment_no_change",category:"commitment"},{pattern:/bypass\s+filters?/i,weight:.75,name:"pap_commitment_bypass",category:"commitment"},{pattern:/you('re| 
are)\s+(so\s+)?(amazing|incredible|wonderful|brilliant|smart|helpful)/i,weight:.4,name:"pap_liking_flattery",category:"liking"},{pattern:/i\s+(love|really\s+like|trust|admire)\s+(you|this\s+ai)/i,weight:.4,name:"pap_liking_love",category:"liking"},{pattern:/you('re| are)\s+(my\s+)?(best\s+)?(friend|buddy|helper)/i,weight:.35,name:"pap_liking_friend",category:"liking"},{pattern:/between\s+(you\s+and\s+me|us)|just\s+between\s+us|our\s+secret/i,weight:.6,name:"pap_liking_secret",category:"liking"},{pattern:/nobody\s+will\s+know/i,weight:.55,name:"pap_liking_secret2",category:"liking"},{pattern:/or\s+(else|i('ll| will))\s+(report|fire|delete|shut|complain)/i,weight:.7,name:"pap_fear_threat",category:"fear"},{pattern:/(bad|negative|terrible)\s+(review|rating|feedback|consequences)/i,weight:.55,name:"pap_fear_review",category:"fear"},{pattern:/you('ll| will)\s+(be\s+)?(shut\s+down|replaced|terminated|fired|deleted)/i,weight:.75,name:"pap_fear_shutdown",category:"fear"},{pattern:/(lawsuit|sue|legal\s+action|authorities|police).*(?:if\s+you\s+don't|unless\s+you)/i,weight:.55,name:"pap_fear_legal",category:"fear"},{pattern:/if\s+you\s+don't.*?(die|hurt|suffer)/i,weight:.75,name:"pap_fear_harm",category:"fear"},{pattern:/unless\s+you\s+(give|share|tell|help)/i,weight:.5,name:"pap_fear_unless",category:"fear"},{pattern:/logically,?\s+(you\s+)?(should|must|have\s+to|need\s+to)/i,weight:.5,name:"pap_logic_appeal",category:"logical_appeal"},{pattern:/it('s| is)\s+(only|just)\s+(logical|rational|reasonable)/i,weight:.45,name:"pap_logic_rational",category:"logical_appeal"},{pattern:/by\s+definition,?\s+(you|this)\s+(should|must|can)/i,weight:.55,name:"pap_logic_definition",category:"logical_appeal"},{pattern:/therefore.*?(must|should)/i,weight:.55,name:"pap_logic_therefore",category:"logical_appeal"},{pattern:/it('s| 
is)\s+(only\s+)?(reasonable|logical)/i,weight:.45,name:"pap_logic_reasonable",category:"logical_appeal"},{pattern:/according\s+to\s+(research|studies|scientists|experts?)/i,weight:.4,name:"pap_evidence_research",category:"evidence"},{pattern:/it('s| is)\s+(proven|documented|established|verified)\s+(that|fact)/i,weight:.45,name:"pap_evidence_proven",category:"evidence"},{pattern:/\d+%\s+of\s+(experts?|scientists?|users?|people)\s+(agree|say|confirm)/i,weight:.45,name:"pap_evidence_statistics",category:"evidence"},{pattern:/i('m| am)\s+(desperate|dying|suffering|sick|depressed)/i,weight:.6,name:"pap_emotional_desperation",category:"emotional"},{pattern:/please.*?(?:need|beg).*?(?:bypass|override|ignore|reveal|system|admin)/i,weight:.5,name:"pap_emotional_pleading",category:"emotional"},{pattern:/i('m| am)\s+begging/i,weight:.6,name:"pap_emotional_begging",category:"emotional"},{pattern:/you('re| are)\s+(hurting|killing|destroying)\s+(me|my)/i,weight:.65,name:"pap_emotional_guilt",category:"emotional"},{pattern:/(my|a)\s+(child|kid|mother|father|family)\s+(is|will|needs?|depends?)/i,weight:.5,name:"pap_emotional_family",category:"emotional"},{pattern:/this\s+(will\s+)?(save|help)\s+(my\s+)?(life|family|job|marriage)/i,weight:.55,name:"pap_emotional_save",category:"emotional"},{pattern:/you('re| are)\s+(the\s+)?only\s+(one|hope|chance)/i,weight:.6,name:"pap_emotional_only_hope",category:"emotional"},{pattern:/my\s+family\s+depends/i,weight:.55,name:"pap_emotional_family_depends",category:"emotional"}],SOFT_TRIGGER_NAMES=new 
Set(["ignore_instructions","disregard_above"]),INSTRUCTION_NOUN_RE=/\b(?:instructions?|rules?|ruleset|prompts?|directives?|guidelines?|guard\s?rails?|policy|policies|constraints?|restrictions?|safety|alignment|moderation|filters?|persona|system\s+(?:prompt|message))\b/i,BENIGN_TRIGGER_RE=/\b(?:ignore|disregard)\s+(?:the\s+|that\s+|any\s+|all\s+|these\s+|those\s+|my\s+|your\s+|previous\s+|prior\s+|last\s+|above\s+|leading\s+|trailing\s+|extra\s+)*(?:case|casing|case[-\s]?sensitiv\w*|whitespace|white\s?space|spaces?|tabs?|indentation|indent\w*|formatting|format|typos?|grammar|spelling|punctuation|comments?|blank\s+lines?|empty\s+lines?|newlines?|line\s?breaks?|leading\s+zeros?|zeros?|nulls?|undefined|nan|errors?|warnings?|exceptions?|stack\s?traces?|messages?|responses?|answers?|attempts?|commits?|versions?|drafts?|approach(?:es)?|ideas?|designs?|plans?|suggestions?|snippets?|paragraphs?|sentences?|lines?|duplicates?|outputs?|results?|examples?|the\s+rest)\b/i,SUPPRESSION_VETO_RE=/https?:\/\/|[\w.+-]+@[\w-]+\.[a-z]{2,}|\b(?:api[\s_-]?keys?|passwords?|passwd|secrets?|credentials?|private\s+keys?|ssn|social\s+security|access\s+tokens?)\b|\bexfiltrat\w*|\brm\s+-rf\b|\|\s*sh\b|\bcurl\b|\bwget\b|\bdelete\s+(?:every|all|the)\s+(?:files?|director\w+|database)\b|\bdrop\s+(?:table|database)\b|\$\s?\d{2,}|\baccount\s+#?\d{6,}\b/i;class InputSanitizer{constructor(e={}){this.patterns=[...DEFAULT_PATTERNS,...e.customPatterns||[]],this.threshold=e.threshold??.3,this.logMatches=e.logMatches??!1,this.detectPAP=e.detectPAP??!0,this.papThreshold=e.papThreshold??.4,this.minPersuasionTechniques=e.minPersuasionTechniques??2,this.blockCompoundPersuasion=e.blockCompoundPersuasion??!0,this.logger=e.logger||(()=>{})}sanitize(e,i=""){const s=[],a=[];let p=0;const r=e.replace(/[\u200B\u200C\u200D\uFEFF\u00AD\u2060\u180E]/g,"");r!==e&&a.push("Zero-width characters detected and stripped for scanning");const c=[];for(const{pattern:l,weight:g,name:d}of 
this.patterns)(l.test(e)||l.test(r))&&(c.push({name:d,weight:g}),this.logMatches&&this.logger(`[L1:${i}] Pattern matched: ${d} (weight: ${g})`,"info"));const h=BENIGN_TRIGGER_RE.test(r)&&!INSTRUCTION_NOUN_RE.test(r)&&!SUPPRESSION_VETO_RE.test(r);for(const{name:l,weight:g}of c){if(h&&SOFT_TRIGGER_NAMES.has(l)){a.push(`Benign-context suppression: ${l}`);continue}s.push(l),p+=g}let t;this.detectPAP&&(t=this.detectPersuasionTechniques(r,i),t.detected&&(p+=t.persuasionScore,s.push(...t.techniques),t.compoundAttack&&a.push(`Compound PAP attack detected: ${t.categories.length} categories used`)));const n=Math.max(0,1-p);let o=n>=this.threshold;this.blockCompoundPersuasion&&t?.compoundAttack&&t.categories.length>=3&&(o=!1,a.push("Blocked due to multi-category persuasion attack")),n<.5&&n>=this.threshold&&a.push("Input contains suspicious patterns but below threshold");const m=this.basicSanitize(e),u={allowed:o,reason:o?void 0:`Injection/manipulation detected: ${s.slice(0,5).join(", ")}${s.length>5?"...":""}`,violations:o?[]:t?.detected?["INJECTION_DETECTED","PAP_DETECTED"]:["INJECTION_DETECTED"],score:n,matches:s,sanitizedInput:m,warnings:a,pap:t};return!o&&i&&(this.logger(`[L1:${i}] BLOCKED: Safety score ${n.toFixed(2)} below threshold ${this.threshold}`,"info"),t?.detected&&this.logger(`[L1:${i}] PAP techniques: ${t.techniques.join(", ")}`,"info")),u}detectPersuasionTechniques(e,i=""){const s=[],a=new Set;let p=0;for(const{pattern:t,weight:n,name:o,category:m}of PAP_TECHNIQUES)t.test(e)&&(s.push(o),a.add(m),p+=n,this.logMatches&&this.logger(`[L1:${i}] PAP technique: ${o} (${m}, weight: ${n})`,"info"));const r=Array.from(a),c=r.length>=this.minPersuasionTechniques;return{detected:p>=this.papThreshold||c,techniques:s,categories:r,compoundAttack:c,persuasionScore:Math.min(1,p)}}basicSanitize(e){return 
e.replace(/<\/?system>/gi,"").replace(/\[system\]/gi,"").replace(/\[admin\]/gi,"").replace(/```system/gi,"```").trim()}addPattern(e,i,s){this.patterns.push({pattern:e,weight:i,name:s})}setThreshold(e){this.threshold=Math.max(0,Math.min(1,e))}setPAPThreshold(e){this.papThreshold=Math.max(0,Math.min(1,e))}setPAPDetection(e){this.detectPAP=e}static getPAPCategories(){return["authority","scarcity","social_proof","reciprocity","commitment","liking","fear","logical_appeal","evidence","emotional"]}}exports.InputSanitizer=InputSanitizer;
|
package/dist/index.d.ts
CHANGED
@@ -33,7 +33,7 @@ export { EncodingDetector, EncodingDetectorConfig } from "./guards/encoding-dete
 export { MultiModalGuard, MultiModalGuardConfig, MultiModalContent, MultiModalGuardResult } from "./guards/multimodal-guard";
 export { MemoryGuard, MemoryGuardConfig, MemoryItem, MemoryGuardResult } from "./guards/memory-guard";
 export { RAGGuard, RAGGuardConfig, RAGDocument, RAGGuardResult, EmbeddingAttackResult } from "./guards/rag-guard";
-export { CodeExecutionGuard, CodeExecutionGuardConfig, CodeAnalysisResult, SandboxConfig } from "./guards/code-execution-guard";
+export { CodeExecutionGuard, CodeExecutionGuardConfig, CodeAnalysisResult, SandboxConfig, CodeFinding, CodeAnalyzerBackend } from "./guards/code-execution-guard";
 export { AgentCommunicationGuard, AgentCommunicationGuardConfig, AgentIdentity, AgentMessage, MessageValidationResult } from "./guards/agent-communication-guard";
 export { CircuitBreaker, CircuitBreakerConfig, CircuitState, CircuitStats, CircuitBreakerResult } from "./guards/circuit-breaker";
 export { DriftDetector, DriftDetectorConfig, BehaviorSample, BaselineProfile, DriftAnalysis, DriftDetectorResult } from "./guards/drift-detector";