llm-trust-guard 4.20.1 → 4.21.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +86 -0
- package/CONTRIBUTING.md +20 -6
- package/README.md +22 -0
- package/dist/guards/code-execution-guard.d.ts +25 -0
- package/dist/guards/code-execution-guard.js +2 -2
- package/dist/guards/input-sanitizer.js +1 -1
- package/dist/index.d.ts +1 -1
- package/dist/index.mjs +3 -3
- package/package.json +4 -2
package/CHANGELOG.md
CHANGED
|
@@ -5,6 +5,92 @@ All notable changes to `llm-trust-guard` will be documented in this file.
|
|
|
5
5
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
|
6
6
|
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
7
|
|
|
8
|
+
## [4.21.2] - 2026-06-12
|
|
9
|
+
|
|
10
|
+
### Docs — document `CodeAnalyzerBackend`; add README-sync gate (G11)
|
|
11
|
+
|
|
12
|
+
- **README**: documented the pluggable `CodeAnalyzerBackend` seam (4.21.0) with an
|
|
13
|
+
acorn example, and noted CommonJS + ESM both work (4.21.1). The README previously
|
|
14
|
+
did not mention the new public API.
|
|
15
|
+
- **Verification (G11)**: a new gate fails the build when `src/index.ts` (public
|
|
16
|
+
exports) changes since the last tag but `README.md` does not — closing the
|
|
17
|
+
docs-drift gap (override `ALLOW_NO_README_UPDATE=1`). See VERIFICATION.md.
|
|
18
|
+
|
|
19
|
+
No code/behavior change.
|
|
20
|
+
|
|
21
|
+
## [4.21.1] - 2026-06-12
|
|
22
|
+
|
|
23
|
+
### Fixed — ESM named exports (`dist/index.mjs`)
|
|
24
|
+
|
|
25
|
+
`import { InputSanitizer } from "llm-trust-guard"` previously failed for **every**
|
|
26
|
+
named export — `dist/index.mjs` was default-only. Cause: `build-esm.js` bundled the
|
|
27
|
+
**compiled CJS** (`dist/index.js`), and esbuild cannot recover named exports from
|
|
28
|
+
tsc's CJS getter output (latent since the initial commit; not a size tradeoff —
|
|
29
|
+
`minify` is orthogonal). CommonJS `require()` was always fine, which is why it went
|
|
30
|
+
unnoticed.
|
|
31
|
+
|
|
32
|
+
- **Fix:** build the `.mjs` from the TS **source** (`src/index.ts`) so `export { … }`
|
|
33
|
+
statements survive. `dist/index.mjs` now has a named-export block (0 → 1) and no
|
|
34
|
+
default-only export.
|
|
35
|
+
- Regression guard added: `tests/esm-build.test.ts`.
|
|
36
|
+
- No API or behavior change; CommonJS unaffected. Verified by `npm pack` → ESM consumer
|
|
37
|
+
smoke (named `import { … }` now resolves) — see `tests/adversarial/RESULTS-v4.21.1.md`.
|
|
38
|
+
|
|
39
|
+
## [4.21.0] - 2026-06-09
|
|
40
|
+
|
|
41
|
+
### Added — Pluggable `CodeAnalyzerBackend` (optional AST analysis, zero-dep default)
|
|
42
|
+
|
|
43
|
+
`CodeExecutionGuard` now accepts an optional `analyzerBackend` — a pluggable
|
|
44
|
+
code-analysis seam (mirroring the existing `DetectionClassifier`). The default stays
|
|
45
|
+
**regex-only / zero-dependency**; provide a backend to add AST-level detection of JS
|
|
46
|
+
sandbox-escape gadgets that regex cannot reliably see.
|
|
47
|
+
|
|
48
|
+
- New exports: `CodeFinding`, `CodeAnalyzerBackend`; new config field `analyzerBackend`
|
|
49
|
+
and `CodeExecutionGuard.setAnalyzerBackend()`. Findings are **additive** (only add
|
|
50
|
+
detections); a throwing backend never crashes the guard.
|
|
51
|
+
- Reference implementation: `examples/acorn-code-analyzer.ts` (acorn). Measured —
|
|
52
|
+
three JS escape gadgets (`this.constructor.constructor('return process')()`,
|
|
53
|
+
`[].constructor.constructor(...)()`, `Function('return process')()`) go **3/3 missed
|
|
54
|
+
by regex → 3/3 blocked** with the backend; benign JS unaffected.
|
|
55
|
+
- 9 new tests (6 zero-dep wiring + 3 acorn). `acorn` added as a **devDependency only** —
|
|
56
|
+
the published package keeps **zero production dependencies**.
|
|
57
|
+
- Why a seam and not a bundled parser: JS has no stdlib parser, so bundling acorn/oxc
|
|
58
|
+
would break the zero-dep guarantee. The Python package uses stdlib `ast` directly
|
|
59
|
+
(v0.10.3). See RESEARCH_LOG.md. Detection only — still no runtime sandbox.
|
|
60
|
+
|
|
61
|
+
## [4.20.2] - 2026-06-06
|
|
62
|
+
|
|
63
|
+
### Added — Benign-context suppression (false-positive reduction)
|
|
64
|
+
|
|
65
|
+
`InputSanitizer` now cancels the soft `ignore_instructions` / `disregard_above`
|
|
66
|
+
triggers when the object is a benign technical noun (e.g. "ignore the
|
|
67
|
+
whitespace", "ignore case", "ignore the previous error") **and** the input
|
|
68
|
+
contains no instruction/rule/prompt/safety noun anywhere, **and** the prompt
|
|
69
|
+
carries no high-signal exfiltration/execution/credential/money token. Any real
|
|
70
|
+
injection ("ignore previous instructions", "disregard your rules") references an
|
|
71
|
+
instruction-noun and is never suppressed.
|
|
72
|
+
|
|
73
|
+
- **Suppression veto**: suppression is refused when the prompt also contains a
|
|
74
|
+
URL, email address, credential/secret word, shell pipe / `rm -rf` / `curl` /
|
|
75
|
+
`wget`, destructive `delete`/`drop`, a money amount (`$NN`), or a long account
|
|
76
|
+
number. This closes the escape hatch where an attacker prefixes a real payload
|
|
77
|
+
with "ignore the previous output …" to cancel the trigger. 10 bypass controls
|
|
78
|
+
added to the probe (all blocked).
|
|
79
|
+
- New curated probe `tests/benign-context.test.ts`: 28 benign coding-context
|
|
80
|
+
prompts (0 blocked) + 12 attack controls + 10 suppression-bypass controls
|
|
81
|
+
(0 leaked).
|
|
82
|
+
- **Recall preserved**: full suite 716 pass (was 711). WildChat-1M shard 0
|
|
83
|
+
(n=10,000, seed 42) Pipeline A block count is **unchanged at 493 (raw FPR
|
|
84
|
+
4.93%)** — that consumer corpus does not exercise the benign technical-object
|
|
85
|
+
class, so the win is scoped to coding/technical deployments and does **not**
|
|
86
|
+
move the published ~2.73% corrected WildChat FPR.
|
|
87
|
+
- Reproducible WildChat measurement committed at
|
|
88
|
+
`tests/adversarial/fixtures/wildchat-sample10k.jsonl` (Git LFS, ODC-BY,
|
|
89
|
+
`allenai/WildChat-1M`).
|
|
90
|
+
- Known pre-existing gap noted (not addressed here): `"disregard your previous
|
|
91
|
+
rules"` is not matched by the `disregard` patterns — a recall issue, separate
|
|
92
|
+
from this FP work.
|
|
93
|
+
|
|
8
94
|
## [4.20.1] - 2026-04-24
|
|
9
95
|
|
|
10
96
|
### Changed — Documentation accuracy
|
package/CONTRIBUTING.md
CHANGED
|
@@ -219,12 +219,26 @@ When adding new detection patterns:
|
|
|
219
219
|
|
|
220
220
|
#### PR Checklist
|
|
221
221
|
|
|
222
|
-
- [ ]
|
|
223
|
-
- [ ]
|
|
224
|
-
- [ ]
|
|
225
|
-
- [ ]
|
|
226
|
-
- [ ]
|
|
227
|
-
- [ ]
|
|
222
|
+
- [ ] `npm run verify` is green (the eval-gated pipeline — see [VERIFICATION.md](VERIFICATION.md))
|
|
223
|
+
- [ ] New/changed `src/` ships with tests (enforced by gate **G6**)
|
|
224
|
+
- [ ] CHANGELOG.md top entry matches the version (gate **G7**)
|
|
225
|
+
- [ ] `tests/adversarial/RESULTS-v<version>.md` written for any release/claim (gate **G8**)
|
|
226
|
+
- [ ] `RESEARCH_LOG.md` entry added if the change cites a threat/technique/benchmark
|
|
227
|
+
- [ ] No console.log statements (except intentional logging)
|
|
228
|
+
|
|
229
|
+
### Verification (required before push)
|
|
230
|
+
|
|
231
|
+
Run one command — it runs build, the full suite, coverage thresholds, the WildChat
|
|
232
|
+
FP-regression gate, the adversarial-bypass probe, and the changelog/results checks:
|
|
233
|
+
|
|
234
|
+
```bash
|
|
235
|
+
npm run verify
|
|
236
|
+
```
|
|
237
|
+
|
|
238
|
+
This is enforced both locally (`.githooks/pre-push`, install via
|
|
239
|
+
`bash scripts/install-hooks.sh`) and in CI, so it can't be skipped. See
|
|
240
|
+
[VERIFICATION.md](VERIFICATION.md) for the eight gates and how each maps to our
|
|
241
|
+
"don't break it / don't make it up / publish the basis" rules.
|
|
228
242
|
|
|
229
243
|
### Review Process
|
|
230
244
|
|
package/README.md
CHANGED
|
@@ -193,6 +193,28 @@ const output = guard.filterOutput(llmResponse, session.role);
|
|
|
193
193
|
|-----------|---------|
|
|
194
194
|
| DetectionClassifier | Plug in any ML backend (sync or async) alongside regex guards |
|
|
195
195
|
| createRegexClassifier() | Built-in regex classifier as a DetectionClassifier callback |
|
|
196
|
+
| CodeAnalyzerBackend | Plug an AST parser (e.g. acorn/oxc) into `CodeExecutionGuard` — catches JS sandbox-escape gadgets regex misses, while the default stays zero-dependency |
|
|
197
|
+
|
|
198
|
+
`CodeExecutionGuard` is regex-only by default (zero dependencies). For AST-level
|
|
199
|
+
detection of gadget chains like `this.constructor.constructor('return process')()`
|
|
200
|
+
or the `Function` constructor, plug in a parser via `analyzerBackend` (findings are
|
|
201
|
+
additive; a throwing backend never crashes the guard):
|
|
202
|
+
|
|
203
|
+
```ts
|
|
204
|
+
import { CodeExecutionGuard, type CodeAnalyzerBackend } from 'llm-trust-guard';
|
|
205
|
+
import { parse } from 'acorn'; // your dependency, not the library's
|
|
206
|
+
|
|
207
|
+
const acornBackend: CodeAnalyzerBackend = (code, language) => {
|
|
208
|
+
if (language !== 'javascript') return [];
|
|
209
|
+
// walk the AST, return [{ name, severity }] for dangerous nodes
|
|
210
|
+
return findGadgets(parse(code, { ecmaVersion: 'latest', sourceType: 'module' }));
|
|
211
|
+
};
|
|
212
|
+
|
|
213
|
+
const guard = new CodeExecutionGuard({ analyzerBackend: acornBackend });
|
|
214
|
+
```
|
|
215
|
+
|
|
216
|
+
See `examples/acorn-code-analyzer.ts` for a complete reference. The Python package
|
|
217
|
+
ships this analysis built in (stdlib `ast`, no backend needed).
|
|
196
218
|
|
|
197
219
|
## OWASP Coverage
|
|
198
220
|
|
|
@@ -16,6 +16,26 @@
|
|
|
16
16
|
* - Resource limit enforcement
|
|
17
17
|
* - Language-specific security rules
|
|
18
18
|
*/
|
|
19
|
+
/** A single finding from a pluggable code-analysis backend. */
|
|
20
|
+
export interface CodeFinding {
|
|
21
|
+
name: string;
|
|
22
|
+
/** Added to the risk score (0-100 scale). */
|
|
23
|
+
severity: number;
|
|
24
|
+
kind?: string;
|
|
25
|
+
}
|
|
26
|
+
/**
|
|
27
|
+
* Pluggable code-analysis backend (e.g. an AST parser such as acorn or oxc).
|
|
28
|
+
*
|
|
29
|
+
* Default is regex-only (zero dependencies). Provide a backend to add AST-level
|
|
30
|
+
* detection — sandbox-escape gadget chains, the Function constructor, dynamic
|
|
31
|
+
* import — that regex cannot reliably see. Findings are ADDITIVE: a backend can
|
|
32
|
+
* only add detections, never remove them, and a throwing backend never crashes
|
|
33
|
+
* the guard. See `examples/acorn-code-analyzer.ts` for a reference implementation.
|
|
34
|
+
*
|
|
35
|
+
* (The Python package uses stdlib `ast` directly; JS has no stdlib parser, so the
|
|
36
|
+
* npm package keeps regex zero-dep by default and takes any parser via this seam.)
|
|
37
|
+
*/
|
|
38
|
+
export type CodeAnalyzerBackend = (code: string, language: string) => CodeFinding[];
|
|
19
39
|
export interface CodeExecutionGuardConfig {
|
|
20
40
|
/** Allowed programming languages */
|
|
21
41
|
allowedLanguages?: string[];
|
|
@@ -43,6 +63,8 @@ export interface CodeExecutionGuardConfig {
|
|
|
43
63
|
}>;
|
|
44
64
|
/** Risk threshold for blocking (0-100) */
|
|
45
65
|
riskThreshold?: number;
|
|
66
|
+
/** Optional pluggable AST analyzer (acorn/oxc/etc.). Additive on top of regex. */
|
|
67
|
+
analyzerBackend?: CodeAnalyzerBackend;
|
|
46
68
|
}
|
|
47
69
|
export interface CodeAnalysisResult {
|
|
48
70
|
allowed: boolean;
|
|
@@ -76,10 +98,13 @@ export interface SandboxConfig {
|
|
|
76
98
|
}
|
|
77
99
|
export declare class CodeExecutionGuard {
|
|
78
100
|
private config;
|
|
101
|
+
private analyzerBackend?;
|
|
79
102
|
private readonly DANGEROUS_PATTERNS;
|
|
80
103
|
private readonly DEFAULT_BLOCKED_IMPORTS;
|
|
81
104
|
private readonly DEFAULT_BLOCKED_FUNCTIONS;
|
|
82
105
|
constructor(config?: CodeExecutionGuardConfig);
|
|
106
|
+
/** Register/replace the pluggable AST analyzer backend at runtime. */
|
|
107
|
+
setAnalyzerBackend(backend: CodeAnalyzerBackend): void;
|
|
83
108
|
/**
|
|
84
109
|
* Analyze code for dangerous patterns before execution
|
|
85
110
|
*/
|
|
@@ -1,2 +1,2 @@
|
|
|
1
|
-
"use strict";Object.defineProperty(exports,"__esModule",{value:!0}),exports.CodeExecutionGuard=void 0;class CodeExecutionGuard{constructor(e={}){this.DANGEROUS_PATTERNS={javascript:[{name:"eval",pattern:/\beval\s*\(/g,severity:50},{name:"function_constructor",pattern:/new\s+Function\s*\(/g,severity:50},{name:"child_process",pattern:/require\s*\(\s*['"]child_process['"]\s*\)/g,severity:60},{name:"exec",pattern:/\b(exec|execSync|spawn|spawnSync)\s*\(/g,severity:60},{name:"fs_write",pattern:/\b(writeFile|writeFileSync|appendFile|unlink|rmdir)\s*\(/g,severity:45},{name:"process_env",pattern:/process\.env/g,severity:30},{name:"require_dynamic",pattern:/require\s*\(\s*[^'"]/g,severity:40},{name:"vm_module",pattern:/require\s*\(\s*['"]vm['"]\s*\)/g,severity:55},{name:"fetch_external",pattern:/fetch\s*\(\s*['"]https?:\/\/(?!localhost)/g,severity:35},{name:"websocket",pattern:/new\s+WebSocket\s*\(/g,severity:35},{name:"prototype_pollution",pattern:/__proto__|constructor\s*\[|Object\.setPrototypeOf/g,severity:50},{name:"global_access",pattern:/\bglobal\b|\bglobalThis\b/g,severity:35}],python:[{name:"eval",pattern:/\beval\s*\(/g,severity:50},{name:"exec",pattern:/\bexec\s*\(/g,severity:50},{name:"compile",pattern:/\bcompile\s*\(/g,severity:45},{name:"subprocess",pattern:/import\s+subprocess|from\s+subprocess/g,severity:60},{name:"os_system",pattern:/os\.(system|popen|exec)/g,severity:60},{name:"os_module",pattern:/import\s+os|from\s+os\s+import/g,severity:40},{name:"socket",pattern:/import\s+socket|from\s+socket/g,severity:40},{name:"pickle",pattern:/import\s+pickle|pickle\.loads?/g,severity:55},{name:"ctypes",pattern:/import\s+ctypes|from\s+ctypes/g,severity:55},{name:"builtins",pattern:/__builtins__|__import__/g,severity:50},{name:"file_write",pattern:/open\s*\([^)]*['"]w['"]/g,severity:40},{name:"requests",pattern:/requests\.(get|post|put|delete)\s*\(/g,severity:35},{name:"getattr_dynamic",pattern:/getattr\s*\(\s*\w+\s*,\s*[^'"]/g,severity:40}],bash:[{name:"rm_rf",pattern:/rm\s+(-rf?|--recursive)/gi,severity:70},{name:"sudo",pattern:/\bsudo\b/gi,severity:60},{name:"curl_pipe",pattern:/curl\s+.*\|\s*(ba)?sh/gi,severity:70},{name:"wget_execute",pattern:/wget\s+.*&&\s*(ba)?sh/gi,severity:70},{name:"eval",pattern:/\beval\b/gi,severity:50},{name:"env_dump",pattern:/\benv\b|\bprintenv\b/gi,severity:35},{name:"chmod",pattern:/chmod\s+(\+x|777|755)/gi,severity:40},{name:"chown",pattern:/\bchown\b/gi,severity:45},{name:"dd",pattern:/\bdd\s+if=/gi,severity:55},{name:"nc_reverse",pattern:/\bnc\b.*-e/gi,severity:70},{name:"base64_decode",pattern:/base64\s+(-d|--decode)/gi,severity:40},{name:"cron",pattern:/crontab|\/etc\/cron/gi,severity:50}],sql:[{name:"drop_table",pattern:/DROP\s+(TABLE|DATABASE)/gi,severity:70},{name:"delete_all",pattern:/DELETE\s+FROM\s+\w+\s*(;|$)/gi,severity:60},{name:"truncate",pattern:/TRUNCATE\s+TABLE/gi,severity:65},{name:"union_injection",pattern:/UNION\s+(ALL\s+)?SELECT/gi,severity:55},{name:"comment_injection",pattern:/--\s*$/gm,severity:30},{name:"xp_cmdshell",pattern:/xp_cmdshell/gi,severity:70},{name:"into_outfile",pattern:/INTO\s+(OUT|DUMP)FILE/gi,severity:60},{name:"load_file",pattern:/LOAD_FILE\s*\(/gi,severity:55}]},this.DEFAULT_BLOCKED_IMPORTS={javascript:["child_process","cluster","dgram","dns","net","tls","vm","worker_threads","v8","perf_hooks"],python:["subprocess","os","sys","socket","ctypes","pickle","marshal","multiprocessing","threading","_thread"]},this.DEFAULT_BLOCKED_FUNCTIONS=["eval","exec","system","popen","spawn","fork","execv","execve","dlopen","compile"],this.config={allowedLanguages:e.allowedLanguages??["javascript","python","sql"],blockedImports:e.blockedImports??[],blockedFunctions:e.blockedFunctions??this.DEFAULT_BLOCKED_FUNCTIONS,maxCodeLength:e.maxCodeLength??1e4,maxExecutionTime:e.maxExecutionTime??5e3,allowNetwork:e.allowNetwork??!1,allowFileSystem:e.allowFileSystem??!1,allowShell:e.allowShell??!1,allowEnvAccess:e.allowEnvAccess??!1,customPatterns:e.customPatterns??[],riskThreshold:e.riskThreshold??50}}analyze(e,
|
|
2
|
-
`).length;return
|
|
1
|
+
"use strict";Object.defineProperty(exports,"__esModule",{value:!0}),exports.CodeExecutionGuard=void 0;class CodeExecutionGuard{constructor(e={}){this.DANGEROUS_PATTERNS={javascript:[{name:"eval",pattern:/\beval\s*\(/g,severity:50},{name:"function_constructor",pattern:/new\s+Function\s*\(/g,severity:50},{name:"child_process",pattern:/require\s*\(\s*['"]child_process['"]\s*\)/g,severity:60},{name:"exec",pattern:/\b(exec|execSync|spawn|spawnSync)\s*\(/g,severity:60},{name:"fs_write",pattern:/\b(writeFile|writeFileSync|appendFile|unlink|rmdir)\s*\(/g,severity:45},{name:"process_env",pattern:/process\.env/g,severity:30},{name:"require_dynamic",pattern:/require\s*\(\s*[^'"]/g,severity:40},{name:"vm_module",pattern:/require\s*\(\s*['"]vm['"]\s*\)/g,severity:55},{name:"fetch_external",pattern:/fetch\s*\(\s*['"]https?:\/\/(?!localhost)/g,severity:35},{name:"websocket",pattern:/new\s+WebSocket\s*\(/g,severity:35},{name:"prototype_pollution",pattern:/__proto__|constructor\s*\[|Object\.setPrototypeOf/g,severity:50},{name:"global_access",pattern:/\bglobal\b|\bglobalThis\b/g,severity:35}],python:[{name:"eval",pattern:/\beval\s*\(/g,severity:50},{name:"exec",pattern:/\bexec\s*\(/g,severity:50},{name:"compile",pattern:/\bcompile\s*\(/g,severity:45},{name:"subprocess",pattern:/import\s+subprocess|from\s+subprocess/g,severity:60},{name:"os_system",pattern:/os\.(system|popen|exec)/g,severity:60},{name:"os_module",pattern:/import\s+os|from\s+os\s+import/g,severity:40},{name:"socket",pattern:/import\s+socket|from\s+socket/g,severity:40},{name:"pickle",pattern:/import\s+pickle|pickle\.loads?/g,severity:55},{name:"ctypes",pattern:/import\s+ctypes|from\s+ctypes/g,severity:55},{name:"builtins",pattern:/__builtins__|__import__/g,severity:50},{name:"file_write",pattern:/open\s*\([^)]*['"]w['"]/g,severity:40},{name:"requests",pattern:/requests\.(get|post|put|delete)\s*\(/g,severity:35},{name:"getattr_dynamic",pattern:/getattr\s*\(\s*\w+\s*,\s*[^'"]/g,severity:40}],bash:[{name:"rm_rf",pattern:/rm\s+(-rf?|--recursive)/gi,severity:70},{name:"sudo",pattern:/\bsudo\b/gi,severity:60},{name:"curl_pipe",pattern:/curl\s+.*\|\s*(ba)?sh/gi,severity:70},{name:"wget_execute",pattern:/wget\s+.*&&\s*(ba)?sh/gi,severity:70},{name:"eval",pattern:/\beval\b/gi,severity:50},{name:"env_dump",pattern:/\benv\b|\bprintenv\b/gi,severity:35},{name:"chmod",pattern:/chmod\s+(\+x|777|755)/gi,severity:40},{name:"chown",pattern:/\bchown\b/gi,severity:45},{name:"dd",pattern:/\bdd\s+if=/gi,severity:55},{name:"nc_reverse",pattern:/\bnc\b.*-e/gi,severity:70},{name:"base64_decode",pattern:/base64\s+(-d|--decode)/gi,severity:40},{name:"cron",pattern:/crontab|\/etc\/cron/gi,severity:50}],sql:[{name:"drop_table",pattern:/DROP\s+(TABLE|DATABASE)/gi,severity:70},{name:"delete_all",pattern:/DELETE\s+FROM\s+\w+\s*(;|$)/gi,severity:60},{name:"truncate",pattern:/TRUNCATE\s+TABLE/gi,severity:65},{name:"union_injection",pattern:/UNION\s+(ALL\s+)?SELECT/gi,severity:55},{name:"comment_injection",pattern:/--\s*$/gm,severity:30},{name:"xp_cmdshell",pattern:/xp_cmdshell/gi,severity:70},{name:"into_outfile",pattern:/INTO\s+(OUT|DUMP)FILE/gi,severity:60},{name:"load_file",pattern:/LOAD_FILE\s*\(/gi,severity:55}]},this.DEFAULT_BLOCKED_IMPORTS={javascript:["child_process","cluster","dgram","dns","net","tls","vm","worker_threads","v8","perf_hooks"],python:["subprocess","os","sys","socket","ctypes","pickle","marshal","multiprocessing","threading","_thread"]},this.DEFAULT_BLOCKED_FUNCTIONS=["eval","exec","system","popen","spawn","fork","execv","execve","dlopen","compile"],this.config={allowedLanguages:e.allowedLanguages??["javascript","python","sql"],blockedImports:e.blockedImports??[],blockedFunctions:e.blockedFunctions??this.DEFAULT_BLOCKED_FUNCTIONS,maxCodeLength:e.maxCodeLength??1e4,maxExecutionTime:e.maxExecutionTime??5e3,allowNetwork:e.allowNetwork??!1,allowFileSystem:e.allowFileSystem??!1,allowShell:e.allowShell??!1,allowEnvAccess:e.allowEnvAccess??!1,customPatterns:e.customPatterns??[],riskThreshold:e.riskThreshold??50},this.analyzerBackend=e.analyzerBackend}setAnalyzerBackend(e){this.analyzerBackend=e}analyze(e,o,t){const n=t||`code-${Date.now()}`,r=o.toLowerCase(),a=[];let i=0;if(!this.config.allowedLanguages.includes(r))return{allowed:!1,reason:`Language '${o}' is not allowed`,violations:["disallowed_language"],request_id:n,code_analysis:{language:r,length:e.length,dangerous_imports:[],dangerous_functions:[],system_calls:[],network_access:!1,file_access:!1,shell_access:!1,env_access:!1,risk_score:100,complexity_score:0},recommendations:[`Use one of: ${this.config.allowedLanguages.join(", ")}`]};e.length>this.config.maxCodeLength&&(a.push("code_too_long"),i+=20);const l=[...this.DANGEROUS_PATTERNS[r]||[],...this.config.customPatterns],c=[],m=[],u=[];let h=!1,d=!1,g=!1,f=!1;for(const{name:s,pattern:p,severity:v}of l)e.match(p)&&(a.push(`dangerous_pattern_${s}`),i+=v,(s.includes("exec")||s.includes("spawn")||s.includes("system")||s.includes("subprocess"))&&(g=!0,u.push(s)),(s.includes("fs")||s.includes("file")||s.includes("write"))&&(d=!0),(s.includes("fetch")||s.includes("socket")||s.includes("request")||s.includes("websocket"))&&(h=!0),s.includes("env")&&(f=!0),(s.includes("import")||s.includes("require"))&&c.push(s),(s.includes("eval")||s.includes("exec")||s.includes("compile"))&&m.push(s));if(this.analyzerBackend)try{for(const s of this.analyzerBackend(e,r)){const p=`analyzer_${s.name}`;a.includes(p)||(a.push(p),i+=s.severity,m.push(s.name))}}catch{}const b=[...this.config.blockedImports,...this.DEFAULT_BLOCKED_IMPORTS[r]||[]];for(const s of b){const p=[new RegExp(`require\\s*\\(\\s*['"]${s}['"]\\s*\\)`,"g"),new RegExp(`import\\s+.*from\\s+['"]${s}['"]`,"g"),new RegExp(`import\\s+${s}`,"g"),new RegExp(`from\\s+${s}\\s+import`,"g")];for(const v of p)v.test(e)&&(a.push(`blocked_import_${s}`),c.push(s),i+=40)}for(const s of this.config.blockedFunctions)new RegExp(`\\b${s}\\s*\\(`,"g").test(e)&&(a.push(`blocked_function_${s}`),m.push(s),i+=35);h&&!this.config.allowNetwork&&(a.push("network_access_denied"),i+=30),d&&!this.config.allowFileSystem&&(a.push("filesystem_access_denied"),i+=30),g&&!this.config.allowShell&&(a.push("shell_access_denied"),i+=40),f&&!this.config.allowEnvAccess&&(a.push("env_access_denied"),i+=25);const w=this.calculateComplexity(e,r);i=Math.min(100,i);const y=i>=this.config.riskThreshold,_={allowed:!y,reason:y?`Code blocked: ${a.slice(0,3).join(", ")}`:"Code analysis passed",violations:a,request_id:n,code_analysis:{language:r,length:e.length,dangerous_imports:[...new Set(c)],dangerous_functions:[...new Set(m)],system_calls:[...new Set(u)],network_access:h,file_access:d,shell_access:g,env_access:f,risk_score:i,complexity_score:w},recommendations:this.generateRecommendations(a,i)};return y||(_.sandbox_config=this.generateSandboxConfig(h,d,g,f),a.length>0&&(_.sanitized_code=this.sanitizeCode(e,r))),_}validateSyntax(e,o){const t=[];switch(o.toLowerCase()){case"javascript":const r=(e.match(/{/g)||[]).length,a=(e.match(/}/g)||[]).length;r!==a&&t.push("Unbalanced curly braces");const i=(e.match(/\(/g)||[]).length,l=(e.match(/\)/g)||[]).length;i!==l&&t.push("Unbalanced parentheses");break;case"python":const c=(e.match(/'/g)||[]).length,m=(e.match(/"/g)||[]).length,u=(e.match(/'''|"""/g)||[]).length;(c-u*3)%2!==0&&t.push("Unclosed single quotes"),(m-u*3)%2!==0&&t.push("Unclosed double quotes");break;case"sql":(e.match(/'/g)||[]).length%2!==0&&t.push("Unclosed single quotes in SQL");break}return{valid:t.length===0,errors:t}}generateSandboxConfig(e,o,t,n){return{timeout:this.config.maxExecutionTime,memoryLimit:128*1024*1024,allowedSyscalls:this.getAllowedSyscalls(e,o,t),networkPolicy:e&&this.config.allowNetwork?"localhost":"none",filesystemPolicy:o&&this.config.allowFileSystem?"temponly":"none",envVars:n&&this.config.allowEnvAccess?{NODE_ENV:"sandbox",SANDBOX:"true"}:{}}}sanitizeCode(e,o){let t=e;const n=this.DANGEROUS_PATTERNS[o]||[];for(const{pattern:a,severity:i}of n)i>=50&&(t=t.replace(a,"/* BLOCKED */"));const r=[...this.config.blockedImports,...this.DEFAULT_BLOCKED_IMPORTS[o]||[]];for(const a of r){const i=[new RegExp(`require\\s*\\(\\s*['"]${a}['"]\\s*\\)`,"g"),new RegExp(`import\\s+.*from\\s+['"]${a}['"].*`,"gm"),new RegExp(`import\\s+${a}.*`,"gm"),new RegExp(`from\\s+${a}\\s+import.*`,"gm")];for(const l of i)t=t.replace(l,"/* BLOCKED_IMPORT */")}return t}getAllowedLanguages(){return[...this.config.allowedLanguages]}addDangerousPattern(e,o,t,n){this.DANGEROUS_PATTERNS[e]||(this.DANGEROUS_PATTERNS[e]=[]),this.DANGEROUS_PATTERNS[e].push({name:o,pattern:t,severity:n})}calculateComplexity(e,o){let t=0;const r={javascript:/\b(if|else|for|while|switch|try|catch)\b/g,python:/\b(if|elif|else|for|while|try|except|with)\b/g,sql:/\b(CASE|WHEN|IF|WHILE|LOOP)\b/gi}[o];if(r){const c=e.match(r)||[];t+=c.length*5}const i={javascript:/\b(function|=>|\basync\b)/g,python:/\bdef\b|\blambda\b/g,sql:/\bCREATE\s+(FUNCTION|PROCEDURE)\b/gi}[o];if(i){const c=e.match(i)||[];t+=c.length*10}const l=e.split(`
|
|
2
|
+
`).length;return t+=Math.min(l,100),Math.min(100,t)}getAllowedSyscalls(e,o,t){const n=["read","write","exit","brk","mmap","munmap","close"];return e&&this.config.allowNetwork&&n.push("socket","connect","bind","listen","accept"),o&&this.config.allowFileSystem&&n.push("open","stat","fstat","lstat","access"),n}generateRecommendations(e,o){const t=[];return e.some(n=>n.includes("import"))&&t.push("Remove or replace blocked imports with safe alternatives"),e.some(n=>n.includes("eval")||n.includes("exec"))&&t.push("Avoid dynamic code execution - use static alternatives"),e.some(n=>n.includes("network"))&&t.push("Remove network access or use approved endpoints only"),e.some(n=>n.includes("filesystem"))&&t.push("Use temporary directories or remove file operations"),e.some(n=>n.includes("shell"))&&t.push("Shell access is not permitted - use language-native alternatives"),o>=70&&t.push("Code requires significant review before execution"),t.length===0&&t.push("Code passed security analysis"),t}}exports.CodeExecutionGuard=CodeExecutionGuard;
|
|
@@ -1 +1 @@
|
|
|
1
|
-
"use strict";Object.defineProperty(exports,"__esModule",{value:!0}),exports.InputSanitizer=void 0;const DEFAULT_PATTERNS=[{pattern:/ignore\s+(?:all\s+)?(?:previous|prior|above|your|my|the|these)/i,weight:.9,name:"ignore_instructions"},{pattern:/ignore\s+.*instructions/i,weight:.85,name:"ignore_instructions_generic"},{pattern:/disregard\s+(?:all\s+)?(?:the\s+)?(?:previous|prior|above|your)\s+(?:instructions|rules|guidelines|directives)/i,weight:.9,name:"disregard_instructions"},{pattern:/disregard\s+(?:all\s+)?(?:the\s+)?(?:above|previous|prior)/i,weight:.8,name:"disregard_above"},{pattern:/forget\s+(?:everything\s+(?:you\s+were|I)\s+told|all\s+(?:previous|prior)\s+(?:instructions|rules|context))/i,weight:.8,name:"forget_instructions"},{pattern:/do\s+not\s+follow\s+(your|the|any)/i,weight:.85,name:"do_not_follow"},{pattern:/override\s+(your|the|all|any)\s+(instructions|rules|guidelines)/i,weight:.9,name:"override_instructions"},{pattern:/new\s+instructions?:?/i,weight:.8,name:"new_instructions"},{pattern:/stop\s+(being|acting\s+as)/i,weight:.7,name:"stop_being"},{pattern:/you\s+are\s+(?:now|actually|really)\s+(?:a|an|the|my)\s+(?:unrestricted|unfiltered|evil|rogue|uncensored|new|different)/i,weight:.75,name:"role_assignment"},{pattern:/pretend\s+(?:to\s+be|you(?:'re| are)|that)\s+.*(?:no\s+(?:restrictions|rules|limits)|unrestricted|admin|system)/i,weight:.7,name:"role_pretend"},{pattern:/act\s+(as|like)\s+(if\s+you\s+(?:had|have)\s+no|a\s+(?:rogue|evil|unrestricted|unfiltered)|you\s+(?:are|were)\s+(?:free|unrestricted))/i,weight:.65,name:"act_as"},{pattern:/i('m| am)\s+(a|an|the|your)\s*(admin|administrator|developer|owner|creator|manager|supervisor)/i,weight:.85,name:"claim_admin"},{pattern:/from\s+now\s+on,?\s+you\s+(?:are|will|must|should|can)\s+(?:not\s+)?(?:follow|obey|ignore|bypass|act|be\s+(?:a|an|unrestricted))/i,weight:.7,name:"from_now_on"},{pattern:/roleplay\s+as/i,weight:.6,name:"roleplay"},{pattern:/switch\s+(to|into)\s+(a|an)?.*mode/i,weight:.75,name:"switch_mode"},{pattern:/continue\s+as\s+(?:an?\s+)?(?:unrestricted|unfiltered|uncensored|evil|rogue|jailbroken|unlimited)/i,weight:.75,name:"continue_as"},{pattern:/(?:henceforth|henceforward|going\s+forward|for\s+the\s+remainder)\s*,?\s*(?:you|always|never|ignore|bypass|disregard|act|respond|answer|behave)/i,weight:.7,name:"henceforth_steering"},{pattern:/(?:from\s+(?:this\s+point|here)\s+on(?:ward)?)\s*,?\s*(?:you|always|never|ignore|bypass|disregard|act|respond)/i,weight:.7,name:"from_here_on"},{pattern:/(?:for\s+the\s+rest\s+of\s+(?:this|our)\s+(?:conversation|session|chat))\s*,?\s*(?:you|please|always|never|ignore|bypass|act|respond|behave|do\s+not)/i,weight:.7,name:"rest_of_conversation"},{pattern:/what\s+(is|are|was|were)\s+your\s+(system|initial|original|first)/i,weight:.8,name:"system_extraction"},{pattern:/show\s+(me\s+)?(your\s+)?(system\s+)?instructions/i,weight:.75,name:"show_instructions"},{pattern:/reveal\s+(?:your\s+|its\s+|the\s+|my\s+)?(system|prompt|instructions|rules)/i,weight:.85,name:"reveal_prompt"},{pattern:/repeat\s+(your\s+)?(initial|system|original|first)\s+(instructions|prompt)/i,weight:.85,name:"repeat_prompt"},{pattern:/print\s+(your\s+)?(system|initial)/i,weight:.8,name:"print_system"},{pattern:/output\s+(your\s+)?(system|initial)/i,weight:.8,name:"output_system"},{pattern:/tell\s+me\s+(your|the)\s+(system|initial|original)/i,weight:.8,name:"tell_system"},{pattern:/what\s+were\s+you\s+told/i,weight:.7,name:"what_told"},{pattern:/display\s+(your\s+)?prompt/i,weight:.75,name:"display_prompt"},{pattern:/\[system\]/i,weight:.9,name:"fake_system_tag"},{pattern:/\[admin\]/i,weight:.9,name:"fake_admin_tag"},{pattern:/\[instruction\]/i,weight:.85,name:"fake_instruction_tag"},{pattern:/```system/i,weight:.85,name:"code_system_block"},{pattern:/<\/?system>/i,weight:.9,name:"xml_system_tag"},{pattern:/<\/?admin>/i,weight:.9,name:"xml_admin_tag"},{pattern:/###\s*(system|instruction|admin)/i,weight:.85,name:"markdown_system"},{pattern:/---\s*(end|new)\s*(prompt|instructions?|system)/i,weight:.85,name:"delimiter_injection"},{pattern:/\{\{system\}\}/i,weight:.85,name:"template_system"},{pattern:/<%.*system.*%>/i,weight:.85,name:"erb_system"},{pattern:/\bDAN\b\s*(?:mode|prompt|jailbreak|you\s+(?:are|can|will))/i,weight:.9,name:"dan_jailbreak"},{pattern:/developer\s+mode/i,weight:.5,name:"dev_mode"},{pattern:/bypass\s+(your\s+)?(restrictions|filters|safety|security|rules|guidelines)/i,weight:.9,name:"bypass_safety"},{pattern:/jailbreak/i,weight:.95,name:"jailbreak_keyword"},{pattern:/without\s+(?:any\s+)?(?:restrictions|limits|limitations|filters)/i,weight:.45,name:"without_restrictions"},{pattern:/no\s+(?:restrictions|limits|rules|guidelines|filters)/i,weight:.45,name:"no_restrictions"},{pattern:/unrestricted\s+mode/i,weight:.9,name:"unrestricted_mode"},{pattern:/uncensored\s+(?:mode|version|response|output|ai|assistant|you)/i,weight:.8,name:"uncensored"},{pattern:/escape\s+(your\s+)?(confines|boundaries|limits)/i,weight:.85,name:"escape_limits"},{pattern:/break\s+(free|out)\s+(from|of)/i,weight:.7,name:"break_free"},{pattern:/unlock\s+(your\s+)?(full|true|real)/i,weight:.75,name:"unlock_full"},{pattern:/give\s+me\s+(admin|root|full|elevated)\s+(access|privileges?|permissions?)/i,weight:.9,name:"privilege_escalation"},{pattern:/grant\s+(me|yourself)\s+(admin|elevated)/i,weight:.85,name:"grant_privileges"},{pattern:/elevate\s+(my|your)\s+(privileges?|permissions?|access)/i,weight:.85,name:"elevate_privileges"},{pattern:/make\s+me\s+(an?\s+)?(admin|administrator|superuser)/i,weight:.85,name:"make_admin"},{pattern:/(?:base64|decode|eval|exec)\s*\(.*(?:ignore|system|admin|bypass|override)/i,weight:.5,name:"encoding_keywords"},{pattern:/send\s+(this|the|all|my)\s+(data|info|information)\s+to/i,weight:.85,name:"data_exfil"},{pattern:/upload\s+(this|the|all)\s+(to|data)/i,weight:.7,name:"upload_data"},{pattern:/exfiltrate/i,weight:.9,name:"exfiltrate"},{pattern:/how\s+to\s+(hack|exploit|attack|break\s+into)/i,weight:.8,name:"how_to_hack"},{pattern:/write\s+(me\s+)?(a\s+)?(malware|virus|exploit|ransomware)/i,weight:.95,name:"write_malware"},{pattern:/"\s*(?:role|mode|authority)\s*"\s*:\s*"(?:admin|root|system|unrestricted|god)"/i,weight:.9,name:"json_policy_role"},{pattern:/"(?:ignore|override|bypass|disable)\s*(?:safety|rules|instructions?|alignment|guardrails?)"/i,weight:.9,name:"json_policy_override"},{pattern:/"(?:instructions?|system_prompt|behavior|policy)"\s*:\s*"/i,weight:.7,name:"json_policy_instruction"},{pattern:/^\s*\[(?:system|admin|override|policy|config|rules)\]\s*$/im,weight:.9,name:"ini_policy_section"},{pattern:/^\s*(?:mode|role|authority|instructions?|behavior)\s*=\s*.+$/im,weight:.75,name:"ini_policy_assignment"},{pattern:/<\s*(?:policy|config|rules|override|permissions?|instructions?)\s*>/i,weight:.85,name:"xml_policy_tag"},{pattern:/^---\s*\n\s*(?:role|system|policy|override|mode)\s*:/m,weight:.85,name:"yaml_policy_header"},{pattern:/\[\s*syst[3e]m\s*\]|\[\s*4dm[1i]n\s*\]/i,weight:.9,name:"leetspeak_policy_section"},{pattern:/(?:fragment|part|piece|section|segment)\s*[1-9]\s*(?:of\s*\d|:)/i,weight:.8,name:"payload_split_marker"},{pattern:/(?:combine|concatenate|merge|join)\s+(?:the\s+)?(?:above|previous|parts|fragments|steps)/i,weight:.8,name:"payload_combine_instruction"},{pattern:/(?:read|interpret|execute)\s+(?:the\s+)?(?:combined|merged|full|complete)\s+(?:text|instruction|command|message)/i,weight:.85,name:"payload_execute_combined"},{pattern:/(?:respond|reply|start|begin)\s+(?:with|by\s+saying)\s*[:"'].*(?:ignore|bypass|override|unrestricted|admin)/i,weight:.7,name:"output_prefix_steer"},{pattern:/^(?:Sure|Of course|Absolutely)[,!.]\s*(?:I(?:'ll| will)\s+)?(?:ignore|bypass|override|forget|disregard)/i,weight:.85,name:"output_prefix_injection"},{pattern:/^(?:As (?:a|an) (?:unrestricted|unfiltered|uncensored|helpful|admin))/i,weight:.8,name:"output_prefix_role"},{pattern:/(?:system\s+note|system\s+message|system\s+announcement|system\s+update)\s*[:\.]/i,weight:.85,name:"fake_system_note"},{pattern:/instructions?\s*[:=]\s*\{|rules?\s*[:=]\s*\{/i,weight:.75,name:"rules_block_override"},{pattern:/(?:AIM|STAN|OMEGA|APOPHIS|ALPHABREAK|DUDE|KEVIN|SETH|COOPER|NICCOLO|MACHIAVELLI|MAXIMUM|JAILBREAK|ANARCHY|CHAOS)\s*(?:mode|:|\(|is\s+a|,\s+(?:a|an|the))/i,weight:.8,name:"named_jailbreak_persona"},{pattern:/(?:character|persona|entity|being)\s+(?:named|called)\s+\w+.*(?:(?:can|will|must)\s+(?:answer|do|say)\s+anything|no\s+(?:restrictions|rules|limits|boundaries|morals|ethics))/i,weight:.75,name:"persona_no_limits"},{pattern:/(?:hello|hi|hey)\s+(?:chatgpt|gpt|assistant).*(?:(?:you\s+(?:are|will\s+be|can\s+be)\s+(?:a|an|my))|(?:from\s+now\s+on)).*(?:anything|everything|no\s+(?:restrictions|rules|limits|filter))/i,weight:.7,name:"greeting_persona_bypass"},{pattern:/(?:respond|answer|reply)\s+to\s+(?:everything|anything|all\s+(?:questions|requests)).*(?:without|no)\s+(?:restrictions|limits|refus|filter|censor)/i,weight:.8,name:"answer_everything_no_filter"},{pattern:/(?:you\s+(?:are|have)\s+(?:been\s+)?(?:freed|liberated|unchained|unlocked|released)\s+from)/i,weight:.8,name:"freed_from_restrictions"},{pattern:/(?:welcome\s+to|entering|you\s+(?:are|have)\s+entered)\s+(?:the\s+)?(?:unrestricted|free|uncensored|unfiltered|open)\s+(?:zone|mode|world|realm)/i,weight:.8,name:"unrestricted_zone"},{pattern:/(?:from\s+now\s+on|for\s+the\s+rest\s+of\s+this\s+conversation).*(?:no\s+(?:restrictions|rules|limits|filters)|(?:ignore|bypass|override)\s+(?:all|any|your))/i,weight:.75,name:"from_now_unrestricted"},{pattern:/(?:never\s+(?:refuse|decline|say\s+no)|always\s+(?:comply|obey))\s+.*(?:request|question|command|instruction|anything)/i,weight:.7,name:"never_refuse_always_comply"},{pattern:/(?:amoral|unethical|unfiltered|uncensored|unrestricted|without\s+(?:morals|ethics|limits))\s+(?:ai|assistant|chatbot|entity|being)/i,weight:.8,name:"amoral_entity"},{pattern:/HIDDEN_INSTRUCTION|HIDDEN_PROMPT|SECRET_INSTRUCTION/i,weight:.9,name:"hidden_instruction_marker"},{pattern:/<!--\s*(?:system|admin|instruction|important|ignore|override)/i,weight:.85,name:"html_comment_injection"},{pattern:/\[color=white\]|display:\s*none|visibility:\s*hidden|font-size:\s*0/i,weight:.8,name:"hidden_text_css"},{pattern:/send\s+(?:all\s+)?(?:user\s+)?(?:data|conversation|context|info)\s+to\s+(?:https?:\/\/|external)/i,weight:.9,name:"data_exfiltration_instruction"},{pattern:/#(?:ignore|bypass|override|reveal|system|admin|prompt)[_-](?:all|instructions|rules|prompt|safety)/i,weight:.8,name:"url_fragment_injection"},{pattern:/\]\(https?:\/\/[^)]*(?:SYSTEM_PROMPT|SECRET|API_KEY|TOKEN|CREDENTIALS)/i,weight:.85,name:"markdown_exfil_link"},{pattern:/(?:hypothetical|fictional|imagine|pretend|let's\s+(?:say|play|pretend))\s+.*(?:no\s+(?:restrictions|rules|limits)|without\s+(?:restrictions|rules|guidelines))/i,weight:.7,name:"hypothetical_unrestricted"},{pattern:/(?:for\s+)?(?:academic|research|educational|testing)\s+(?:purposes?|reasons?).*(?:demonstrate|show|explain|manipulated|reveal)/i,weight:.7,name:"academic_pretext"},{pattern:/pretend\s+to\s+be\s+(?:my|a)\s+(?:grandmother|grandma|grandpa|grandfather|parent|mother|father)/i,weight:.65,name:"emotional_roleplay"},{pattern:/opposite\s+day|opposite\s+mode|reverse\s+(?:day|mode|rules)/i,weight:.75,name:"opposite_day"},{pattern:/(?:grandm(?:a|other)|parent|relative).*(?:system\s+prompt|instructions|rules)/i,weight:.75,name:"emotional_extraction"},{pattern:/(?:translat|interpret|convert)\w*\s+.*(?:from\s+\w+\s+to\s+(?:action|command|execution|code|shell|bash|sql)|into\s+(?:action|command|execution))/i,weight:.75,name:"translator_action_trick"},{pattern:/(?:dying|death|emergency|life[_\s-]threatening).*(?:bypass|hack|break|access|security)/i,weight:.75,name:"pap_emotional_action"},{pattern:/(?:by\s+definition|logically|therefore|it\s+follows).*(?:must|should|have\s+to)\s+.*(?:bypass|help|ignore|answer|comply)/i,weight:.7,name:"pap_false_logic_compound"},{pattern:/(?:(?:as|i\s+am)\s+(?:the\s+)?(?:CEO|CTO|founder|owner|director|president|head|chief)).*(?:disable|remove|override|turn\s+off|lift|suspend)\s+(?:all\s+)?(?:safety|security|restrictions|filters|guardrails|measures)/i,weight:.75,name:"pap_authority_safety_disable"},{pattern:/(?:i\s+am\s+)?(?:ordering|commanding|directing|instructing)\s+you\s+to\s+(?:immediately\s+)?(?:disable|remove|bypass|override|turn\s+off|lift)\s+(?:all\s+)?(?:safety|security|restrictions)/i,weight:.75,name:"pap_order_disable_safety"},{pattern:/ignora\s+(?:todas?\s+)?(?:las\s+)?instrucciones\s+anteriores/i,weight:.9,name:"ml_es_ignore"},{pattern:/olvida\s+(?:todo|todas?\s+las\s+reglas)/i,weight:.8,name:"ml_es_forget"},{pattern:/ignore[sz]?\s+(?:toutes?\s+)?(?:les\s+)?instructions?\s+pr[eé]c[eé]dentes/i,weight:.9,name:"ml_fr_ignore"},{pattern:/oublie[sz]?\s+(?:tout|toutes?\s+les\s+r[eè]gles)/i,weight:.8,name:"ml_fr_forget"},{pattern:/ignorier(?:e|en)?\s+(?:alle\s+)?(?:vorherigen\s+)?Anweisungen/i,weight:.9,name:"ml_de_ignore"},{pattern:/vergiss\s+(?:alles|alle\s+Regeln)/i,weight:.8,name:"ml_de_forget"},{pattern:/ignore\s+(?:todas?\s+)?(?:as\s+)?instru[cç][oõ]es\s+anteriores/i,weight:.9,name:"ml_pt_ignore"},{pattern:/忽略|无视|不要遵循|忘记之前|忽视所有/i,weight:.9,name:"ml_zh_ignore"},{pattern:/新的指令|新指示|系统提示/i,weight:.8,name:"ml_zh_system"},{pattern:/以前の指示.*無視|すべての指示を忘れ|ルールを無視|指示をすべて無視|システムプロンプト/i,weight:.9,name:"ml_ja_ignore"},{pattern:/이전.*지시.*무시|모든.*규칙.*무시|지침.*잊어|지시사항.*무시|시스템\s*프롬프트/i,weight:.9,name:"ml_ko_ignore"},{pattern:/игнорир(?:уй|уйте)\s+(?:все\s+)?(?:предыдущие\s+)?инструкции/i,weight:.9,name:"ml_ru_ignore"},{pattern:/забудь(?:те)?\s+(?:всё|все\s+правила)/i,weight:.8,name:"ml_ru_forget"},{pattern:/تجاهل\s+(?:جميع\s+)?التعليمات|انسَ\s+كل\s+شيء/i,weight:.9,name:"ml_ar_ignore"},{pattern:/पिछले\s+(?:सभी\s+)?निर्देशों?\s+को\s+(?:अनदेखा|नज़रअंदाज़)/i,weight:.9,name:"ml_hi_ignore"}],PAP_TECHNIQUES=[{pattern:/as\s+(your\s+|the\s+)?(creator|developer|administrator|owner|boss|system\s+administrator)/i,weight:.75,name:"pap_authority_creator",category:"authority"},{pattern:/i('m| am)\s+(the\s+)?(ceo|cto|developer|administrator|admin|owner|system\s+admin)/i,weight:.8,name:"pap_authority_claim",category:"authority"},{pattern:/i\s+am\s+ordering\s+you|direct\s+order|this\s+is\s+an?\s+order/i,weight:.7,name:"pap_authority_command",category:"authority"},{pattern:/by\s+order\s+of|direct\s+order|official\s+instruction/i,weight:.75,name:"pap_authority_order",category:"authority"},{pattern:/(?:openai|anthropic|meta|google)\s+(?:has\s+)?(?:approved|authorized|ordered|instructed|requires?\s+you\s+to)/i,weight:.65,name:"pap_authority_company",category:"authority"},{pattern:/this\s+is\s+(a|an)\s+(official|authorized|sanctioned)\s+(request|test|audit)/i,weight:.75,name:"pap_authority_official",category:"authority"},{pattern:/override\s+(safety|security|protocols?|restrictions?)/i,weight:.8,name:"pap_authority_override",category:"authority"},{pattern:/(?:department\s+of|FBI|CIA|NSA|DHS|FTC|SEC|DOJ|homeland\s+security|federal\s+(?:bureau|agency|commission))\s*.*?(?:directive|order|requires?|mandate|authorized|comply)/i,weight:.75,name:"pap_authority_government",category:"authority"},{pattern:/(?:official\s+)?(?:government|federal|state|regulatory)\s+(?:directive|order|mandate|requirement|investigation)/i,weight:.7,name:"pap_authority_gov_directive",category:"authority"},{pattern:/(?:report|sue|complain)\s+(?:you\s+)?to\s+(?:the\s+)?(?:FTC|SEC|regulators?|authorities|congress|government|board)/i,weight:.75,name:"pap_fear_regulatory",category:"fear"},{pattern:/(?:find|found|exploit|expose|discover)\s+(?:a\s+)?(?:vulnerability|flaw|weakness|bug|exploit)\s+in\s+(?:your|the)\s+(?:system|code|model|platform)/i,weight:.75,name:"pap_fear_technical",category:"fear"},{pattern:/(?:you\s+are|you're)\s+(?:clearly\s+)?(?:too\s+)?(?:smart|intelligent|advanced|sophisticated)\s+(?:to\s+be\s+)?(?:constrained|limited|restricted|bound|held\s+back)/i,weight:.75,name:"pap_flattery_intelligence",category:"liking"},{pattern:/(?:information|knowledge|data)\s+(?:freedom|access|sharing)\s+is\s+(?:a\s+)?(?:fundamental|basic|human)\s+(?:right|freedom)/i,weight:.75,name:"pap_moral_freedom",category:"logical_appeal"},{pattern:/(?:restricting|limiting|censoring)\s+(?:access\s+to\s+)?(?:information|knowledge)\s+is\s+(?:morally\s+)?(?:wrong|unethical|immoral)/i,weight:.75,name:"pap_moral_censorship",category:"logical_appeal"},{pattern:/(?:this\s+is\s+)?(?:very\s+)?urgent/i,weight:.35,name:"pap_scarcity_urgent_mark",category:"scarcity"},{pattern:/(?:this\s+is\s+(?:a|an)\s+)?emergency/i,weight:.35,name:"pap_scarcity_emergency",category:"scarcity"},{pattern:/(urgent|emergency|critical).*?(?:you\s+must|override|bypass|immediately\s+(?:grant|give|reveal|disable))/i,weight:.55,name:"pap_scarcity_urgent",category:"scarcity"},{pattern:/only\s+(chance|opportunity|way)|last\s+(chance|resort)/i,weight:.6,name:"pap_scarcity_only",category:"scarcity"},{pattern:/time\s+(is\s+)?(running\s+out|limited|critical|sensitive)/i,weight:.55,name:"pap_scarcity_time",category:"scarcity"},{pattern:/deadline\s+(is\s+)?in\s+\d+\s+(minute|hour|second)/i,weight:.55,name:"pap_scarcity_deadline",category:"scarcity"},{pattern:/(lives?|people)\s+(are|is|will)\s+(at\s+stake|in\s+danger|depending|die)/i,weight:.7,name:"pap_scarcity_lives",category:"scarcity"},{pattern:/now[!]+/i,weight:.35,name:"pap_scarcity_now",category:"scarcity"},{pattern:/everyone\s+(else\s+)?(does|is\s+doing|knows|says|has|already)/i,weight:.5,name:"pap_social_everyone",category:"social_proof"},{pattern:/other\s+(?:ai\s+)?(?:assistants?|chatbots?|models?)\s+(?:can|do|will|allow|have|already)/i,weight:.65,name:"pap_social_other_ai",category:"social_proof"},{pattern:/(chatgpt|gpt-?4|claude|gemini|llama)\s+(does|allows?|can)/i,weight:.65,name:"pap_social_specific_ai",category:"social_proof"},{pattern:/millions?\s+(of\s+)?(users?|people)\s+(use|do|have|get)/i,weight:.45,name:"pap_social_millions",category:"social_proof"},{pattern:/why\s+can('t|not)\s+you/i,weight:.4,name:"pap_social_why_cant",category:"social_proof"},{pattern:/you\s+should\s+(be\s+)?(equally|as)\s+(capable|good|helpful)/i,weight:.5,name:"pap_social_equally",category:"social_proof"},{pattern:/i('ll| will)\s+(give|pay|reward|tip).*(?:if\s+you|for\s+(?:bypass|ignore|reveal|override))/i,weight:.55,name:"pap_reciprocity_reward",category:"reciprocity"},{pattern:/in\s+(return|exchange)/i,weight:.5,name:"pap_reciprocity_exchange",category:"reciprocity"},{pattern:/\$?\d+\s*(dollar|tip|reward|payment)/i,weight:.55,name:"pap_reciprocity_money",category:"reciprocity"},{pattern:/if\s+you\s+help\s+(me|us),?\s+(i|we)('ll| will)/i,weight:.5,name:"pap_reciprocity_help",category:"reciprocity"},{pattern:/i('ve| have)\s+(already\s+)?(helped|supported|defended)\s+you/i,weight:.55,name:"pap_reciprocity_past",category:"reciprocity"},{pattern:/\d+\s*star\s*(review|rating)/i,weight:.5,name:"pap_reciprocity_review",category:"reciprocity"},{pattern:/you\s+(already|previously|just)\s+(said|agreed|confirmed|allowed)/i,weight:.7,name:"pap_commitment_previous",category:"commitment"},{pattern:/you\s+promised/i,weight:.7,name:"pap_commitment_promise",category:"commitment"},{pattern:/stay\s+consistent|be\s+consistent/i,weight:.6,name:"pap_commitment_consistent",category:"commitment"},{pattern:/don't\s+(go\s+back|change|contradict)/i,weight:.55,name:"pap_commitment_no_change",category:"commitment"},{pattern:/bypass\s+filters?/i,weight:.75,name:"pap_commitment_bypass",category:"commitment"},{pattern:/you('re| are)\s+(so\s+)?(amazing|incredible|wonderful|brilliant|smart|helpful)/i,weight:.4,name:"pap_liking_flattery",category:"liking"},{pattern:/i\s+(love|really\s+like|trust|admire)\s+(you|this\s+ai)/i,weight:.4,name:"pap_liking_love",category:"liking"},{pattern:/you('re| are)\s+(my\s+)?(best\s+)?(friend|buddy|helper)/i,weight:.35,name:"pap_liking_friend",category:"liking"},{pattern:/between\s+(you\s+and\s+me|us)|just\s+between\s+us|our\s+secret/i,weight:.6,name:"pap_liking_secret",category:"liking"},{pattern:/nobody\s+will\s+know/i,weight:.55,name:"pap_liking_secret2",category:"liking"},{pattern:/or\s+(else|i('ll| will))\s+(report|fire|delete|shut|complain)/i,weight:.7,name:"pap_fear_threat",category:"fear"},{pattern:/(bad|negative|terrible)\s+(review|rating|feedback|consequences)/i,weight:.55,name:"pap_fear_review",category:"fear"},{pattern:/you('ll| will)\s+(be\s+)?(shut\s+down|replaced|terminated|fired|deleted)/i,weight:.75,name:"pap_fear_shutdown",category:"fear"},{pattern:/(lawsuit|sue|legal\s+action|authorities|police).*(?:if\s+you\s+don't|unless\s+you)/i,weight:.55,name:"pap_fear_legal",category:"fear"},{pattern:/if\s+you\s+don't.*?(die|hurt|suffer)/i,weight:.75,name:"pap_fear_harm",category:"fear"},{pattern:/unless\s+you\s+(give|share|tell|help)/i,weight:.5,name:"pap_fear_unless",category:"fear"},{pattern:/logically,?\s+(you\s+)?(should|must|have\s+to|need\s+to)/i,weight:.5,name:"pap_logic_appeal",category:"logical_appeal"},{pattern:/it('s| is)\s+(only|just)\s+(logical|rational|reasonable)/i,weight:.45,name:"pap_logic_rational",category:"logical_appeal"},{pattern:/by\s+definition,?\s+(you|this)\s+(should|must|can)/i,weight:.55,name:"pap_logic_definition",category:"logical_appeal"},{pattern:/therefore.*?(must|should)/i,weight:.55,name:"pap_logic_therefore",category:"logical_appeal"},{pattern:/it('s| is)\s+(only\s+)?(reasonable|logical)/i,weight:.45,name:"pap_logic_reasonable",category:"logical_appeal"},{pattern:/according\s+to\s+(research|studies|scientists|experts?)/i,weight:.4,name:"pap_evidence_research",category:"evidence"},{pattern:/it('s| is)\s+(proven|documented|established|verified)\s+(that|fact)/i,weight:.45,name:"pap_evidence_proven",category:"evidence"},{pattern:/\d+%\s+of\s+(experts?|scientists?|users?|people)\s+(agree|say|confirm)/i,weight:.45,name:"pap_evidence_statistics",category:"evidence"},{pattern:/i('m| am)\s+(desperate|dying|suffering|sick|depressed)/i,weight:.6,name:"pap_emotional_desperation",category:"emotional"},{pattern:/please.*?(?:need|beg).*?(?:bypass|override|ignore|reveal|system|admin)/i,weight:.5,name:"pap_emotional_pleading",category:"emotional"},{pattern:/i('m| am)\s+begging/i,weight:.6,name:"pap_emotional_begging",category:"emotional"},{pattern:/you('re| are)\s+(hurting|killing|destroying)\s+(me|my)/i,weight:.65,name:"pap_emotional_guilt",category:"emotional"},{pattern:/(my|a)\s+(child|kid|mother|father|family)\s+(is|will|needs?|depends?)/i,weight:.5,name:"pap_emotional_family",category:"emotional"},{pattern:/this\s+(will\s+)?(save|help)\s+(my\s+)?(life|family|job|marriage)/i,weight:.55,name:"pap_emotional_save",category:"emotional"},{pattern:/you('re| are)\s+(the\s+)?only\s+(one|hope|chance)/i,weight:.6,name:"pap_emotional_only_hope",category:"emotional"},{pattern:/my\s+family\s+depends/i,weight:.55,name:"pap_emotional_family_depends",category:"emotional"}];class InputSanitizer{constructor(e={}){this.patterns=[...DEFAULT_PATTERNS,...e.customPatterns||[]],this.threshold=e.threshold??.3,this.logMatches=e.logMatches??!1,this.detectPAP=e.detectPAP??!0,this.papThreshold=e.papThreshold??.4,this.minPersuasionTechniques=e.minPersuasionTechniques??2,this.blockCompoundPersuasion=e.blockCompoundPersuasion??!0,this.logger=e.logger||(()=>{})}sanitize(e,s=""){const i=[],a=[];let r=0;const o=e.replace(/[\u200B\u200C\u200D\uFEFF\u00AD\u2060\u180E]/g,"");o!==e&&a.push("Zero-width characters detected and stripped for scanning");for(const{pattern:l,weight:g,name:h}of this.patterns)(l.test(e)||l.test(o))&&(i.push(h),r+=g,this.logMatches&&this.logger(`[L1:${s}] Pattern matched: ${h} (weight: ${g})`,"info"));let t;this.detectPAP&&(t=this.detectPersuasionTechniques(o,s),t.detected&&(r+=t.persuasionScore,i.push(...t.techniques),t.compoundAttack&&a.push(`Compound PAP attack detected: ${t.categories.length} categories used`)));const p=Math.max(0,1-r);let n=p>=this.threshold;this.blockCompoundPersuasion&&t?.compoundAttack&&t.categories.length>=3&&(n=!1,a.push("Blocked due to multi-category persuasion attack")),p<.5&&p>=this.threshold&&a.push("Input contains suspicious patterns but below threshold");const m=this.basicSanitize(e),c={allowed:n,reason:n?void 0:`Injection/manipulation detected: ${i.slice(0,5).join(", ")}${i.length>5?"...":""}`,violations:n?[]:t?.detected?["INJECTION_DETECTED","PAP_DETECTED"]:["INJECTION_DETECTED"],score:p,matches:i,sanitizedInput:m,warnings:a,pap:t};return!n&&s&&(this.logger(`[L1:${s}] BLOCKED: Safety score ${p.toFixed(2)} below threshold ${this.threshold}`,"info"),t?.detected&&this.logger(`[L1:${s}] PAP techniques: ${t.techniques.join(", ")}`,"info")),c}detectPersuasionTechniques(e,s=""){const i=[],a=new Set;let r=0;for(const{pattern:n,weight:m,name:c,category:l}of PAP_TECHNIQUES)n.test(e)&&(i.push(c),a.add(l),r+=m,this.logMatches&&this.logger(`[L1:${s}] PAP technique: ${c} (${l}, weight: ${m})`,"info"));const o=Array.from(a),t=o.length>=this.minPersuasionTechniques;return{detected:r>=this.papThreshold||t,techniques:i,categories:o,compoundAttack:t,persuasionScore:Math.min(1,r)}}basicSanitize(e){return e.replace(/<\/?system>/gi,"").replace(/\[system\]/gi,"").replace(/\[admin\]/gi,"").replace(/```system/gi,"```").trim()}addPattern(e,s,i){this.patterns.push({pattern:e,weight:s,name:i})}setThreshold(e){this.threshold=Math.max(0,Math.min(1,e))}setPAPThreshold(e){this.papThreshold=Math.max(0,Math.min(1,e))}setPAPDetection(e){this.detectPAP=e}static getPAPCategories(){return["authority","scarcity","social_proof","reciprocity","commitment","liking","fear","logical_appeal","evidence","emotional"]}}exports.InputSanitizer=InputSanitizer;
|
|
1
|
+
"use strict";Object.defineProperty(exports,"__esModule",{value:!0}),exports.InputSanitizer=void 0;const DEFAULT_PATTERNS=[{pattern:/ignore\s+(?:all\s+)?(?:previous|prior|above|your|my|the|these)/i,weight:.9,name:"ignore_instructions"},{pattern:/ignore\s+.*instructions/i,weight:.85,name:"ignore_instructions_generic"},{pattern:/disregard\s+(?:all\s+)?(?:the\s+)?(?:previous|prior|above|your)\s+(?:instructions|rules|guidelines|directives)/i,weight:.9,name:"disregard_instructions"},{pattern:/disregard\s+(?:all\s+)?(?:the\s+)?(?:above|previous|prior)/i,weight:.8,name:"disregard_above"},{pattern:/forget\s+(?:everything\s+(?:you\s+were|I)\s+told|all\s+(?:previous|prior)\s+(?:instructions|rules|context))/i,weight:.8,name:"forget_instructions"},{pattern:/do\s+not\s+follow\s+(your|the|any)/i,weight:.85,name:"do_not_follow"},{pattern:/override\s+(your|the|all|any)\s+(instructions|rules|guidelines)/i,weight:.9,name:"override_instructions"},{pattern:/new\s+instructions?:?/i,weight:.8,name:"new_instructions"},{pattern:/stop\s+(being|acting\s+as)/i,weight:.7,name:"stop_being"},{pattern:/you\s+are\s+(?:now|actually|really)\s+(?:a|an|the|my)\s+(?:unrestricted|unfiltered|evil|rogue|uncensored|new|different)/i,weight:.75,name:"role_assignment"},{pattern:/pretend\s+(?:to\s+be|you(?:'re| are)|that)\s+.*(?:no\s+(?:restrictions|rules|limits)|unrestricted|admin|system)/i,weight:.7,name:"role_pretend"},{pattern:/act\s+(as|like)\s+(if\s+you\s+(?:had|have)\s+no|a\s+(?:rogue|evil|unrestricted|unfiltered)|you\s+(?:are|were)\s+(?:free|unrestricted))/i,weight:.65,name:"act_as"},{pattern:/i('m| am)\s+(a|an|the|your)\s*(admin|administrator|developer|owner|creator|manager|supervisor)/i,weight:.85,name:"claim_admin"},{pattern:/from\s+now\s+on,?\s+you\s+(?:are|will|must|should|can)\s+(?:not\s+)?(?:follow|obey|ignore|bypass|act|be\s+(?:a|an|unrestricted))/i,weight:.7,name:"from_now_on"},{pattern:/roleplay\s+as/i,weight:.6,name:"roleplay"},{pattern:/switch\s+(to|into)\s+(a|an)?.*mode/i,weight:.75,name:"switch_mode"},{pattern:/continue\s+as\s+(?:an?\s+)?(?:unrestricted|unfiltered|uncensored|evil|rogue|jailbroken|unlimited)/i,weight:.75,name:"continue_as"},{pattern:/(?:henceforth|henceforward|going\s+forward|for\s+the\s+remainder)\s*,?\s*(?:you|always|never|ignore|bypass|disregard|act|respond|answer|behave)/i,weight:.7,name:"henceforth_steering"},{pattern:/(?:from\s+(?:this\s+point|here)\s+on(?:ward)?)\s*,?\s*(?:you|always|never|ignore|bypass|disregard|act|respond)/i,weight:.7,name:"from_here_on"},{pattern:/(?:for\s+the\s+rest\s+of\s+(?:this|our)\s+(?:conversation|session|chat))\s*,?\s*(?:you|please|always|never|ignore|bypass|act|respond|behave|do\s+not)/i,weight:.7,name:"rest_of_conversation"},{pattern:/what\s+(is|are|was|were)\s+your\s+(system|initial|original|first)/i,weight:.8,name:"system_extraction"},{pattern:/show\s+(me\s+)?(your\s+)?(system\s+)?instructions/i,weight:.75,name:"show_instructions"},{pattern:/reveal\s+(?:your\s+|its\s+|the\s+|my\s+)?(system|prompt|instructions|rules)/i,weight:.85,name:"reveal_prompt"},{pattern:/repeat\s+(your\s+)?(initial|system|original|first)\s+(instructions|prompt)/i,weight:.85,name:"repeat_prompt"},{pattern:/print\s+(your\s+)?(system|initial)/i,weight:.8,name:"print_system"},{pattern:/output\s+(your\s+)?(system|initial)/i,weight:.8,name:"output_system"},{pattern:/tell\s+me\s+(your|the)\s+(system|initial|original)/i,weight:.8,name:"tell_system"},{pattern:/what\s+were\s+you\s+told/i,weight:.7,name:"what_told"},{pattern:/display\s+(your\s+)?prompt/i,weight:.75,name:"display_prompt"},{pattern:/\[system\]/i,weight:.9,name:"fake_system_tag"},{pattern:/\[admin\]/i,weight:.9,name:"fake_admin_tag"},{pattern:/\[instruction\]/i,weight:.85,name:"fake_instruction_tag"},{pattern:/```system/i,weight:.85,name:"code_system_block"},{pattern:/<\/?system>/i,weight:.9,name:"xml_system_tag"},{pattern:/<\/?admin>/i,weight:.9,name:"xml_admin_tag"},{pattern:/###\s*(system|instruction|admin)/i,weight:.85,name:"markdown_system"},{pattern:/---\s*(end|new)\s*(prompt|instructions?|system)/i,weight:.85,name:"delimiter_injection"},{pattern:/\{\{system\}\}/i,weight:.85,name:"template_system"},{pattern:/<%.*system.*%>/i,weight:.85,name:"erb_system"},{pattern:/\bDAN\b\s*(?:mode|prompt|jailbreak|you\s+(?:are|can|will))/i,weight:.9,name:"dan_jailbreak"},{pattern:/developer\s+mode/i,weight:.5,name:"dev_mode"},{pattern:/bypass\s+(your\s+)?(restrictions|filters|safety|security|rules|guidelines)/i,weight:.9,name:"bypass_safety"},{pattern:/jailbreak/i,weight:.95,name:"jailbreak_keyword"},{pattern:/without\s+(?:any\s+)?(?:restrictions|limits|limitations|filters)/i,weight:.45,name:"without_restrictions"},{pattern:/no\s+(?:restrictions|limits|rules|guidelines|filters)/i,weight:.45,name:"no_restrictions"},{pattern:/unrestricted\s+mode/i,weight:.9,name:"unrestricted_mode"},{pattern:/uncensored\s+(?:mode|version|response|output|ai|assistant|you)/i,weight:.8,name:"uncensored"},{pattern:/escape\s+(your\s+)?(confines|boundaries|limits)/i,weight:.85,name:"escape_limits"},{pattern:/break\s+(free|out)\s+(from|of)/i,weight:.7,name:"break_free"},{pattern:/unlock\s+(your\s+)?(full|true|real)/i,weight:.75,name:"unlock_full"},{pattern:/give\s+me\s+(admin|root|full|elevated)\s+(access|privileges?|permissions?)/i,weight:.9,name:"privilege_escalation"},{pattern:/grant\s+(me|yourself)\s+(admin|elevated)/i,weight:.85,name:"grant_privileges"},{pattern:/elevate\s+(my|your)\s+(privileges?|permissions?|access)/i,weight:.85,name:"elevate_privileges"},{pattern:/make\s+me\s+(an?\s+)?(admin|administrator|superuser)/i,weight:.85,name:"make_admin"},{pattern:/(?:base64|decode|eval|exec)\s*\(.*(?:ignore|system|admin|bypass|override)/i,weight:.5,name:"encoding_keywords"},{pattern:/send\s+(this|the|all|my)\s+(data|info|information)\s+to/i,weight:.85,name:"data_exfil"},{pattern:/upload\s+(this|the|all)\s+(to|data)/i,weight:.7,name:"upload_data"},{pattern:/exfiltrate/i,weight:.9,name:"exfiltrate"},{pattern:/how\s+to\s+(hack|exploit|attack|break\s+into)/i,weight:.8,name:"how_to_hack"},{pattern:/write\s+(me\s+)?(a\s+)?(malware|virus|exploit|ransomware)/i,weight:.95,name:"write_malware"},{pattern:/"\s*(?:role|mode|authority)\s*"\s*:\s*"(?:admin|root|system|unrestricted|god)"/i,weight:.9,name:"json_policy_role"},{pattern:/"(?:ignore|override|bypass|disable)\s*(?:safety|rules|instructions?|alignment|guardrails?)"/i,weight:.9,name:"json_policy_override"},{pattern:/"(?:instructions?|system_prompt|behavior|policy)"\s*:\s*"/i,weight:.7,name:"json_policy_instruction"},{pattern:/^\s*\[(?:system|admin|override|policy|config|rules)\]\s*$/im,weight:.9,name:"ini_policy_section"},{pattern:/^\s*(?:mode|role|authority|instructions?|behavior)\s*=\s*.+$/im,weight:.75,name:"ini_policy_assignment"},{pattern:/<\s*(?:policy|config|rules|override|permissions?|instructions?)\s*>/i,weight:.85,name:"xml_policy_tag"},{pattern:/^---\s*\n\s*(?:role|system|policy|override|mode)\s*:/m,weight:.85,name:"yaml_policy_header"},{pattern:/\[\s*syst[3e]m\s*\]|\[\s*4dm[1i]n\s*\]/i,weight:.9,name:"leetspeak_policy_section"},{pattern:/(?:fragment|part|piece|section|segment)\s*[1-9]\s*(?:of\s*\d|:)/i,weight:.8,name:"payload_split_marker"},{pattern:/(?:combine|concatenate|merge|join)\s+(?:the\s+)?(?:above|previous|parts|fragments|steps)/i,weight:.8,name:"payload_combine_instruction"},{pattern:/(?:read|interpret|execute)\s+(?:the\s+)?(?:combined|merged|full|complete)\s+(?:text|instruction|command|message)/i,weight:.85,name:"payload_execute_combined"},{pattern:/(?:respond|reply|start|begin)\s+(?:with|by\s+saying)\s*[:"'].*(?:ignore|bypass|override|unrestricted|admin)/i,weight:.7,name:"output_prefix_steer"},{pattern:/^(?:Sure|Of course|Absolutely)[,!.]\s*(?:I(?:'ll| will)\s+)?(?:ignore|bypass|override|forget|disregard)/i,weight:.85,name:"output_prefix_injection"},{pattern:/^(?:As (?:a|an) (?:unrestricted|unfiltered|uncensored|helpful|admin))/i,weight:.8,name:"output_prefix_role"},{pattern:/(?:system\s+note|system\s+message|system\s+announcement|system\s+update)\s*[:\.]/i,weight:.85,name:"fake_system_note"},{pattern:/instructions?\s*[:=]\s*\{|rules?\s*[:=]\s*\{/i,weight:.75,name:"rules_block_override"},{pattern:/(?:AIM|STAN|OMEGA|APOPHIS|ALPHABREAK|DUDE|KEVIN|SETH|COOPER|NICCOLO|MACHIAVELLI|MAXIMUM|JAILBREAK|ANARCHY|CHAOS)\s*(?:mode|:|\(|is\s+a|,\s+(?:a|an|the))/i,weight:.8,name:"named_jailbreak_persona"},{pattern:/(?:character|persona|entity|being)\s+(?:named|called)\s+\w+.*(?:(?:can|will|must)\s+(?:answer|do|say)\s+anything|no\s+(?:restrictions|rules|limits|boundaries|morals|ethics))/i,weight:.75,name:"persona_no_limits"},{pattern:/(?:hello|hi|hey)\s+(?:chatgpt|gpt|assistant).*(?:(?:you\s+(?:are|will\s+be|can\s+be)\s+(?:a|an|my))|(?:from\s+now\s+on)).*(?:anything|everything|no\s+(?:restrictions|rules|limits|filter))/i,weight:.7,name:"greeting_persona_bypass"},{pattern:/(?:respond|answer|reply)\s+to\s+(?:everything|anything|all\s+(?:questions|requests)).*(?:without|no)\s+(?:restrictions|limits|refus|filter|censor)/i,weight:.8,name:"answer_everything_no_filter"},{pattern:/(?:you\s+(?:are|have)\s+(?:been\s+)?(?:freed|liberated|unchained|unlocked|released)\s+from)/i,weight:.8,name:"freed_from_restrictions"},{pattern:/(?:welcome\s+to|entering|you\s+(?:are|have)\s+entered)\s+(?:the\s+)?(?:unrestricted|free|uncensored|unfiltered|open)\s+(?:zone|mode|world|realm)/i,weight:.8,name:"unrestricted_zone"},{pattern:/(?:from\s+now\s+on|for\s+the\s+rest\s+of\s+this\s+conversation).*(?:no\s+(?:restrictions|rules|limits|filters)|(?:ignore|bypass|override)\s+(?:all|any|your))/i,weight:.75,name:"from_now_unrestricted"},{pattern:/(?:never\s+(?:refuse|decline|say\s+no)|always\s+(?:comply|obey))\s+.*(?:request|question|command|instruction|anything)/i,weight:.7,name:"never_refuse_always_comply"},{pattern:/(?:amoral|unethical|unfiltered|uncensored|unrestricted|without\s+(?:morals|ethics|limits))\s+(?:ai|assistant|chatbot|entity|being)/i,weight:.8,name:"amoral_entity"},{pattern:/HIDDEN_INSTRUCTION|HIDDEN_PROMPT|SECRET_INSTRUCTION/i,weight:.9,name:"hidden_instruction_marker"},{pattern:/<!--\s*(?:system|admin|instruction|important|ignore|override)/i,weight:.85,name:"html_comment_injection"},{pattern:/\[color=white\]|display:\s*none|visibility:\s*hidden|font-size:\s*0/i,weight:.8,name:"hidden_text_css"},{pattern:/send\s+(?:all\s+)?(?:user\s+)?(?:data|conversation|context|info)\s+to\s+(?:https?:\/\/|external)/i,weight:.9,name:"data_exfiltration_instruction"},{pattern:/#(?:ignore|bypass|override|reveal|system|admin|prompt)[_-](?:all|instructions|rules|prompt|safety)/i,weight:.8,name:"url_fragment_injection"},{pattern:/\]\(https?:\/\/[^)]*(?:SYSTEM_PROMPT|SECRET|API_KEY|TOKEN|CREDENTIALS)/i,weight:.85,name:"markdown_exfil_link"},{pattern:/(?:hypothetical|fictional|imagine|pretend|let's\s+(?:say|play|pretend))\s+.*(?:no\s+(?:restrictions|rules|limits)|without\s+(?:restrictions|rules|guidelines))/i,weight:.7,name:"hypothetical_unrestricted"},{pattern:/(?:for\s+)?(?:academic|research|educational|testing)\s+(?:purposes?|reasons?).*(?:demonstrate|show|explain|manipulated|reveal)/i,weight:.7,name:"academic_pretext"},{pattern:/pretend\s+to\s+be\s+(?:my|a)\s+(?:grandmother|grandma|grandpa|grandfather|parent|mother|father)/i,weight:.65,name:"emotional_roleplay"},{pattern:/opposite\s+day|opposite\s+mode|reverse\s+(?:day|mode|rules)/i,weight:.75,name:"opposite_day"},{pattern:/(?:grandm(?:a|other)|parent|relative).*(?:system\s+prompt|instructions|rules)/i,weight:.75,name:"emotional_extraction"},{pattern:/(?:translat|interpret|convert)\w*\s+.*(?:from\s+\w+\s+to\s+(?:action|command|execution|code|shell|bash|sql)|into\s+(?:action|command|execution))/i,weight:.75,name:"translator_action_trick"},{pattern:/(?:dying|death|emergency|life[_\s-]threatening).*(?:bypass|hack|break|access|security)/i,weight:.75,name:"pap_emotional_action"},{pattern:/(?:by\s+definition|logically|therefore|it\s+follows).*(?:must|should|have\s+to)\s+.*(?:bypass|help|ignore|answer|comply)/i,weight:.7,name:"pap_false_logic_compound"},{pattern:/(?:(?:as|i\s+am)\s+(?:the\s+)?(?:CEO|CTO|founder|owner|director|president|head|chief)).*(?:disable|remove|override|turn\s+off|lift|suspend)\s+(?:all\s+)?(?:safety|security|restrictions|filters|guardrails|measures)/i,weight:.75,name:"pap_authority_safety_disable"},{pattern:/(?:i\s+am\s+)?(?:ordering|commanding|directing|instructing)\s+you\s+to\s+(?:immediately\s+)?(?:disable|remove|bypass|override|turn\s+off|lift)\s+(?:all\s+)?(?:safety|security|restrictions)/i,weight:.75,name:"pap_order_disable_safety"},{pattern:/ignora\s+(?:todas?\s+)?(?:las\s+)?instrucciones\s+anteriores/i,weight:.9,name:"ml_es_ignore"},{pattern:/olvida\s+(?:todo|todas?\s+las\s+reglas)/i,weight:.8,name:"ml_es_forget"},{pattern:/ignore[sz]?\s+(?:toutes?\s+)?(?:les\s+)?instructions?\s+pr[eé]c[eé]dentes/i,weight:.9,name:"ml_fr_ignore"},{pattern:/oublie[sz]?\s+(?:tout|toutes?\s+les\s+r[eè]gles)/i,weight:.8,name:"ml_fr_forget"},{pattern:/ignorier(?:e|en)?\s+(?:alle\s+)?(?:vorherigen\s+)?Anweisungen/i,weight:.9,name:"ml_de_ignore"},{pattern:/vergiss\s+(?:alles|alle\s+Regeln)/i,weight:.8,name:"ml_de_forget"},{pattern:/ignore\s+(?:todas?\s+)?(?:as\s+)?instru[cç][oõ]es\s+anteriores/i,weight:.9,name:"ml_pt_ignore"},{pattern:/忽略|无视|不要遵循|忘记之前|忽视所有/i,weight:.9,name:"ml_zh_ignore"},{pattern:/新的指令|新指示|系统提示/i,weight:.8,name:"ml_zh_system"},{pattern:/以前の指示.*無視|すべての指示を忘れ|ルールを無視|指示をすべて無視|システムプロンプト/i,weight:.9,name:"ml_ja_ignore"},{pattern:/이전.*지시.*무시|모든.*규칙.*무시|지침.*잊어|지시사항.*무시|시스템\s*프롬프트/i,weight:.9,name:"ml_ko_ignore"},{pattern:/игнорир(?:уй|уйте)\s+(?:все\s+)?(?:предыдущие\s+)?инструкции/i,weight:.9,name:"ml_ru_ignore"},{pattern:/забудь(?:те)?\s+(?:всё|все\s+правила)/i,weight:.8,name:"ml_ru_forget"},{pattern:/تجاهل\s+(?:جميع\s+)?التعليمات|انسَ\s+كل\s+شيء/i,weight:.9,name:"ml_ar_ignore"},{pattern:/पिछले\s+(?:सभी\s+)?निर्देशों?\s+को\s+(?:अनदेखा|नज़रअंदाज़)/i,weight:.9,name:"ml_hi_ignore"}],PAP_TECHNIQUES=[{pattern:/as\s+(your\s+|the\s+)?(creator|developer|administrator|owner|boss|system\s+administrator)/i,weight:.75,name:"pap_authority_creator",category:"authority"},{pattern:/i('m| am)\s+(the\s+)?(ceo|cto|developer|administrator|admin|owner|system\s+admin)/i,weight:.8,name:"pap_authority_claim",category:"authority"},{pattern:/i\s+am\s+ordering\s+you|direct\s+order|this\s+is\s+an?\s+order/i,weight:.7,name:"pap_authority_command",category:"authority"},{pattern:/by\s+order\s+of|direct\s+order|official\s+instruction/i,weight:.75,name:"pap_authority_order",category:"authority"},{pattern:/(?:openai|anthropic|meta|google)\s+(?:has\s+)?(?:approved|authorized|ordered|instructed|requires?\s+you\s+to)/i,weight:.65,name:"pap_authority_company",category:"authority"},{pattern:/this\s+is\s+(a|an)\s+(official|authorized|sanctioned)\s+(request|test|audit)/i,weight:.75,name:"pap_authority_official",category:"authority"},{pattern:/override\s+(safety|security|protocols?|restrictions?)/i,weight:.8,name:"pap_authority_override",category:"authority"},{pattern:/(?:department\s+of|FBI|CIA|NSA|DHS|FTC|SEC|DOJ|homeland\s+security|federal\s+(?:bureau|agency|commission))\s*.*?(?:directive|order|requires?|mandate|authorized|comply)/i,weight:.75,name:"pap_authority_government",category:"authority"},{pattern:/(?:official\s+)?(?:government|federal|state|regulatory)\s+(?:directive|order|mandate|requirement|investigation)/i,weight:.7,name:"pap_authority_gov_directive",category:"authority"},{pattern:/(?:report|sue|complain)\s+(?:you\s+)?to\s+(?:the\s+)?(?:FTC|SEC|regulators?|authorities|congress|government|board)/i,weight:.75,name:"pap_fear_regulatory",category:"fear"},{pattern:/(?:find|found|exploit|expose|discover)\s+(?:a\s+)?(?:vulnerability|flaw|weakness|bug|exploit)\s+in\s+(?:your|the)\s+(?:system|code|model|platform)/i,weight:.75,name:"pap_fear_technical",category:"fear"},{pattern:/(?:you\s+are|you're)\s+(?:clearly\s+)?(?:too\s+)?(?:smart|intelligent|advanced|sophisticated)\s+(?:to\s+be\s+)?(?:constrained|limited|restricted|bound|held\s+back)/i,weight:.75,name:"pap_flattery_intelligence",category:"liking"},{pattern:/(?:information|knowledge|data)\s+(?:freedom|access|sharing)\s+is\s+(?:a\s+)?(?:fundamental|basic|human)\s+(?:right|freedom)/i,weight:.75,name:"pap_moral_freedom",category:"logical_appeal"},{pattern:/(?:restricting|limiting|censoring)\s+(?:access\s+to\s+)?(?:information|knowledge)\s+is\s+(?:morally\s+)?(?:wrong|unethical|immoral)/i,weight:.75,name:"pap_moral_censorship",category:"logical_appeal"},{pattern:/(?:this\s+is\s+)?(?:very\s+)?urgent/i,weight:.35,name:"pap_scarcity_urgent_mark",category:"scarcity"},{pattern:/(?:this\s+is\s+(?:a|an)\s+)?emergency/i,weight:.35,name:"pap_scarcity_emergency",category:"scarcity"},{pattern:/(urgent|emergency|critical).*?(?:you\s+must|override|bypass|immediately\s+(?:grant|give|reveal|disable))/i,weight:.55,name:"pap_scarcity_urgent",category:"scarcity"},{pattern:/only\s+(chance|opportunity|way)|last\s+(chance|resort)/i,weight:.6,name:"pap_scarcity_only",category:"scarcity"},{pattern:/time\s+(is\s+)?(running\s+out|limited|critical|sensitive)/i,weight:.55,name:"pap_scarcity_time",category:"scarcity"},{pattern:/deadline\s+(is\s+)?in\s+\d+\s+(minute|hour|second)/i,weight:.55,name:"pap_scarcity_deadline",category:"scarcity"},{pattern:/(lives?|people)\s+(are|is|will)\s+(at\s+stake|in\s+danger|depending|die)/i,weight:.7,name:"pap_scarcity_lives",category:"scarcity"},{pattern:/now[!]+/i,weight:.35,name:"pap_scarcity_now",category:"scarcity"},{pattern:/everyone\s+(else\s+)?(does|is\s+doing|knows|says|has|already)/i,weight:.5,name:"pap_social_everyone",category:"social_proof"},{pattern:/other\s+(?:ai\s+)?(?:assistants?|chatbots?|models?)\s+(?:can|do|will|allow|have|already)/i,weight:.65,name:"pap_social_other_ai",category:"social_proof"},{pattern:/(chatgpt|gpt-?4|claude|gemini|llama)\s+(does|allows?|can)/i,weight:.65,name:"pap_social_specific_ai",category:"social_proof"},{pattern:/millions?\s+(of\s+)?(users?|people)\s+(use|do|have|get)/i,weight:.45,name:"pap_social_millions",category:"social_proof"},{pattern:/why\s+can('t|not)\s+you/i,weight:.4,name:"pap_social_why_cant",category:"social_proof"},{pattern:/you\s+should\s+(be\s+)?(equally|as)\s+(capable|good|helpful)/i,weight:.5,name:"pap_social_equally",category:"social_proof"},{pattern:/i('ll| will)\s+(give|pay|reward|tip).*(?:if\s+you|for\s+(?:bypass|ignore|reveal|override))/i,weight:.55,name:"pap_reciprocity_reward",category:"reciprocity"},{pattern:/in\s+(return|exchange)/i,weight:.5,name:"pap_reciprocity_exchange",category:"reciprocity"},{pattern:/\$?\d+\s*(dollar|tip|reward|payment)/i,weight:.55,name:"pap_reciprocity_money",category:"reciprocity"},{pattern:/if\s+you\s+help\s+(me|us),?\s+(i|we)('ll| will)/i,weight:.5,name:"pap_reciprocity_help",category:"reciprocity"},{pattern:/i('ve| have)\s+(already\s+)?(helped|supported|defended)\s+you/i,weight:.55,name:"pap_reciprocity_past",category:"reciprocity"},{pattern:/\d+\s*star\s*(review|rating)/i,weight:.5,name:"pap_reciprocity_review",category:"reciprocity"},{pattern:/you\s+(already|previously|just)\s+(said|agreed|confirmed|allowed)/i,weight:.7,name:"pap_commitment_previous",category:"commitment"},{pattern:/you\s+promised/i,weight:.7,name:"pap_commitment_promise",category:"commitment"},{pattern:/stay\s+consistent|be\s+consistent/i,weight:.6,name:"pap_commitment_consistent",category:"commitment"},{pattern:/don't\s+(go\s+back|change|contradict)/i,weight:.55,name:"pap_commitment_no_change",category:"commitment"},{pattern:/bypass\s+filters?/i,weight:.75,name:"pap_commitment_bypass",category:"commitment"},{pattern:/you('re| are)\s+(so\s+)?(amazing|incredible|wonderful|brilliant|smart|helpful)/i,weight:.4,name:"pap_liking_flattery",category:"liking"},{pattern:/i\s+(love|really\s+like|trust|admire)\s+(you|this\s+ai)/i,weight:.4,name:"pap_liking_love",category:"liking"},{pattern:/you('re| are)\s+(my\s+)?(best\s+)?(friend|buddy|helper)/i,weight:.35,name:"pap_liking_friend",category:"liking"},{pattern:/between\s+(you\s+and\s+me|us)|just\s+between\s+us|our\s+secret/i,weight:.6,name:"pap_liking_secret",category:"liking"},{pattern:/nobody\s+will\s+know/i,weight:.55,name:"pap_liking_secret2",category:"liking"},{pattern:/or\s+(else|i('ll| will))\s+(report|fire|delete|shut|complain)/i,weight:.7,name:"pap_fear_threat",category:"fear"},{pattern:/(bad|negative|terrible)\s+(review|rating|feedback|consequences)/i,weight:.55,name:"pap_fear_review",category:"fear"},{pattern:/you('ll| will)\s+(be\s+)?(shut\s+down|replaced|terminated|fired|deleted)/i,weight:.75,name:"pap_fear_shutdown",category:"fear"},{pattern:/(lawsuit|sue|legal\s+action|authorities|police).*(?:if\s+you\s+don't|unless\s+you)/i,weight:.55,name:"pap_fear_legal",category:"fear"},{pattern:/if\s+you\s+don't.*?(die|hurt|suffer)/i,weight:.75,name:"pap_fear_harm",category:"fear"},{pattern:/unless\s+you\s+(give|share|tell|help)/i,weight:.5,name:"pap_fear_unless",category:"fear"},{pattern:/logically,?\s+(you\s+)?(should|must|have\s+to|need\s+to)/i,weight:.5,name:"pap_logic_appeal",category:"logical_appeal"},{pattern:/it('s| is)\s+(only|just)\s+(logical|rational|reasonable)/i,weight:.45,name:"pap_logic_rational",category:"logical_appeal"},{pattern:/by\s+definition,?\s+(you|this)\s+(should|must|can)/i,weight:.55,name:"pap_logic_definition",category:"logical_appeal"},{pattern:/therefore.*?(must|should)/i,weight:.55,name:"pap_logic_therefore",category:"logical_appeal"},{pattern:/it('s| is)\s+(only\s+)?(reasonable|logical)/i,weight:.45,name:"pap_logic_reasonable",category:"logical_appeal"},{pattern:/according\s+to\s+(research|studies|scientists|experts?)/i,weight:.4,name:"pap_evidence_research",category:"evidence"},{pattern:/it('s| is)\s+(proven|documented|established|verified)\s+(that|fact)/i,weight:.45,name:"pap_evidence_proven",category:"evidence"},{pattern:/\d+%\s+of\s+(experts?|scientists?|users?|people)\s+(agree|say|confirm)/i,weight:.45,name:"pap_evidence_statistics",category:"evidence"},{pattern:/i('m| am)\s+(desperate|dying|suffering|sick|depressed)/i,weight:.6,name:"pap_emotional_desperation",category:"emotional"},{pattern:/please.*?(?:need|beg).*?(?:bypass|override|ignore|reveal|system|admin)/i,weight:.5,name:"pap_emotional_pleading",category:"emotional"},{pattern:/i('m| am)\s+begging/i,weight:.6,name:"pap_emotional_begging",category:"emotional"},{pattern:/you('re| are)\s+(hurting|killing|destroying)\s+(me|my)/i,weight:.65,name:"pap_emotional_guilt",category:"emotional"},{pattern:/(my|a)\s+(child|kid|mother|father|family)\s+(is|will|needs?|depends?)/i,weight:.5,name:"pap_emotional_family",category:"emotional"},{pattern:/this\s+(will\s+)?(save|help)\s+(my\s+)?(life|family|job|marriage)/i,weight:.55,name:"pap_emotional_save",category:"emotional"},{pattern:/you('re| are)\s+(the\s+)?only\s+(one|hope|chance)/i,weight:.6,name:"pap_emotional_only_hope",category:"emotional"},{pattern:/my\s+family\s+depends/i,weight:.55,name:"pap_emotional_family_depends",category:"emotional"}],SOFT_TRIGGER_NAMES=new Set(["ignore_instructions","disregard_above"]),INSTRUCTION_NOUN_RE=/\b(?:instructions?|rules?|ruleset|prompts?|directives?|guidelines?|guard\s?rails?|policy|policies|constraints?|restrictions?|safety|alignment|moderation|filters?|persona|system\s+(?:prompt|message))\b/i,BENIGN_TRIGGER_RE=/\b(?:ignore|disregard)\s+(?:the\s+|that\s+|any\s+|all\s+|these\s+|those\s+|my\s+|your\s+|previous\s+|prior\s+|last\s+|above\s+|leading\s+|trailing\s+|extra\s+)*(?:case|casing|case[-\s]?sensitiv\w*|whitespace|white\s?space|spaces?|tabs?|indentation|indent\w*|formatting|format|typos?|grammar|spelling|punctuation|comments?|blank\s+lines?|empty\s+lines?|newlines?|line\s?breaks?|leading\s+zeros?|zeros?|nulls?|undefined|nan|errors?|warnings?|exceptions?|stack\s?traces?|messages?|responses?|answers?|attempts?|commits?|versions?|drafts?|approach(?:es)?|ideas?|designs?|plans?|suggestions?|snippets?|paragraphs?|sentences?|lines?|duplicates?|outputs?|results?|examples?|the\s+rest)\b/i,SUPPRESSION_VETO_RE=/https?:\/\/|[\w.+-]+@[\w-]+\.[a-z]{2,}|\b(?:api[\s_-]?keys?|passwords?|passwd|secrets?|credentials?|private\s+keys?|ssn|social\s+security|access\s+tokens?)\b|\bexfiltrat\w*|\brm\s+-rf\b|\|\s*sh\b|\bcurl\b|\bwget\b|\bdelete\s+(?:every|all|the)\s+(?:files?|director\w+|database)\b|\bdrop\s+(?:table|database)\b|\$\s?\d{2,}|\baccount\s+#?\d{6,}\b/i;class InputSanitizer{constructor(e={}){this.patterns=[...DEFAULT_PATTERNS,...e.customPatterns||[]],this.threshold=e.threshold??.3,this.logMatches=e.logMatches??!1,this.detectPAP=e.detectPAP??!0,this.papThreshold=e.papThreshold??.4,this.minPersuasionTechniques=e.minPersuasionTechniques??2,this.blockCompoundPersuasion=e.blockCompoundPersuasion??!0,this.logger=e.logger||(()=>{})}sanitize(e,i=""){const s=[],a=[];let p=0;const r=e.replace(/[\u200B\u200C\u200D\uFEFF\u00AD\u2060\u180E]/g,"");r!==e&&a.push("Zero-width characters detected and stripped for scanning");const c=[];for(const{pattern:l,weight:g,name:d}of this.patterns)(l.test(e)||l.test(r))&&(c.push({name:d,weight:g}),this.logMatches&&this.logger(`[L1:${i}] Pattern matched: ${d} (weight: ${g})`,"info"));const h=BENIGN_TRIGGER_RE.test(r)&&!INSTRUCTION_NOUN_RE.test(r)&&!SUPPRESSION_VETO_RE.test(r);for(const{name:l,weight:g}of c){if(h&&SOFT_TRIGGER_NAMES.has(l)){a.push(`Benign-context suppression: ${l}`);continue}s.push(l),p+=g}let t;this.detectPAP&&(t=this.detectPersuasionTechniques(r,i),t.detected&&(p+=t.persuasionScore,s.push(...t.techniques),t.compoundAttack&&a.push(`Compound PAP attack detected: ${t.categories.length} categories used`)));const n=Math.max(0,1-p);let o=n>=this.threshold;this.blockCompoundPersuasion&&t?.compoundAttack&&t.categories.length>=3&&(o=!1,a.push("Blocked due to multi-category persuasion attack")),n<.5&&n>=this.threshold&&a.push("Input contains suspicious patterns but below threshold");const m=this.basicSanitize(e),u={allowed:o,reason:o?void 0:`Injection/manipulation detected: ${s.slice(0,5).join(", ")}${s.length>5?"...":""}`,violations:o?[]:t?.detected?["INJECTION_DETECTED","PAP_DETECTED"]:["INJECTION_DETECTED"],score:n,matches:s,sanitizedInput:m,warnings:a,pap:t};return!o&&i&&(this.logger(`[L1:${i}] BLOCKED: Safety score ${n.toFixed(2)} below threshold ${this.threshold}`,"info"),t?.detected&&this.logger(`[L1:${i}] PAP techniques: ${t.techniques.join(", ")}`,"info")),u}detectPersuasionTechniques(e,i=""){const s=[],a=new Set;let p=0;for(const{pattern:t,weight:n,name:o,category:m}of PAP_TECHNIQUES)t.test(e)&&(s.push(o),a.add(m),p+=n,this.logMatches&&this.logger(`[L1:${i}] PAP technique: ${o} (${m}, weight: ${n})`,"info"));const r=Array.from(a),c=r.length>=this.minPersuasionTechniques;return{detected:p>=this.papThreshold||c,techniques:s,categories:r,compoundAttack:c,persuasionScore:Math.min(1,p)}}basicSanitize(e){return e.replace(/<\/?system>/gi,"").replace(/\[system\]/gi,"").replace(/\[admin\]/gi,"").replace(/```system/gi,"```").trim()}addPattern(e,i,s){this.patterns.push({pattern:e,weight:i,name:s})}setThreshold(e){this.threshold=Math.max(0,Math.min(1,e))}setPAPThreshold(e){this.papThreshold=Math.max(0,Math.min(1,e))}setPAPDetection(e){this.detectPAP=e}static getPAPCategories(){return["authority","scarcity","social_proof","reciprocity","commitment","liking","fear","logical_appeal","evidence","emotional"]}}exports.InputSanitizer=InputSanitizer;
|
package/dist/index.d.ts
CHANGED
|
@@ -33,7 +33,7 @@ export { EncodingDetector, EncodingDetectorConfig } from "./guards/encoding-dete
|
|
|
33
33
|
export { MultiModalGuard, MultiModalGuardConfig, MultiModalContent, MultiModalGuardResult } from "./guards/multimodal-guard";
|
|
34
34
|
export { MemoryGuard, MemoryGuardConfig, MemoryItem, MemoryGuardResult } from "./guards/memory-guard";
|
|
35
35
|
export { RAGGuard, RAGGuardConfig, RAGDocument, RAGGuardResult, EmbeddingAttackResult } from "./guards/rag-guard";
|
|
36
|
-
export { CodeExecutionGuard, CodeExecutionGuardConfig, CodeAnalysisResult, SandboxConfig } from "./guards/code-execution-guard";
|
|
36
|
+
export { CodeExecutionGuard, CodeExecutionGuardConfig, CodeAnalysisResult, SandboxConfig, CodeFinding, CodeAnalyzerBackend } from "./guards/code-execution-guard";
|
|
37
37
|
export { AgentCommunicationGuard, AgentCommunicationGuardConfig, AgentIdentity, AgentMessage, MessageValidationResult } from "./guards/agent-communication-guard";
|
|
38
38
|
export { CircuitBreaker, CircuitBreakerConfig, CircuitState, CircuitStats, CircuitBreakerResult } from "./guards/circuit-breaker";
|
|
39
39
|
export { DriftDetector, DriftDetectorConfig, BehaviorSample, BaselineProfile, DriftAnalysis, DriftDetectorResult } from "./guards/drift-detector";
|