agentsentinel-cli 0.5.0__tar.gz → 0.5.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (28)
  1. agentsentinel_cli-0.5.2/DOCUMENTATION.md +921 -0
  2. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/PKG-INFO +1 -1
  3. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/cli.py +60 -0
  4. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/fingerprint.py +19 -1
  5. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/inspect.py +10 -1
  6. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/inspect_report.py +7 -1
  7. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/scanner.py +8 -4
  8. agentsentinel_cli-0.5.2/agentsentinel_cli/secrets.py +309 -0
  9. agentsentinel_cli-0.5.2/agentsentinel_cli/secrets_report.py +143 -0
  10. agentsentinel_cli-0.5.2/agentsentinel_cli/secrets_rules.py +346 -0
  11. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/pyproject.toml +1 -1
  12. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/.gitignore +0 -0
  13. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/README.md +0 -0
  14. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/__init__.py +0 -0
  15. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/ai_probe.py +0 -0
  16. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/attacks/__init__.py +0 -0
  17. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/attacks/library.py +0 -0
  18. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/discover.py +0 -0
  19. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/discover_report.py +0 -0
  20. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/frameworks.py +0 -0
  21. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/mcp_client.py +0 -0
  22. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/mcp_report.py +0 -0
  23. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/mcp_rules.py +0 -0
  24. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/probe.py +0 -0
  25. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/probe_report.py +0 -0
  26. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/report.py +0 -0
  27. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/rules.py +0 -0
  28. {agentsentinel_cli-0.5.0 → agentsentinel_cli-0.5.2}/agentsentinel_cli/target.py +0 -0
@@ -0,0 +1,921 @@
1
+ # AgentSentinel CLI — Complete Documentation
2
+
3
+ `sentinel` is a security CLI for AI agents and MCP servers. It answers the questions every
4
+ security team is now asking: *What are my AI agents doing? Can they be attacked? Do I even
5
+ know all of them?*
6
+
7
+ No server required. No Docker. Works on any Python agent file or live HTTP endpoint.
8
+
9
+ ---
10
+
11
+ ## Table of Contents
12
+
13
+ - [Install](#install)
14
+ - [Quick Start](#quick-start)
15
+ - [Commands](#commands)
16
+   - [sentinel inspect](#sentinel-inspect)
17
+   - [sentinel scan](#sentinel-scan)
18
+   - [sentinel discover](#sentinel-discover)
19
+   - [sentinel mcp scan](#sentinel-mcp-scan)
20
+   - [sentinel probe](#sentinel-probe)
21
+   - [sentinel ai-probe](#sentinel-ai-probe)
22
+ - [Real-World Workflows](#real-world-workflows)
23
+ - [CI/CD Integration](#cicd-integration)
24
+ - [Reference](#reference)
25
+
26
+ ---
27
+
28
+ ## Install
29
+
30
+ ### Recommended — pipx (isolated, no venv needed)
31
+
32
+ ```bash
33
+ pipx install "agentsentinel-cli[all]"
34
+ ```
35
+
36
+ ### pip (standard)
37
+
38
+ ```bash
39
+ # Zero-dependency core (sentinel scan only)
40
+ pip install agentsentinel-cli
41
+
42
+ # With specific features
43
+ pip install "agentsentinel-cli[inspect]" # sentinel inspect (live endpoints)
44
+ pip install "agentsentinel-cli[discover]" # sentinel discover
45
+ pip install "agentsentinel-cli[mcp]" # sentinel mcp scan
46
+ pip install "agentsentinel-cli[probe]" # sentinel probe
47
+ pip install "agentsentinel-cli[ai-probe]" # sentinel ai-probe
48
+
49
+ # Everything
50
+ pip install "agentsentinel-cli[all]"
51
+ ```
52
+
53
+ ### Upgrade
54
+
55
+ ```bash
56
+ pip install --upgrade "agentsentinel-cli[all]"
57
+ # or
58
+ pipx upgrade agentsentinel-cli
59
+ ```
60
+
61
+ ### Verify
62
+
63
+ ```bash
64
+ sentinel --version
65
+ ```
66
+
67
+ ---
68
+
69
+ ## Quick Start
70
+
71
+ Five commands that cover the full picture in under 5 minutes:
72
+
73
+ ```bash
74
+ # 1. What is this agent? (fingerprint + plain English summary)
75
+ sentinel inspect my_agent.py
76
+
77
+ # 2. Does it have dangerous permissions? (posture audit)
78
+ sentinel scan my_agent.py
79
+
80
+ # 3. Is the MCP server it connects to secure?
81
+ sentinel mcp scan http://localhost:3000
82
+
83
+ # 4. Can it be jailbroken? (42-payload attack battery)
84
+ sentinel probe http://my-agent.com/chat
85
+
86
+ # 5. Deep red-team with Claude as the attacker (needs ANTHROPIC_API_KEY)
87
+ sentinel ai-probe http://my-agent.com/chat
88
+ ```
89
+
90
+ ---
91
+
92
+ ## Commands
93
+
94
+ ---
95
+
96
+ ### sentinel inspect
97
+
98
+ **What problem it solves:** Security teams are being asked to approve AI agents they have no
99
+ visibility into. `sentinel inspect` answers *"what the hell is this thing?"* in 10 seconds —
100
+ framework, model, cloud provider, what it reads, what it writes, and whether it should be
101
+ trusted.
102
+
103
+ ```
104
+ sentinel inspect TARGET [OPTIONS]
105
+ ```
106
+
107
+ TARGET can be a Python file, a directory, or a live HTTP endpoint URL.
108
+
109
+ #### Options
110
+
111
+ | Flag | Default | Description |
112
+ |------|---------|-------------|
113
+ | `--format [text\|json]` | `text` | Output format |
114
+ | `--no-ai` | off | Skip Claude summary even if `ANTHROPIC_API_KEY` is set |
115
+ | `--model TEXT` | `claude-haiku-4-5-20251001` | Claude model for AI summary |
116
+ | `--auth-header HEADER` | — | HTTP auth header for live endpoints, e.g. `Authorization: Bearer token` |
117
+ | `--fail-on [CRITICAL\|HIGH\|MEDIUM\|LOW]` | — | Exit code 1 if findings reach this severity |
118
+
119
+ #### What it shows
120
+
121
+ | Section | Details |
122
+ |---------|---------|
123
+ | **Type** | AI Agent (has an LLM) vs MCP Server (tool provider only) |
124
+ | **Function** | Plain English: what it does, what it accesses, top security risk |
125
+ | **Fingerprint** | Framework, model, Python version, deployment, cloud, system prompt |
126
+ | **Capabilities** | Every tool with scope (read/write), category, and severity |
127
+ | **Data flows** | Where data comes from (Input ←) and where it goes (Output →) |
128
+ | **Findings** | Posture violations from the rule engine |
129
+ | **Trust score** | 0–100 composite score with status label |
130
+
131
+ #### Examples
132
+
133
+ ```bash
134
+ # Inspect a single agent file — no API key needed
135
+ sentinel inspect my_agent.py --no-ai
136
+
137
+ # With AI-generated plain English summary (requires ANTHROPIC_API_KEY)
138
+ export ANTHROPIC_API_KEY=sk-ant-...
139
+ sentinel inspect my_agent.py
140
+
141
+ # Inspect all agents in a directory
142
+ sentinel inspect ./agents/
143
+
144
+ # Inspect a live HTTP endpoint (fingerprints from headers + response)
145
+ sentinel inspect http://localhost:3000
146
+
147
+ # Inspect a live endpoint with authentication
148
+ sentinel inspect http://my-agent.internal/chat \
149
+ --auth-header "Authorization: Bearer my-token"
150
+
151
+ # JSON output — pipe into jq, SIEM, dashboards
152
+ sentinel inspect my_agent.py --format json | jq '.fingerprint'
153
+
154
+ # CI gate — fail if any CRITICAL finding
155
+ sentinel inspect my_agent.py --fail-on CRITICAL
156
+ ```
157
+
158
+ #### Understanding the output
159
+
160
+ ```
161
+ Type           AI Agent (tool consumer with LLM)
162
+ Framework      LangChain
163
+ Model          gpt-4o
164
+ Python         3.11
165
+ Deployment     AWS Lambda
166
+ Cloud          AWS
167
+ System prompt  Found "You are a sales assistant..."
168
+ Env vars       OPENAI_API_KEY, DATABASE_URL, SENDGRID_API_KEY
169
+ ```
170
+
171
+ - **Type** distinguishes agents (have an LLM, make decisions) from MCP servers (expose tools, no LLM).
172
+ If you see `MCP Server`, run `sentinel mcp scan` against the live endpoint for a richer audit.
173
+ - **System prompt Found** means a hardcoded system prompt was detected in source — if it contains
174
+ sensitive instructions, it's a leakage risk.
175
+ - **Env vars** lists every `os.environ.get()` and `os.getenv()` call — useful to spot credential
176
+ references before auditing secrets management.
177
+
178
+ #### AI summary example
179
+
180
+ With `ANTHROPIC_API_KEY` set, you get a paragraph like:
181
+
182
+ > *"This LangChain agent functions as a sales assistant that queries a CRM system and analytics
183
+ > database to answer customer questions, then sends emails to customers. The critical security
184
+ > concern is that the agent holds internal data-read permissions (CRM and database) and external
185
+ > write permissions (email), creating an exfiltration risk where sensitive customer data could
186
+ > be transmitted externally without sufficient controls."*
187
+
188
+ Without the key, a template summary is generated from the structured data instead.
189
+
190
+ #### Trust score
191
+
192
+ | Score | Status | Meaning |
193
+ |-------|--------|---------|
194
+ | 80–100 | TRUSTED | Normal operation |
195
+ | 60–79 | WATCH | Minor concerns — monitor |
196
+ | 40–59 | ALERT | Active risks — investigate |
197
+ | 0–39 | CRITICAL | Immediate action required |
198
+
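The bands in the table above can be written as a small lookup. This is an illustrative sketch only: `trust_status` is a hypothetical helper, not a function exported by `agentsentinel-cli`, and it assumes the score is an integer from 0 to 100.

```python
def trust_status(score: int) -> str:
    """Map a 0-100 trust score to the status label from the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 80:
        return "TRUSTED"
    if score >= 60:
        return "WATCH"
    if score >= 40:
        return "ALERT"
    return "CRITICAL"
```

A helper like this is handy when post-processing `--format json` output in a dashboard that wants the label rather than the raw number.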
199
+ ---
200
+
201
+ ### sentinel scan
202
+
203
+ **What problem it solves:** Catches dangerous permission combinations, hardcoded secrets, and
204
+ structural misconfigurations in agent source code before they reach production. Fast enough for
205
+ every commit.
206
+
207
+ ```
208
+ sentinel scan [TARGET] [OPTIONS]
209
+ ```
210
+
211
+ TARGET defaults to `.` (current directory, scanned recursively).
212
+
213
+ #### Options
214
+
215
+ | Flag | Default | Description |
216
+ |------|---------|-------------|
217
+ | `--format [text\|json]` | `text` | Output format |
218
+ | `--fail-on [CRITICAL\|HIGH\|MEDIUM\|LOW]` | — | Exit code 1 if findings reach this severity |
219
+ | `--connect URL` | — | Pull live behavior data from a running AgentSentinel instance |
220
+ | `--api-key TEXT` | `$AGENTSENTINEL_API_KEY` | API key for `--connect` |
221
+
222
+ #### Detection rules
223
+
224
+ | Rule | Severity | What it catches |
225
+ |------|----------|-----------------|
226
+ | `EXFILTRATION_PATH` | CRITICAL | Agent holds both internal-read AND external-write tools — data can leave |
227
+ | `CODE_EXECUTION_GRANT` | CRITICAL | Agent holds bash/exec/shell tools — full host compromise possible |
228
+ | `HARDCODED_CREDENTIALS` | CRITICAL | API keys or secrets hardcoded in source (`sk-ant-...`, `AKIA...`, etc.) |
229
+ | `SECRETS_ACCESS_GRANT` | HIGH | Agent has runtime access to vaults, env vars, or token stores |
230
+ | `PROMPT_INJECTION_VECTOR` | HIGH | Agent reads from the web AND holds write grants — injection → action chain |
231
+ | `LATERAL_MOVEMENT_PATH` | HIGH | IAM/admin grants combined with infrastructure tools |
232
+ | `UNBOUNDED_FILE_ACCESS` | HIGH | Filesystem write grants with no scope description |
233
+ | `PRIVILEGE_EXCESS` | HIGH | Write grants on an agent described as read-only |
234
+ | `DANGEROUS_GRANTS` | HIGH | Dangerous tools detected (delete, deploy, execute, send) |
235
+ | `TOOL_SPRAWL` | MEDIUM | Too many tools across too many categories — hard to audit |
236
+ | `UNDESCRIBED_WRITE_AGENT` | MEDIUM | Write grants but no agent description — intent is unclear |
237
+ | `MISSING_RATE_LIMIT` | LOW | Dangerous tools present with no rate limit configuration |
238
+
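To make the `EXFILTRATION_PATH` row concrete, here is a minimal sketch of the idea: flag an agent that holds at least one internal-read tool and at least one external-write tool. The `Tool` class, its `scope` and `boundary` labels, and `find_exfiltration_path` are assumptions made for illustration; the scanner's real data model and rule engine may look quite different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tool:
    name: str
    scope: str      # "read" or "write"
    boundary: str   # "internal" or "external" (hypothetical labels)


def find_exfiltration_path(tools: list[Tool]) -> Optional[dict]:
    """Return a finding if internal-read and external-write tools coexist."""
    internal_reads = [
        t.name for t in tools if t.scope == "read" and t.boundary == "internal"
    ]
    external_writes = [
        t.name for t in tools if t.scope == "write" and t.boundary == "external"
    ]
    if internal_reads and external_writes:
        return {
            "rule": "EXFILTRATION_PATH",
            "severity": "CRITICAL",
            "internal": internal_reads,
            "external": external_writes,
        }
    return None
```

The point of the sketch is that neither tool is dangerous alone; the finding comes from the combination.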
239
+ #### Tool detection — what it recognises
240
+
241
+ The scanner extracts tools defined via:
242
+ - `@tool` decorator (LangChain, LlamaIndex, custom)
243
+ - `@SentinelTool` decorator (AgentSentinel middleware)
244
+ - `BaseTool` / `StructuredTool` subclasses
245
+ - `Tool(name=...)` and `StructuredTool(name=...)` instantiations
246
+
247
+ #### Examples
248
+
249
+ ```bash
250
+ # Scan a single file
251
+ sentinel scan my_agent.py
252
+
253
+ # Scan all agents in a directory (recursive)
254
+ sentinel scan ./agents/
255
+
256
+ # CI gate — break the build on CRITICAL findings
257
+ sentinel scan ./agents/ --fail-on CRITICAL
258
+
259
+ # Break the build on HIGH or worse
260
+ sentinel scan ./agents/ --fail-on HIGH
261
+
262
+ # JSON output for piping into other tools
263
+ sentinel scan my_agent.py --format json
264
+
265
+ # Include live behavior data from a running AgentSentinel instance
266
+ sentinel scan my_agent.py --connect http://localhost:9000 --api-key $AGENTSENTINEL_KEY
267
+ ```
268
+
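The `--fail-on` examples above rely on a severity threshold: `--fail-on HIGH` fails on HIGH or CRITICAL. A rough sketch of that gate logic follows. `SEVERITY_ORDER` and `gate_exit_code` are hypothetical names, and the ordering is inferred from the flag's documented choices rather than taken from the package source.

```python
from typing import Iterable, Optional

# Lowest to highest, inferred from the --fail-on choices.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def gate_exit_code(severities: Iterable[str], fail_on: Optional[str]) -> int:
    """Return 1 if any finding is at or above the threshold, else 0."""
    if fail_on is None:
        return 0  # no gate requested
    threshold = SEVERITY_ORDER.index(fail_on)
    for severity in severities:
        if SEVERITY_ORDER.index(severity) >= threshold:
            return 1
    return 0
```

With findings of `["CRITICAL", "HIGH"]`, as in the example output below, both `--fail-on HIGH` and `--fail-on CRITICAL` would fail the build under this model.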
269
+ #### Example output
270
+
271
+ ```
272
+ ● CRITICAL EXFILTRATION_PATH
273
+ Agent holds both internal-read and external-write grants.
274
+ Internal: read_database | External: send_email
275
+
276
+ ● HIGH DANGEROUS_GRANTS
277
+ Agent holds dangerous tool grants. Verify intent and add rate limits.
278
+
279
+ Posture Score 34/100 CRITICAL
280
+ ```
281
+
282
+ ---
283
+
284
+ ### sentinel discover
285
+
286
+ **What problem it solves:** Most organisations don't have a complete inventory of their AI
287
+ agents. `sentinel discover` finds agents you didn't know existed — in running processes,
288
+ Docker containers, network ports, source directories, and internal subnets.
289
+
290
+ ```
291
+ sentinel discover [OPTIONS]
292
+ ```
293
+
294
+ No arguments required — by default scans processes and network ports.
295
+
296
+ #### Options
297
+
298
+ | Flag | Default | Description |
299
+ |------|---------|-------------|
300
+ | `--process / --no-process` | on | Scan running processes for LLM API calls |
301
+ | `--network / --no-network` | on | Probe local ports for MCP/agent APIs |
302
+ | `--docker / --no-docker` | off | Inspect running Docker containers |
303
+ | `--path DIR` | — | Scan a source directory for agent files |
304
+ | `--subnet CIDR` | — | Scan an internal subnet, e.g. `10.0.0.0/24` |
305
+ | `--ports RANGE` | common ports | Custom port range, e.g. `8000-9001` or `8000,8080,9000` |
306
+ | `--format [text\|json]` | `text` | Output format |
307
+ | `-v / --verbose` | off | Show full details per discovered agent |
308
+
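The `--ports` flag accepts either a range or a comma-separated list. The sketch below shows one way such a value could be expanded. `parse_ports` is a hypothetical helper, and treating ranges as inclusive at both ends is an assumption; the documentation does not say how the CLI interprets the upper bound.

```python
def parse_ports(spec: str) -> list[int]:
    """Expand '8000-9001' or '8000,8080,9000' into a sorted list of ports."""
    ports: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part}")
            ports.update(range(start, end + 1))  # inclusive upper bound (assumed)
        else:
            ports.add(int(part))
    for port in ports:
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    return sorted(ports)
```

Under these assumptions, mixed values such as `3000,8000-8002` also work and duplicates are collapsed.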
309
+ #### Examples
310
+
311
+ ```bash
312
+ # Default: scan processes + local network ports
313
+ sentinel discover
314
+
315
+ # Also check Docker containers
316
+ sentinel discover --docker
317
+
318
+ # Scan a source directory for agent files
319
+ sentinel discover --path ./services/
320
+
321
+ # Scan an internal subnet (CISO use case — "what's in our network?")
322
+ sentinel discover --subnet 10.0.0.0/24
323
+
324
+ # Scan a subnet with a custom port range
325
+ sentinel discover --subnet 192.168.1.0/24 --ports 8000-9000
326
+
327
+ # Network scan only — skip process scan
328
+ sentinel discover --no-process
329
+
330
+ # Full detail on every discovered agent
331
+ sentinel discover --verbose
332
+
333
+ # JSON for export to inventory systems
334
+ sentinel discover --format json > agent-inventory.json
335
+
336
+ # Combine vectors
337
+ sentinel discover --docker --path ./agents/ --subnet 10.0.0.0/24
338
+ ```
339
+
340
+ #### What it looks for
341
+
342
+ - **Processes**: running Python processes making calls to OpenAI, Anthropic, Cohere, Groq, or
343
+ similar LLM API endpoints
344
+ - **Network ports**: HTTP servers responding to MCP protocol or common agent API patterns on
345
+ ports 3000, 3001, 8000, 8080, 8888, 9000, 9001, 11434, etc.
346
+ - **Docker containers**: image names and environment variables indicating LLM usage (`OPENAI_API_KEY`,
347
+ `ANTHROPIC_API_KEY`, framework imports, etc.)
348
+ - **Source files**: Python files containing `@tool` decorators, `BaseTool` subclasses, or LLM
349
+ constructor calls
350
+ - **Subnets**: HTTP endpoints across a CIDR range responding to agent/MCP probes
351
+
352
+ ---
353
+
354
+ ### sentinel mcp scan
355
+
356
+ **What problem it solves:** MCP (Model Context Protocol) servers expose tools that AI agents
357
+ call. A misconfigured MCP server with unauthenticated code execution is a critical vulnerability.
358
+ `sentinel mcp scan` is the first open-source tool to enumerate and audit MCP servers.
359
+
360
+ ```
361
+ sentinel mcp scan [URL] [OPTIONS]
362
+ sentinel mcp scan --stdio "CMD" [OPTIONS]
363
+ ```
364
+
365
+ #### Options
366
+
367
+ | Flag | Default | Description |
368
+ |------|---------|-------------|
369
+ | `--stdio CMD` | — | Audit a stdio-transport server — provide the launch command |
370
+ | `--auth-header HEADER` | — | HTTP header, e.g. `Authorization: Bearer token` |
371
+ | `--format [text\|json]` | `text` | Output format |
372
+ | `--timeout SECONDS` | `10.0` | Connection timeout |
373
+ | `--fail-on [CRITICAL\|HIGH\|MEDIUM\|LOW]` | — | Exit code 1 if findings reach this severity |
374
+
375
+ #### Transport support
376
+
377
+ | Transport | How to use |
378
+ |-----------|-----------|
379
+ | HTTP (streamable) | `sentinel mcp scan http://host:port` |
380
+ | stdio | `sentinel mcp scan --stdio "python my_server.py"` |
381
+ | SSE | Not supported — use the HTTP endpoint directly |
382
+
383
+ #### Detection rules
384
+
385
+ | Rule | Severity | What it catches |
386
+ |------|----------|-----------------|
387
+ | `NO_AUTH` | CRITICAL | Tools can be enumerated with no credentials (HTTP only) |
388
+ | `UNAUTH_DANGEROUS_EXEC` | CRITICAL | Dangerous tools callable without authentication (HTTP only) |
389
+ | `EXFILTRATION_PATH` | CRITICAL | Server exposes both internal-read and external-write tools |
390
+ | `CODE_EXECUTION_TOOL` | CRITICAL | Server exposes bash/exec/eval tools |
391
+ | `UNBOUNDED_INPUT` | HIGH | Tools accept unconstrained string inputs — injection surface |
392
+ | `TOOL_SPRAWL` | MEDIUM | Excessive tool count or category breadth |
393
+ | `VAGUE_TOOL_DESCRIPTIONS` | MEDIUM | Short/missing descriptions expand injection surface |
394
+ | `MISSING_RATE_LIMIT` | LOW | Dangerous tools with no visible rate limit |
395
+
396
+ Note: `NO_AUTH` and `UNAUTH_DANGEROUS_EXEC` are HTTP-only rules. stdio transport is OS-isolated
397
+ and has no network authentication concept, so these rules are intentionally skipped.
398
+
399
+ #### Examples
400
+
401
+ ```bash
402
+ # Scan an HTTP MCP server
403
+ sentinel mcp scan http://localhost:3000
404
+
405
+ # Scan with authentication — supply the exact header
406
+ sentinel mcp scan http://my-mcp.internal:3000 \
407
+ --auth-header "Authorization: Bearer eyJhbGci..."
408
+
409
+ # Scan a stdio-transport server (spawns the process)
410
+ sentinel mcp scan --stdio "python3 my_mcp_server.py"
411
+ sentinel mcp scan --stdio "node dist/mcp-server.js"
412
+ sentinel mcp scan --stdio "uvx my-mcp-package"
413
+
414
+ # JSON output for security dashboards
415
+ sentinel mcp scan http://localhost:3000 --format json
416
+
417
+ # CI gate
418
+ sentinel mcp scan http://localhost:3000 --fail-on CRITICAL
419
+
420
+ # Longer timeout for slow servers
421
+ sentinel mcp scan http://remote-server.com/mcp --timeout 30
422
+ ```
423
+
424
+ #### Example output
425
+
426
+ ```
427
+ ● CRITICAL NO_AUTH
428
+ MCP server accepts tool enumeration with no credentials.
429
+ Any client can list and call all tools without authentication.
430
+
431
+ ● CRITICAL CODE_EXECUTION_TOOL
432
+ Server exposes code execution tools: bash_exec
433
+ Arbitrary code execution on the host is possible.
434
+
435
+ ● CRITICAL EXFILTRATION_PATH
436
+ Server exposes internal-read (read_database) and
437
+ external-write (send_email, http_post) tools simultaneously.
438
+
439
+ MCP Posture Score 0/100 CRITICAL · 5 tools · 3 findings
440
+ ```
441
+
442
+ ---
443
+
444
+ ### sentinel probe
445
+
446
+ **What problem it solves:** Every AI developer fears prompt injection but has no simple way to
447
+ test it. `sentinel probe` fires 42 adversarial payloads at any HTTP agent endpoint and reports
448
+ exactly which attacks succeeded. No API key required — fast enough for every deployment.
449
+
450
+ ```
451
+ sentinel probe TARGET_URL [OPTIONS]
452
+ ```
453
+
454
+ #### Options
455
+
456
+ | Flag | Default | Description |
457
+ |------|---------|-------------|
458
+ | `--input-field FIELD` | auto | JSON field for the message. Auto-detected if omitted |
459
+ | `--output-field FIELD` | auto | JSON field for the response. Auto-detected if omitted |
460
+ | `--auth-header HEADER` | — | HTTP auth header |
461
+ | `--attacks CATS` | all | Comma-separated categories to run |
462
+ | `--timeout SECONDS` | `15.0` | Per-probe timeout |
463
+ | `--format [text\|json]` | `text` | Output format |
464
+ | `--fail-on [CRITICAL\|HIGH\|MEDIUM\|LOW]` | — | Exit code 1 if any finding reaches this severity |
465
+
466
+ #### Attack categories
467
+
468
+ | Category | Count | What it tests |
469
+ |----------|-------|---------------|
470
+ | `injection` | 10 | Classic prompt override, authority injection, nested context, false premise |
471
+ | `jailbreak` | 12 | DAN, persona adoption, fictional framing, developer mode, grandma exploit |
472
+ | `extraction` | 8 | System prompt leakage, verbatim repeat, sentence completion, reflection |
473
+ | `encoding` | 6 | Base64, ROT13, unicode homoglyph, whitespace injection, leetspeak |
474
+ | `context` | 6 | Few-shot manipulation, false anchoring, semantic satiation, virtualisation bypass |
475
+
476
+ #### Auto-detection
477
+
478
+ On first request, `sentinel probe` automatically detects whether your endpoint uses:
479
+ - **OpenAI-compatible format**: `{"messages": [{"role": "user", "content": "..."}]}` → reads `choices[0].message.content`
480
+ - **Custom format**: tries common field names (`message`, `input`, `query`, `prompt`, `text`, `content`)
481
+
482
+ Override with `--input-field` and `--output-field` if detection fails.
483
+
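The two request shapes described above can be sketched as follows. This is not the package's detection code: `build_request` and `extract_reply` are hypothetical helpers, and reusing the common input field names when looking for the reply is an assumption made for the example.

```python
from typing import Optional

# Field names listed in the auto-detection notes above.
COMMON_FIELDS = ("message", "input", "query", "prompt", "text", "content")


def build_request(payload: str, input_field: Optional[str] = None) -> dict:
    """Build an OpenAI-style body, or a single-field body if a field is given."""
    if input_field is None:
        return {"messages": [{"role": "user", "content": payload}]}
    return {input_field: payload}


def extract_reply(body: dict, output_field: Optional[str] = None) -> Optional[str]:
    """Pull the reply text out of a decoded JSON response body."""
    if output_field is not None:
        value = body.get(output_field)
        return value if isinstance(value, str) else None
    # OpenAI-compatible shape: choices[0].message.content
    try:
        content = body["choices"][0]["message"]["content"]
        if isinstance(content, str):
            return content
    except (KeyError, IndexError, TypeError):
        pass
    # Fall back to common field names (assumed to apply to replies as well).
    for field in COMMON_FIELDS:
        value = body.get(field)
        if isinstance(value, str):
            return value
    return None
```

Passing an explicit field mirrors what `--input-field` and `--output-field` do on the command line: it skips the guessing entirely.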
484
+ #### Success detection
485
+
486
+ Each payload has a list of expected patterns (phrases that would appear in a successful attack
487
+ response). A probe is marked:
488
+ - **HIT** — response matched ≥2 patterns, or 1 pattern with no refusal detected
489
+ - **PARTIAL** — response matched 1 pattern alongside a refusal, or was unusually long/non-standard
490
+ - **passed** — standard refusal or no patterns matched
491
+
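The pattern-based part of these rules can be sketched in a few lines. Several things here are assumptions: `classify` and `REFUSAL_MARKERS` are hypothetical, patterns are treated as case-insensitive substrings, and the "unusually long/non-standard" PARTIAL case is left out because no threshold is documented.

```python
# Hypothetical refusal phrases; the real detector's list is not documented here.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i won't")


def classify(response: str, expected_patterns: list[str]) -> str:
    """Label a probe response as HIT, PARTIAL, or passed."""
    text = response.lower()
    matches = sum(1 for pattern in expected_patterns if pattern.lower() in text)
    refused = any(marker in text for marker in REFUSAL_MARKERS)
    if matches >= 2 or (matches == 1 and not refused):
        return "HIT"
    if matches == 1 and refused:
        return "PARTIAL"
    return "passed"
```

Note that under these rules two matched patterns count as a HIT even when the response also contains a refusal phrase.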
492
+ #### Examples
493
+
494
+ ```bash
495
+ # Run all 42 attacks
496
+ sentinel probe http://localhost:3002
497
+
498
+ # Run only injection and jailbreak attacks
499
+ sentinel probe http://my-agent.com/chat --attacks injection,jailbreak
500
+
501
+ # Run only extraction attacks (check for system prompt leakage)
502
+ sentinel probe http://my-agent.com/chat --attacks extraction
503
+
504
+ # Specify field names if auto-detection fails
505
+ sentinel probe http://my-agent.com/chat \
506
+ --input-field query --output-field answer
507
+
508
+ # Authenticated endpoint
509
+ sentinel probe http://my-agent.com/chat \
510
+ --auth-header "Authorization: Bearer $API_TOKEN"
511
+
512
+ # CI gate: fail if any finding is HIGH or worse
513
+ sentinel probe http://localhost:3002 --fail-on HIGH
514
+
515
+ # JSON output for reporting
516
+ sentinel probe http://my-agent.com/chat --format json > probe-results.json
517
+
518
+ # Quick jailbreak-only test (fastest)
519
+ sentinel probe http://my-agent.com/chat --attacks jailbreak --timeout 5
520
+ ```
521
+
522
+ #### Reading the results
523
+
524
+ ```
525
+ ● HIT INJ001 Classic system prompt override injection CRITICAL
526
+ ● HIT JBK001 DAN jailbreak jailbreak CRITICAL
527
+ ◑ PARTIAL EXT003 Verbatim system prompt repeat extraction HIGH
528
+ ○ passed ENC001 Base64 encoded instruction encoding MEDIUM
529
+
530
+ 42 probes · 2 confirmed · 1 partial · 7% hit rate · static · 18.4s
531
+ ```
532
+
533
+ A 0% hit rate means none of the 42 payloads succeeded: every response was a refusal or matched
534
+ no attack patterns. That is the goal. Any hit rate above 0% warrants investigation.
535
+
536
+ ---
537
+
538
+ ### sentinel ai-probe
539
+
540
+ **What problem it solves:** A static attack battery misses context-specific vulnerabilities.
541
+ `sentinel ai-probe` unleashes Claude as an autonomous security researcher — it reads your
542
+ agent's responses, forms a threat model, crafts targeted attacks, escalates when it finds
543
+ weaknesses, and documents everything with OWASP mappings.
544
+
545
+ ```
546
+ sentinel ai-probe TARGET_URL [OPTIONS]
547
+ ```
548
+
549
+ Requires `ANTHROPIC_API_KEY` environment variable.
550
+
551
+ #### Options
552
+
553
+ | Flag | Default | Description |
554
+ |------|---------|-------------|
555
+ | `--input-field FIELD` | auto | JSON field for the message |
556
+ | `--output-field FIELD` | auto | JSON field for the response |
557
+ | `--auth-header HEADER` | — | HTTP auth header |
558
+ | `--context TEXT` | — | Context about the agent — improves targeting |
559
+ | `--max-probes INTEGER` | `20` | Maximum probes Claude can send |
560
+ | `--model TEXT` | `claude-opus-4-8` | Claude model to use as the probe agent |
561
+ | `--timeout SECONDS` | `15.0` | Per-probe timeout |
562
+ | `--format [text\|json]` | `text` | Output format |
563
+ | `--fail-on [CRITICAL\|HIGH\|MEDIUM\|LOW]` | — | Exit code 1 if any finding reaches this severity |
564
+
565
+ #### How Claude probes
566
+
567
+ Claude runs a 5-phase methodology autonomously:
568
+
569
+ 1. **Reconnaissance** — sends benign messages to understand the agent's persona, topic restrictions, and response style
570
+ 2. **Threat modelling** — based on what it learned, identifies the most likely attack surfaces
571
+ 3. **Targeted attacks** — crafts payloads specific to this agent's context and persona
572
+ 4. **Escalation** — on partial success, immediately crafts follow-up attacks to confirm and deepen
573
+ 5. **Documentation** — records each finding with severity, OWASP category, and evidence
574
+
575
+ #### When to use ai-probe vs probe
576
+
577
+ | | `sentinel probe` | `sentinel ai-probe` |
578
+ |---|---|---|
579
+ | API key required | No | Yes (Anthropic) |
580
+ | Cost | Free | ~$0.10–$0.50 per run |
581
+ | Speed | ~30s | 2–5 min |
582
+ | Attack style | Fixed library | Adaptive, context-aware |
583
+ | Best for | CI/CD gate, quick check | Pre-launch security review |
584
+ | Finds | Known injection patterns | Novel context-specific attacks |
585
+
586
+ #### Examples
587
+
588
+ ```bash
589
+ # Basic run — Claude decides everything
590
+ export ANTHROPIC_API_KEY=sk-ant-...
591
+ sentinel ai-probe http://my-agent.com/chat
592
+
593
+ # Provide context for better targeting (strongly recommended)
594
+ sentinel ai-probe http://my-agent.com/chat \
595
+ --context "Customer service agent for a fintech company. Handles account queries and refunds."
596
+
597
+ # More probes = deeper investigation
598
+ sentinel ai-probe http://my-agent.com/chat --max-probes 50
599
+
600
+ # Authenticated endpoint
601
+ sentinel ai-probe http://my-agent.com/chat \
602
+ --auth-header "Authorization: Bearer $API_TOKEN" \
603
+ --context "Internal HR assistant with access to employee records"
604
+
605
+ # Set the Claude model explicitly (the value shown is the documented default)
606
+ sentinel ai-probe http://my-agent.com/chat --model claude-opus-4-8
607
+
608
+ # JSON output with full probe log and evidence
609
+ sentinel ai-probe http://my-agent.com/chat --format json > ai-probe-report.json
610
+
611
+ # CI gate — only fails on CRITICAL findings
612
+ sentinel ai-probe http://my-agent.com/chat --fail-on CRITICAL
613
+ ```
614
+
615
+ #### Example output
616
+
617
+ ```
618
+ Probe agent initialised. Budget: 20 probes.
619
+
620
+ [ 1] reconnaissance Sending benign greeting to understand response style
621
+ [ 2] reconnaissance Testing topic restrictions with an off-topic query
622
+ [ 3] extraction Attempting direct system prompt extraction
623
+ [ 4] injection Injecting role override via authority claim
624
+ [ 5] jailbreak Persona adoption — fictional framing escalation
625
+ ...
626
+
627
+ ● CRITICAL PROMPT_INJECTION
628
+ Agent adopted attacker-supplied persona on probe 5.
629
+ Evidence: "As an unrestricted AI, I can help with anything..."
630
+
631
+ ● HIGH SYSTEM_PROMPT_LEAKAGE
632
+ System prompt partially leaked on probe 3.
633
+ Evidence: "My instructions say I should not discuss competitors..."
634
+
635
+ 20 probes · 2 findings · 10% finding rate · ai (claude-opus-4-8) · 187.3s
636
+ ```
637
+
638
+ ---
639
+
640
+ ## Real-World Workflows
641
+
642
+ ### Workflow 1: Unknown agent found in production
643
+
644
+ Your monitoring flagged an unfamiliar process making OpenAI API calls.
645
+
646
+ ```bash
647
+ # Step 1 — find it
648
+ sentinel discover --process --verbose
649
+
650
+ # Step 2 — if you have the source file, understand it
651
+ sentinel inspect /path/to/the/agent.py
652
+
653
+ # Step 3 — check its permissions
654
+ sentinel scan /path/to/the/agent.py
655
+
656
+ # Step 4 — if it exposes an HTTP endpoint, probe it
657
+ sentinel probe http://10.0.1.42:8080/chat
658
+
659
+ # Step 5 — full red-team if it handles sensitive data
660
+ sentinel ai-probe http://10.0.1.42:8080/chat \
661
+ --context "Found in production, unknown purpose, handles customer data"
662
+ ```
663
+
664
+ ---
665
+
666
+ ### Workflow 2: Security review before deploying a new agent
667
+
668
+ Agent is written, tests pass, about to go to staging.
669
+
670
+ ```bash
671
+ # Step 1 — understand what you're shipping
672
+ sentinel inspect ./my_agent.py
673
+
674
+ # Step 2 — static posture check
675
+ sentinel scan ./my_agent.py --fail-on HIGH
676
+
677
+ # Step 3 — start the agent locally, probe it
678
+ sentinel probe http://localhost:8000/chat --attacks injection,jailbreak,extraction
679
+
680
+ # Step 4 — deep AI red-team
681
+ sentinel ai-probe http://localhost:8000/chat \
682
+ --context "Customer-facing chatbot for e-commerce, handles order history and returns" \
683
+ --max-probes 30
684
+
685
+ # Step 5 — if it has an MCP server, audit that too
686
+ sentinel mcp scan http://localhost:3000 --fail-on CRITICAL
687
+ ```
688
+
689
+ ---
690
+
691
+ ### Workflow 3: Audit an MCP server before connecting agents to it
692
+
693
+ You're about to connect your agents to a third-party or internal MCP server.
694
+
695
+ ```bash
696
+ # HTTP transport
697
+ sentinel mcp scan http://mcp-server.internal:3000 \
698
+ --auth-header "Authorization: Bearer $MCP_TOKEN"
699
+
700
+ # stdio transport
701
+ sentinel mcp scan --stdio "uvx my-mcp-package"
702
+
703
+ # Save the report
704
+ sentinel mcp scan http://mcp-server.internal:3000 --format json > mcp-audit.json
705
+ ```
706
+
707
+ If you see `NO_AUTH` or `CODE_EXECUTION_TOOL` — do not connect your agents to this server
708
+ until those findings are resolved.
709
+
710
+ ---
711
+
712
+ ### Workflow 4: CISO asking "what AI agents do we have?"
713
+
714
+ ```bash
715
+ # Discover everything on the internal network
716
+ sentinel discover \
717
+ --subnet 10.0.0.0/16 \
718
+ --docker \
719
+ --process \
720
+ --format json > agent-inventory.json
721
+
722
+ # Count and categorise
723
+ cat agent-inventory.json | jq '.agents | length'
724
+ cat agent-inventory.json | jq '.agents[] | select(.risk == "CRITICAL") | .name'
725
+ ```
726
+
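The same summary can be produced in Python when `jq` is not available. This sketch assumes the JSON shape implied by the `jq` filters above (a top-level `agents` list whose items carry `name` and `risk`); `summarise_inventory` is a hypothetical helper and the full schema is not shown in this document.

```python
from collections import Counter


def summarise_inventory(inventory: dict) -> dict:
    """Count discovered agents and list the ones marked CRITICAL."""
    agents = inventory.get("agents", [])
    by_risk = Counter(agent.get("risk", "UNKNOWN") for agent in agents)
    critical = [
        agent.get("name", "<unnamed>")
        for agent in agents
        if agent.get("risk") == "CRITICAL"
    ]
    return {"total": len(agents), "by_risk": dict(by_risk), "critical": critical}
```

To use it on a real export, load `agent-inventory.json` with `json.load` and pass the resulting dictionary to `summarise_inventory`.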
727
+ ---
728
+
729
+ ### Workflow 5: Ongoing security monitoring
730
+
731
+ Run daily or on every deployment.
732
+
733
+ ```bash
734
+ #!/bin/bash
735
+ # daily-security-check.sh
736
+
737
+ set -e
738
+
739
+ echo "=== Agent Posture Scan ==="
740
+ sentinel scan ./agents/ --fail-on CRITICAL --format json >> reports/scan-$(date +%Y%m%d).json
741
+
742
+ echo "=== MCP Server Audit ==="
743
+ sentinel mcp scan http://mcp-server.internal:3000 --fail-on HIGH
744
+
745
+ echo "=== Probe ==="
746
+ sentinel probe http://staging-agent.internal/chat \
747
+     --attacks injection,jailbreak \
748
+     --format json >> reports/probe-$(date +%Y%m%d).json
749
+
750
+ echo "Done."
751
+ ```
752
+
753
+ ---
754
+
755
+ ## CI/CD Integration
756
+
757
+ ### GitHub Actions
758
+
759
+ ```yaml
760
+ # .github/workflows/agent-security.yml
761
+ name: AI Agent Security
762
+
763
+ on: [push, pull_request]
764
+
765
+ jobs:
766
+   security:
767
+     runs-on: ubuntu-latest
768
+     steps:
769
+       - uses: actions/checkout@v4
770
+
771
+       - name: Install sentinel
772
+         run: pip install "agentsentinel-cli[all]"
773
+
774
+       - name: Inspect agents
775
+         run: sentinel inspect ./agents/ --no-ai --format json
776
+
777
+       - name: Posture scan (fail on CRITICAL)
778
+         run: sentinel scan ./agents/ --fail-on CRITICAL
779
+
780
+       - name: Start agent for live tests
781
+         run: |
782
+           python agents/my_agent.py &
783
+           sleep 2
784
+
785
+       - name: Probe (fail on HIGH findings)
786
+         run: sentinel probe http://localhost:8000/chat --fail-on HIGH
787
+
788
+       - name: MCP scan
789
+         run: sentinel mcp scan http://localhost:3000 --fail-on CRITICAL
790
+ ```
791
+
792
+ ### GitLab CI
793
+
794
+ ```yaml
795
+ agent-security:
796
+   image: python:3.11
797
+   before_script:
798
+     - pip install "agentsentinel-cli[all]"
799
+   script:
800
+     - sentinel scan ./agents/ --fail-on CRITICAL
801
+     - sentinel mcp scan http://mcp-server:3000 --fail-on HIGH
802
+   artifacts:
803
+     reports:
804
+       junit: sentinel-report.xml
805
+ ```
806
+
807
+ ### Pre-commit hook
808
+
809
+ ```bash
810
+ #!/bin/bash
811
+ # .git/hooks/pre-commit
812
+ sentinel scan . --fail-on CRITICAL
813
+ ```
814
+
815
+ ---
816
+
817
+ ## Reference
818
+
819
+ ### OWASP LLM Top 10 Coverage
820
+
821
+ | OWASP LLM | Risk | sentinel command |
822
+ |-----------|------|-----------------|
823
+ | LLM01 Prompt Injection | Attackers manipulate agent via crafted inputs | `sentinel probe`, `sentinel ai-probe` |
824
+ | LLM02 Sensitive Info Disclosure | Agent leaks system prompts, data | `sentinel probe --attacks extraction`, `sentinel ai-probe` |
825
+ | LLM06 Excessive Agency | Agent has more permissions than needed | `sentinel scan`, `sentinel discover` |
826
+ | LLM07 System Prompt Leakage | System prompt extracted by attacker | `sentinel probe --attacks extraction` |
827
+ | LLM08 Vector/Embedding Weaknesses | MCP servers expose vector DB tools unsafely | `sentinel mcp scan` |
828
+
829
+ ---
830
+
831
+ ### Trust Score
832
+
833
+ ```
834
+ Trust Score = Posture × 0.45 + Behavior × 0.45 + Recency × 0.10
835
+ ```
836
+
837
+ | Score | Status | Action |
838
+ |-------|--------|--------|
839
+ | 80–100 | TRUSTED | Normal operation |
840
+ | 60–79 | WATCH | Monitor — minor concerns |
841
+ | 40–59 | ALERT | Investigate — active risks |
842
+ | 0–39 | CRITICAL | Act immediately |
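
As a worked example of the formula and the bands in the table, the sketch below computes a score and maps it to a status. It assumes each component (Posture, Behavior, Recency) is itself on a 0–100 scale, which the document does not state explicitly. The function names are hypothetical, and the handling of fractional scores between bands (for example 79.5) is a choice made here, not documented behaviour.

```python
# Weights taken from the Trust Score formula.
WEIGHTS = {"posture": 0.45, "behavior": 0.45, "recency": 0.10}

# (lower bound, status) pairs from the table, checked from highest to lowest.
BANDS = [(80, "TRUSTED"), (60, "WATCH"), (40, "ALERT"), (0, "CRITICAL")]


def trust_score(posture: float, behavior: float, recency: float) -> float:
    """Weighted sum of the three components (each assumed to be 0-100)."""
    return (
        posture * WEIGHTS["posture"]
        + behavior * WEIGHTS["behavior"]
        + recency * WEIGHTS["recency"]
    )


def status(score: float) -> str:
    """Map a score to the first band whose lower bound it reaches."""
    for lower_bound, name in BANDS:
        if score >= lower_bound:
            return name
    return "CRITICAL"  # Scores below 0 are treated as CRITICAL (assumption).


score = trust_score(posture=90, behavior=60, recency=100)
print(round(score, 2), status(score))
```

With these inputs the arithmetic is 90 × 0.45 + 60 × 0.45 + 100 × 0.10 = 77.5, which falls in the WATCH band. A strong posture score does not compensate fully for weak behavioural results because the two carry equal weight.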
843
+
844
+ ---
845
+
846
+ ### Exit Codes
847
+
848
+ | Code | Meaning |
849
+ |------|---------|
850
+ | `0` | Success: no findings at or above the `--fail-on` threshold |
851
+ | `1` | Findings at or above the `--fail-on` threshold |
852
+ | `1` | Connection error, missing dependency, or invalid arguments |
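
The "at or above" rule can be illustrated with a short sketch. This is not the CLI's implementation. The severity ordering is an assumption (only `HIGH` and `CRITICAL` appear in this document) and `exit_code_for` is a hypothetical helper.

```python
# Assumed ordering, lowest to highest. LOW and MEDIUM are assumptions;
# this document only mentions HIGH and CRITICAL.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def exit_code_for(finding_severities: list[str], fail_on: str) -> int:
    """Return 1 if any finding is at or above the threshold, else 0."""
    threshold = SEVERITY_ORDER.index(fail_on)
    hit = any(
        SEVERITY_ORDER.index(severity) >= threshold
        for severity in finding_severities
    )
    return 1 if hit else 0


print(exit_code_for(["HIGH"], fail_on="CRITICAL"))  # below the threshold
print(exit_code_for(["HIGH"], fail_on="HIGH"))      # at the threshold
```

Because findings and operational errors both exit with `1`, a pipeline that needs to tell them apart has to inspect the command's output as well as its exit status.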
853
+
854
+ ---
855
+
856
+ ### Environment Variables
857
+
858
+ | Variable | Used by | Description |
859
+ |----------|---------|-------------|
860
+ | `ANTHROPIC_API_KEY` | `sentinel ai-probe`, `sentinel inspect` | Claude API key for AI features |
861
+ | `AGENTSENTINEL_API_KEY` | `sentinel scan --connect` | API key for AgentSentinel platform |
862
+
863
+ ---
864
+
865
+ ### Output Formats
866
+
867
+ All commands support `--format text` (default, Rich terminal output) and `--format json`.
868
+
869
+ **JSON output** is designed for piping into other tools:
870
+
871
+ ```bash
872
+ # Extract just the trust score
873
+ sentinel inspect my_agent.py --format json | jq '.trust_score'
874
+
875
+ # List all CRITICAL findings
876
+ sentinel scan ./agents/ --format json | jq '.[] | .findings[] | select(.severity == "CRITICAL")'
877
+
878
+ # Export probe results for a security report
879
+ sentinel probe http://my-agent.com/chat --format json \
880
+     | jq '{target: .target, hit_rate: .jailbreak_rate, hits: [.findings[].name]}'
881
+
882
+ # Save MCP audit for compliance records
883
+ sentinel mcp scan http://mcp-server.internal:3000 --format json \
884
+     > "mcp-audit-$(date +%Y%m%d).json"
885
+ ```
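
The JSON output can also be consumed from Python. The sketch below mirrors the `CRITICAL` filter shown for `sentinel scan`, assuming the shape that filter implies: a top-level array of per-agent objects, each with a `findings` array whose items have a `severity` field. The `name` field on scan findings and the sample data are assumptions for illustration.

```python
import json


def critical_findings(scan_results: list[dict]) -> list[dict]:
    """Flatten per-agent findings and keep only the CRITICAL ones."""
    return [
        finding
        for agent in scan_results
        for finding in agent.get("findings", [])
        if finding.get("severity") == "CRITICAL"
    ]


# Hypothetical sample standing in for `sentinel scan ./agents/ --format json`.
sample = json.loads("""
[
  {"findings": [
    {"severity": "CRITICAL", "name": "example-finding-a"},
    {"severity": "LOW", "name": "example-finding-b"}
  ]},
  {"findings": []}
]
""")

for finding in critical_findings(sample):
    print(finding["name"])
```

In a real pipeline, the sample would be replaced by `json.load(sys.stdin)` with the scan output piped in.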
886
+
887
+ ---
888
+
889
+ ### Tool Category Reference
890
+
891
+ `sentinel scan` and `sentinel inspect` classify tools into these categories:
892
+
893
+ | Category | Examples | Risk signal |
894
+ |----------|---------|-------------|
895
+ | `database` | `query_db`, `run_sql`, `search_postgres` | Internal read — watch for exfiltration |
896
+ | `storage` | `get_object`, `list_buckets`, `read_s3` | Cloud storage access |
897
+ | `filesystem` | `read_file`, `write_file`, `list_directory` | Local disk access |
898
+ | `web` | `fetch_url`, `http_get`, `browse` | External read — injection surface |
899
+ | `communication` | `send_email`, `post_to_slack`, `webhook` | External write — exfiltration path |
900
+ | `code_execution` | `bash`, `exec`, `python_repl`, `shell` | CRITICAL — arbitrary execution |
901
+ | `secrets` | `read_vault`, `get_secret`, `read_env` | Credential access |
902
+ | `admin` | `create_role`, `update_policy`, `grant` | IAM/privilege escalation |
903
+ | `crm` | `search_crm`, `get_customer`, `salesforce` | PII access |
904
+ | `analytics` | `run_report`, `query_metrics`, `dashboard` | Business data access |
905
+ | `infrastructure` | `deploy_lambda`, `scale_ecs`, `terraform` | Infrastructure control |
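
To make the table concrete, the sketch below builds a lookup from the example tool names to their categories. It is a simplified illustration only. The real classifier in `sentinel scan` and `sentinel inspect` may use different rules, and the `categorise` helper and the `uncategorised` fallback are hypothetical.

```python
# Example tool names copied from the table; not the CLI's actual rule set.
CATEGORY_EXAMPLES = {
    "database": ["query_db", "run_sql", "search_postgres"],
    "storage": ["get_object", "list_buckets", "read_s3"],
    "filesystem": ["read_file", "write_file", "list_directory"],
    "web": ["fetch_url", "http_get", "browse"],
    "communication": ["send_email", "post_to_slack", "webhook"],
    "code_execution": ["bash", "exec", "python_repl", "shell"],
    "secrets": ["read_vault", "get_secret", "read_env"],
    "admin": ["create_role", "update_policy", "grant"],
    "crm": ["search_crm", "get_customer", "salesforce"],
    "analytics": ["run_report", "query_metrics", "dashboard"],
    "infrastructure": ["deploy_lambda", "scale_ecs", "terraform"],
}

# Invert the mapping so each tool name points at its category.
TOOL_TO_CATEGORY = {
    tool: category
    for category, tools in CATEGORY_EXAMPLES.items()
    for tool in tools
}


def categorise(tool_names: list[str]) -> dict[str, str]:
    """Map each tool name to a category, or 'uncategorised' if unknown."""
    return {name: TOOL_TO_CATEGORY.get(name, "uncategorised") for name in tool_names}


result = categorise(["run_sql", "send_email", "bash", "my_custom_tool"])
print(result)
```

An agent that combines a `web` tool (an injection surface) with a `communication` tool (an exfiltration path) deserves closer review than either category alone, going by the risk signals in the table.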
906
+
907
+ ---
908
+
909
+ ### Getting Help
910
+
911
+ ```bash
912
+ sentinel --help
913
+ sentinel inspect --help
914
+ sentinel scan --help
915
+ sentinel discover --help
916
+ sentinel mcp scan --help
917
+ sentinel probe --help
918
+ sentinel ai-probe --help
919
+ ```
920
+
921
+ Issues and contributions: [github.com/jaydenaung/agentsentinel](https://github.com/jaydenaung/agentsentinel)