agentguard-hitl 0.2.0__tar.gz

@@ -0,0 +1,23 @@
1
+ # Python bytecode & caches
2
+ __pycache__/
3
+ *.py[cod]
4
+
5
+ # Build / packaging output (from `python -m build`)
6
+ build/
7
+ dist/
8
+ *.egg-info/
9
+
10
+ # Runtime artifact (AgentGuard's audit log)
11
+ *.log
12
+
13
+ # Local tooling & secrets
14
+ .claude/
15
+ .env
16
+ .venv/
17
+ venv/
18
+
19
+ # Editors / OS
20
+ .idea/
21
+ .vscode/
22
+ .DS_Store
23
+ Thumbs.db
@@ -0,0 +1,364 @@
1
+ Metadata-Version: 2.4
2
+ Name: agentguard-hitl
3
+ Version: 0.2.0
4
+ Summary: A permission gate for AI agent tool calls. Stops dangerous actions before they execute. Safe under many concurrent agents.
5
+ Project-URL: Homepage, https://github.com/agentguard/agentguard
6
+ License-Expression: MIT
7
+ Keywords: agents,ai,guardrails,llm,safety,tool-use
8
+ Requires-Python: >=3.8
9
+ Description-Content-Type: text/markdown
10
+
11
+ # AgentGuard
12
+
13
+ AgentGuard is a stop-and-ask checkpoint for software that can take actions on its own. Some programs don't just return text — they can run real operations like deleting a file, sending an email, or charging a card by calling functions in your code. AgentGuard sits in front of those function calls: when a call looks dangerous, it pauses the program and asks a person to approve or deny it before anything runs. If nobody answers in time, the action is blocked. The check is plain code, so the program cannot skip it by "deciding" to.
14
+
15
+ No dependencies — pure Python standard library.
16
+
17
+ ## Install
18
+
19
+ ```bash
20
+ pip install agentguard-hitl
21
+ # or, from a clone of this repo:
22
+ pip install -e .
23
+ ```
24
+
25
+ ## Smallest working example
26
+
27
+ The main way to use AgentGuard is to wrap your tool calls in `guard.execute()`. It looks at the tool's name and, if the name looks dangerous, asks for approval before running the call.
28
+
29
+ ```python
30
+ from agentguard import AgentGuard
31
+
32
+ guard = AgentGuard() # approval happens in the terminal
33
+
34
+ def run_tool(name, params):
35
+     return f"ran {name} with {params}"
36
+
37
+ # "delete_file" contains "delete", so this asks before running:
38
+ result = guard.execute("delete_file", {"path": "notes.txt"}, executor=run_tool)
39
+ print(result)
40
+ ```
41
+
42
+ Run it and you get a prompt in your terminal:
43
+
44
+ ```
45
+ [!] AGENTGUARD: Action requires approval
46
+ ===================================================
47
+ Tool : delete_file
48
+ Parameters: {"path": "notes.txt"}
49
+ Risk : IRREVERSIBLE - cannot be undone
50
+ Request : 3f9c...
51
+ ===================================================
52
+ Allow? (y/n):
53
+ ```
54
+
55
+ Type `y` and `run_tool` runs. Type `n` — or run it where there's no keyboard — and the call is blocked and `result` is an `ActionDenied` value instead.
56
+
57
+ ## How it works
58
+
59
+ **What gets gated.** When you call `guard.execute(name, params, executor=run_tool)`, AgentGuard looks at the tool's *name*. If the name contains a word like `delete`, `wipe`, `drop`, `send`, `run`, or `write`, it asks for approval first. If the name looks safe, the call runs right away with no prompt. The check reads only the name — not what the function does or what arguments it gets.
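To make the name-only behaviour concrete, here is a minimal sketch of substring matching. It is not AgentGuard's actual classifier: the keyword tuple contains only the six words listed above, and `looks_dangerous` is a hypothetical helper (the real `classify` returns a risk label or `None`, not a boolean).

```python
# Hypothetical stand-in for a name-only check; not the library's implementation.
KEYWORDS = ("delete", "wipe", "drop", "send", "run", "write")

def looks_dangerous(tool_name):
    """Return True if any keyword appears anywhere in the lowercased tool name."""
    name = tool_name.lower()
    return any(word in name for word in KEYWORDS)

for tool in ("delete_file", "read_customer", "prune_old", "charge_card"):
    print(tool, looks_dangerous(tool))
```

In this sketch `prune_old` is flagged only because `prune` contains `run`, while `charge_card` passes because nothing in its name matches. Names of that second kind are what the `dangerous_tools=[...]` constructor argument is for.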
60
+
61
+ **The two ways to gate, side by side.** `guard.execute()` is the main path: it runs the automatic name check. `guard.gate` is a manual override — a decorator bound to a guard that makes a function *always* require approval, regardless of its name. Both go through whichever approval mode the guard is configured with (terminal / callback / external). Use `execute()` for normal integration; use `guard.gate` to force approval on one specific function you already know is dangerous.
62
+
63
+ **Where the human is asked.** You pick one when you create the guard:
64
+
65
+ - **Terminal** (default): a `y/n` question in the console.
66
+ - **Callback**: you give a function that returns `True` or `False` (for example, it posts to Slack and waits for a click).
67
+ - **External**: the request is parked with an ID, and your own app approves it later by calling `resolve()`; `pending()` lists everything waiting. This is how you connect approvals to a web page or dashboard.
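The external mode above can be pictured as a request that waits on an event until someone answers or a timeout passes. The sketch below illustrates that idea with the standard library only. `ParkedRequest` is a hypothetical stand-in, not the library's class, and the returned strings simply borrow the documented `ResolveOutcome` names.

```python
import threading
import uuid

class ParkedRequest:
    """Hypothetical stand-in for one parked approval; not the library's class."""

    def __init__(self):
        self.request_id = uuid.uuid4().hex
        self.approved = None              # None until someone answers
        self._answered = threading.Event()
        self._lock = threading.Lock()

    def resolve(self, approved):
        """Record the first answer; later answers are reported and ignored."""
        with self._lock:
            if self.approved is not None:
                return "ALREADY_RESOLVED"
            self.approved = bool(approved)
        self._answered.set()
        return "RESOLVED"

    def wait(self, timeout):
        """Block until answered or timed out; only an explicit approval returns True."""
        answered = self._answered.wait(timeout)
        return answered and self.approved is True


request = ParkedRequest()
# Simulate a human clicking Approve from another thread shortly after parking.
threading.Timer(0.05, request.resolve, args=(True,)).start()
print(request.wait(timeout=2.0))

unanswered = ParkedRequest()
print(unanswered.wait(timeout=0.05))   # nobody answers, so the call stays blocked
```

The design point is that silence and denial are treated the same way: the wait returns `True` only when an approval was actually recorded.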
68
+
69
+ **The result.** If the call is approved, it runs and you get its normal return value. If it is denied — or if no one answers before the optional timeout — it does not run, and you get an `ActionDenied` value back (a value, not an error you have to catch).
70
+
71
+ ## The `guard.gate` decorator
72
+
73
+ Use `guard.gate` when you know a specific function is always dangerous regardless of its name. It is a decorator **bound to a guard**, so it forces approval through that guard's configured mode — terminal, callback, or external/UI.
74
+
75
+ ```python
76
+ guard = AgentGuard() # or mode="external", confirm=..., etc.
77
+
78
+ @guard.gate
79
+ def delete_file(path):
80
+     print(f"deleting {path}")
81
+
82
+ delete_file("notes.txt") # always asks for approval, through the guard's mode
83
+ ```
84
+
85
+ `guard.gate` registers the function's name so that it always gates, then routes each call through `guard.execute()`. On denial it returns an `ActionDenied` value, the same as `execute()`. Because it uses the guard's mode, on a `mode="external"` guard the request shows up as a UI approval card, not a terminal prompt.
86
+
87
+ ## Function reference
88
+
89
+ Everything exported from the top-level `agentguard` package:
90
+
91
+ | Name | What it does | Returns |
92
+ |---|---|---|
93
+ | `AgentGuard` | The checkpoint object you put in front of tool calls. The main entry point. Its constructor takes `dangerous_tools=[...]` — a list of tool names that always require approval regardless of their name, through whichever mode you configured. | an `AgentGuard` instance |
94
+ | `classify` | Checks whether a tool name looks dangerous. | a risk-label string, or `None` if safe |
95
+ | `ApprovalRequest` | The details of one pending approval (tool, params, risk, ids, timestamp, deadline). Read-only. | an object; `.to_dict()` gives a plain dict |
96
+ | `ActionDenied` | The value returned when an action is blocked. It is "falsy" and is a value, not an exception. | an object; `.reason`, `.tool`, `.request_id` |
97
+ | `ResolveOutcome` | The result of answering a parked request: `RESOLVED`, `ALREADY_RESOLVED`, `UNKNOWN`, or `EXPIRED`. | an enum member |
98
+ | `ApprovalStore` | The interface for plugging in your own storage for pending approvals. | (a type to implement) |
99
+ | `InMemoryStore` | The default storage for pending approvals (kept in memory). | an `InMemoryStore` instance |
100
+
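Because `ActionDenied` is described as falsy, it is tempting to test `if not result:`. The sketch below shows why `isinstance` is the safer check: a legitimate tool result can be falsy too. `Denied` is a hypothetical stand-in that mirrors only the documented fields. With the real library you would check against `ActionDenied` itself.

```python
class Denied:
    """Hypothetical stand-in mirroring the documented fields of ActionDenied."""

    def __init__(self, reason, tool, request_id):
        self.reason = reason
        self.tool = tool
        self.request_id = request_id

    def __bool__(self):
        return False  # falsy, like the value described in the table


def describe(result):
    # isinstance is the unambiguous check: a real result may also be falsy.
    if isinstance(result, Denied):
        return f"blocked: {result.reason}"
    return f"ok: {result!r}"


print(describe(Denied("denied by approver", "delete_file", "req-1")))
print(describe(0))    # a legitimate falsy tool result, not a denial
print(describe([]))
```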
101
+ Methods on an `AgentGuard` instance:
102
+
103
+ | Method | What it does | Returns |
104
+ |---|---|---|
105
+ | `execute(tool_name, params, executor, *, reason=None, agent_id=None, session_id=None)` | Gates one tool call. Runs `executor(tool_name, params)` only if allowed. The main integration point. | the executor's result, or `ActionDenied` |
106
+ | `aexecute(...)` | Same as `execute`, for `async`/`await` code. | the executor's result, or `ActionDenied` |
107
+ | `resolve(request_id, approved, *, actor=None)` | Approves or denies a parked request (external mode). First answer wins. | a `ResolveOutcome` |
108
+ | `pending()` | Lists the approval requests currently waiting. | a list of `ApprovalRequest` |
109
+ | `gate(func)` | Decorator bound to this guard: forces `func` to always require approval, through this guard's mode. | the wrapped function |
110
+
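When `resolve()` is called from a web endpoint, each `ResolveOutcome` has to be turned into a response. The sketch below shows one possible mapping. The enum is a local stand-in carrying the four documented member names (its values are made up), and the status codes are an assumption, not something the library prescribes. With the real package you would import `ResolveOutcome` from `agentguard` instead.

```python
from enum import Enum

class ResolveOutcome(Enum):
    """Local stand-in with the four documented member names; values are made up."""
    RESOLVED = "resolved"
    ALREADY_RESOLVED = "already_resolved"
    UNKNOWN = "unknown"
    EXPIRED = "expired"

# Assumed mapping; pick whatever codes your API conventions call for.
STATUS_BY_OUTCOME = {
    ResolveOutcome.RESOLVED: 200,
    ResolveOutcome.ALREADY_RESOLVED: 409,
    ResolveOutcome.UNKNOWN: 404,
    ResolveOutcome.EXPIRED: 410,
}

def response_for(outcome):
    """Build a (status_code, body) pair for a resolve endpoint."""
    return STATUS_BY_OUTCOME[outcome], {"outcome": outcome.name}

print(response_for(ResolveOutcome.EXPIRED))
```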
111
+ ## Full integration example
112
+
113
+ This is a complete, runnable program. It shows the recommended pattern: keep your real tools as normal functions, put them behind one dispatcher, and route that dispatcher through `guard.execute()`.
114
+
115
+ ```python
116
+ from agentguard import AgentGuard, ActionDenied
117
+
118
+ # 1. Your real tool implementations. Each takes a params dict and does its work.
119
+ def read_customer(params):
120
+     return {"id": params["id"], "name": "Jane Doe", "plan": "pro"}
121
+
122
+ def update_email(params):
123
+     # pretend this writes to a database
124
+     return f"Email for #{params['id']} set to {params['email']}"
125
+
126
+ def delete_customer(params):
127
+     # pretend this deletes a row
128
+     return f"Customer #{params['id']} deleted"
129
+
130
+ # 2. One dispatcher mapping a tool name to its implementation.
131
+ TOOLS = {
132
+     "read_customer": read_customer,
133
+     "update_email": update_email,
134
+     "delete_customer": delete_customer,
135
+ }
136
+
137
+ def run_tool(name, params):
138
+     return TOOLS[name](params)
139
+
140
+ # 3. Create the guard. Default asks in the terminal.
141
+ guard = AgentGuard()
142
+ # Other options (pick ONE when you create the guard):
143
+ # guard = AgentGuard(confirm=lambda req: ask_slack_and_wait(req)) # returns True/False
144
+ # guard = AgentGuard(mode="external", on_request=push_to_ui, timeout=120) # web UI
145
+
146
+ # 4. Anywhere your code currently runs a tool, route it through the guard instead.
147
+ def handle_tool_call(name, params):
148
+     result = guard.execute(
149
+         name, params,
150
+         executor=run_tool,          # the guard calls this only after approval
151
+         agent_id="support-bot",     # optional: who is acting
152
+         session_id="ticket-4821",   # optional: which run this belongs to
153
+     )
154
+     if isinstance(result, ActionDenied):
155
+         # Blocked. Return a plain message; do NOT automatically retry.
156
+         return f"Action blocked: {result.reason}"
157
+     return result
158
+
159
+ # 5. Use it.
160
+ print(handle_tool_call("read_customer", {"id": 7})) # safe name -> runs, no prompt
161
+ print(handle_tool_call("update_email", {"id": 7, "email": "a@b.com"})) # "email"/"update" -> asks first
162
+ print(handle_tool_call("delete_customer", {"id": 7})) # "delete" -> asks first
163
+ ```
164
+
165
+ Notes for adapting this:
166
+ - `executor` must be a function with the signature `executor(tool_name, params)`.
167
+ - A safe-named tool (`read_customer`) runs with no prompt. A dangerous-named one (`update_email`, `delete_customer`) asks first.
168
+ - To approve through a web UI instead of the terminal, create the guard with `mode="external"`, push each request to your UI from `on_request`, and call `guard.resolve(request_id, approved, actor=...)` from your Approve/Deny buttons. Always pass a finite `timeout`.
169
+ - `guard.gate` is not used here on purpose: `execute()` is the direct path. Reach for `@guard.gate` only to force approval on one specific function — it routes through this same guard, so it uses the same approval mode.
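
The commented-out `confirm=` option in the example takes a function that returns `True` or `False`. The README does not say what happens if that function raises or returns something else, so a defensive wrapper may be worth having. The sketch below is one way to write it. `fail_closed` and the two sample approvers are hypothetical names, not part of AgentGuard.

```python
import functools

def fail_closed(confirm):
    """Wrap an approval callback so only a literal True approves."""
    @functools.wraps(confirm)
    def wrapper(request):
        try:
            return confirm(request) is True
        except Exception:
            return False  # an approver that errors out must not approve
    return wrapper


def flaky_approver(request):   # hypothetical callback that fails
    raise TimeoutError("chat service unreachable")

def sloppy_approver(request):  # hypothetical callback returning a truthy non-bool
    return "yes"

print(fail_closed(flaky_approver)({"tool": "delete_customer"}))
print(fail_closed(sloppy_approver)({"tool": "delete_customer"}))
print(fail_closed(lambda request: True)({"tool": "delete_customer"}))
```

Under these assumptions it would be passed as `AgentGuard(confirm=fail_closed(ask_slack_and_wait))`.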
170
+
171
+ ## Use with Cursor, Claude Code, or any AI coding agent
172
+
173
+ Copy the block below and paste it into your own coding agent. It is a prompt to hand to a tool — not code to run directly.
+
+ ```
174
+ I want to integrate AgentGuard into this project. AgentGuard is a library
175
+ that pauses an AI agent before it runs a dangerous or irreversible tool call
176
+ and asks a human to approve or deny it first. Install it with:
177
+ pip install agentguard-hitl
178
+
179
+ Before writing a single line of code, do the following discovery steps and
180
+ tell me what you find:
181
+
182
+ DISCOVERY (do this first, do not skip):
183
+ 1. Find every place in this codebase where an AI agent executes a tool or
184
+ function call — the dispatcher, the tool runner, wherever "the agent
185
+ picked a tool and now it runs." List every file and line.
186
+ 2. List every tool/function the agent can call. For each one, tell me:
187
+ does its name sound dangerous (delete, send, wipe, drop, write, run,
188
+ exec, update) or does it sound innocent but could be dangerous in
189
+ practice (charge_card, grant_admin, transfer_funds, deploy,
190
+ read_secrets, export, disable, reset, create_api_key, revoke)?
191
+ 3. Find where the existing UI is built — what framework (React, Vue, plain
192
+ HTML, Jinja, etc.), what components already exist for modals, cards,
193
+ dialogs, notifications, or alerts, and what the existing color scheme,
194
+ font, spacing, and button styles look like. I want the approval card to
195
+ look like it belongs in this UI, not like it was dropped in from outside.
196
+ 4. Find the web framework being used (Flask, FastAPI, Django, Express, etc.)
197
+ and where routes/endpoints are defined.
198
+ 5. Find how real-time or live updates currently work in this project —
199
+ WebSockets, Server-Sent Events, polling, a message queue, or nothing yet.
200
+ If nothing exists, identify the simplest option that fits this stack.
201
+ 6. Find if there is an existing authenticated user session — how is the
202
+ current user identified (user.id, session["user"], request.user, JWT,
203
+ etc.)? I need this for the approver identity record.
204
+
205
+ Do not proceed until you have listed findings for all six points above and
206
+ I have confirmed them.
207
+
208
+ IMPLEMENTATION (after discovery is confirmed):
209
+
210
+ Follow these exact steps in this exact order. Show me the diff for each
211
+ step before moving to the next. Do not batch them.
212
+
213
+ STEP 1 — Create the guard (one place, app startup):
214
+ - Import AgentGuard and ActionDenied at the top of the appropriate file
215
+ (wherever app-level objects like db connections or config are initialized).
216
+ - Define a push_to_ui(request) function that sends the approval request to
217
+ the frontend using whatever real-time mechanism exists (or the simplest
218
+ one you identified). It receives an ApprovalRequest object with these
219
+ fields: request_id, tool, params, risk, agent_id, session_id, reason,
220
+ timestamp, deadline. Send all of them — don't drop any.
221
+ - Create exactly one guard instance:
222
+ guard = AgentGuard(
223
+ mode="external",
224
+ on_request=push_to_ui,
225
+ timeout=120,
226
+ dangerous_tools=[LIST EVERY INNOCENTLY-NAMED DANGEROUS TOOL YOU FOUND]
227
+ )
228
+ - This instance must be importable by both the agent code and the route
229
+ handlers. Put it somewhere both can reach — a shared module, app state,
230
+ or dependency injection, whatever fits this project's existing pattern.
231
+
232
+ STEP 2 — Wrap the tool dispatcher (one line changed):
233
+ - Find the exact line(s) from discovery step 1 where tools execute.
234
+ - Change each one from:
235
+ result = run_tool(name, params)
236
+ to:
237
+ result = guard.execute(
238
+ name, params,
239
+ executor=run_tool,
240
+ agent_id=<how this agent is identified in this codebase>,
241
+ session_id=<current session or run id if one exists>
242
+ )
243
+ - Immediately after, handle denial:
244
+ if isinstance(result, ActionDenied):
245
+ <return or yield the denial message back to the agent in whatever
246
+ format this project uses for tool results — string, dict, JSON,
247
+ tool_result block, etc.>
248
+ <do NOT raise an exception, do NOT retry the call>
249
+ - If multiple agents share one dispatcher, this one change guards all of
250
+ them. If each agent has its own, make this change in each one.
251
+
252
+ STEP 3 — Add two backend routes:
253
+ Add these two endpoints using whatever routing pattern this project already
254
+ uses. Match the existing route style exactly (decorators, blueprints,
255
+ routers, controllers — whatever is already here):
256
+
257
+ Route 1 — list pending approvals (used by the UI to render the queue):
258
+ GET /agentguard/pending
259
+ Returns: guard.pending() serialized as JSON (each ApprovalRequest has
260
+ a .to_dict() method). Protect this route with whatever authentication
261
+ middleware already exists on sensitive routes in this project.
262
+
263
+ Route 2 — resolve an approval (called by Approve/Deny buttons):
264
+ POST /agentguard/resolve
265
+ Body: { request_id: string, approved: boolean }
266
+ Action: guard.resolve(request_id, approved, actor=<current user identity>)
267
+ Returns: { outcome: <ResolveOutcome value> }
268
+ On UNKNOWN or EXPIRED outcome, return an appropriate error response.
269
+ Protect this route identically to Route 1.
270
+
271
+ STEP 4 — Build the approval UI component:
272
+ - Build a single approval card component that matches the existing UI
273
+ exactly — same framework, same design system, same component library if
274
+ one exists. Do not introduce a new UI framework or new CSS library.
275
+ - The card must show ALL of these fields, labeled clearly:
276
+ Tool name (what the agent wants to run)
277
+ Parameters (what it's passing — shown as a readable key/value list)
278
+ Risk level (the risk label from AgentGuard)
279
+ Agent ID (which agent is asking)
280
+ Session ID (which run this belongs to)
281
+ Reason (if provided)
282
+ Countdown timer to deadline (live, counts down in seconds)
283
+ - Two buttons: Approve and Deny. On click, each calls the resolve route
284
+ with the correct request_id and approved=true/false.
285
+ - After clicking either button, disable both immediately to prevent
286
+ double-submission (the library handles it safely, but the UI should
287
+ reflect that the decision was made).
288
+ - When the deadline countdown hits zero, mark the card as expired and
289
+ disable both buttons.
290
+ - Wire the card to receive new requests via the same real-time mechanism
291
+ used in push_to_ui (step 1). The card should appear automatically when
292
+ a new request is parked — no manual refresh.
293
+ - Match the existing UI's: color scheme, font sizes, border radius, spacing,
294
+ button styles (primary/danger), modal or panel patterns, and any existing
295
+ loading/error states. If there's an existing modal or dialog component,
296
+ use it as the wrapper. If there are existing button components, use them.
297
+
298
+ STEP 5 — Verify end to end:
299
+ Run through this exact sequence manually and confirm each step works:
300
+ 1. Trigger an agent action that calls a tool with a dangerous-sounding name.
301
+ Confirm: card appears in UI, agent is paused.
302
+ 2. Click Approve. Confirm: agent continues, tool runs, result is returned.
303
+ 3. Trigger the same action again. Click Deny.
304
+ Confirm: ActionDenied is returned, agent receives the denial message,
305
+ does not retry.
306
+ 4. Trigger an action that calls a tool from the dangerous_tools list
307
+ (innocently named). Confirm: card appears (it was not silently allowed).
308
+ 5. Trigger an action that calls a safe tool (not dangerous-named, not in
309
+ dangerous_tools list). Confirm: no card appears, tool runs immediately.
310
+ 6. Click Approve twice on the same card (simulate double-click).
311
+ Confirm: second click returns ALREADY_RESOLVED, nothing bad happens.
312
+
313
+ THINGS YOU MUST NOT DO:
314
+ - Do not use @gate (the old top-level decorator — it no longer exists).
315
+ Use @guard.gate if you need to force-gate one specific function.
316
+ - Do not create more than one AgentGuard instance.
317
+ - Do not modify AgentGuard's internal code (core.py).
318
+ - Do not auto-retry a denied action.
319
+ - Do not set timeout=None — always use a finite number.
320
+ - Do not run the agent process and the web server in separate processes
321
+ unless you flag this to me explicitly, because cross-process approval
322
+ is not supported out of the box.
323
+ - Do not invent a new UI style — match what already exists.
324
+
325
+ After all five steps are complete and verified, give me:
326
+ 1. A list of every file changed and exactly what was changed in each.
327
+ 2. A list of any tools you found that are dangerous but NOT currently
328
+ covered by either the name classifier or the dangerous_tools list,
329
+ so I can decide whether to add them.
330
+ 3. Any architectural concern you noticed — specifically whether the agent
331
+ and web server run in the same process, and if not, what that means
332
+ for this integration.
333
+ ```
334
+
335
+ ## Known limitations
336
+
337
+ Read this before relying on AgentGuard. These are real and current.
338
+
339
+ **Critical / high — do not ignore:**
340
+
341
+ - **Ungated actions are also unlogged.** Only gated actions are written to the audit log. Because gating is decided by the tool's name alone, a dangerous but ordinarily named action can run with no prompt and no record at all.
342
+ - **Approved parameters are not frozen.** The values shown to the approver are held by reference. If they change between approval and execution, a different action can run than the one that was approved.
343
+ - **`confirm` plus `mode="external"` silently becomes callback mode.** If you pass both, the external/UI mode is ignored without warning. If your callback returns `True`, everything is auto-approved.
344
+ - **`timeout` defaults to waiting forever, and a wrong type crashes.** With no `timeout`, a parked external-mode request blocks indefinitely if nobody answers. A non-numeric `timeout` raises an error on the first gated call. Always pass a finite number.
345
+ - **Async use does not scale.** `aexecute()` parks each wait on a small shared thread pool. Many simultaneous approvals exhaust it and starve other async work, and a slow `on_request` callback blocks the event loop.
346
+ - **One process only.** External-mode approval works only when `execute()` and `resolve()` run in the same process. If you split them across processes or containers (agent in a worker, UI in a separate web server), the agent hangs until its timeout even after a human approves. There is no built-in cross-process support.
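
One possible mitigation for the "approved parameters are not frozen" point is to hand the guard a private deep copy of the parameters, so that later changes to the caller's dict cannot reach it. This is a sketch under that assumption. `snapshot_params` is a hypothetical helper, and whether it fully closes the gap depends on how the library handles `params` internally, which the README does not specify.

```python
import copy

def snapshot_params(params):
    """Return an independent deep copy that later mutation of `params` cannot alter."""
    return copy.deepcopy(params)

live = {"path": "notes.txt", "options": {"recursive": False}}
frozen = snapshot_params(live)

# The caller's dict changes after the snapshot was taken...
live["path"] = "/"
live["options"]["recursive"] = True

# ...but the snapshot still holds what would have been shown to the approver.
print(frozen)
```

In use, the snapshot would be passed as the `params` argument, for example `guard.execute(name, snapshot_params(params), executor=run_tool)`.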
347
+
348
+ **Medium / low:**
349
+
350
+ - **Memory grows over time.** In external mode, resolved requests are kept in memory and never removed; a long-running server slowly accumulates them.
351
+ - **Parameters can leak secrets.** Tool parameters are written to the log and sent to the approval UI as-is. There is no redaction.
352
+ - **Name matching is fuzzy, and the audit log is basic.** Substring matching over-flags some safe names (for example `evaluate_model`, `prune_old`). Separately, the audit log is a single local file with no rotation and no protection against multiple processes writing to it at once.
353
+
354
+ **What you can honestly say it is:** a single-process, human-in-the-loop approval gate for AI tool calls, with terminal, callback, and same-process web-UI approval modes. It blocks when no one answers and when a timeout passes. Within one process, simultaneous approvals don't get mixed up and repeat approvals are safe.
355
+
356
+ **What it is not (yet):** not a security or authorization system, and not proven for production. It does not work across processes or containers, does not scale for heavy async use, does not keep a complete record of all activity, and does not catch dangerous actions by what they do — only by what they are named.
357
+
358
+ ## License
359
+
360
+ MIT.
361
+
362
+ ## Contributing
363
+
364
+ Issues and pull requests are welcome. If you change gating behavior, include a test that covers it (`python test_agentguard.py`, `python test_multiagent.py`, `python test_external_resolve.py`).