guardloop 0.2.0__tar.gz → 0.3.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- guardloop-0.3.0/CHANGELOG.md +82 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/PKG-INFO +90 -20
- guardloop-0.3.0/README.md +218 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/docs/design.md +31 -2
- guardloop-0.3.0/docs/project-overview.md +599 -0
- guardloop-0.3.0/docs/pypi-publishing.md +61 -0
- guardloop-0.3.0/docs/roadmap.md +59 -0
- guardloop-0.3.0/examples/verifier_retry_loop.py +54 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/pyproject.toml +6 -3
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/__init__.py +22 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/context.py +11 -1
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/exceptions.py +34 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/models.py +3 -0
- guardloop-0.3.0/src/guardloop/runtime.py +385 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/telemetry/conventions.py +30 -0
- guardloop-0.3.0/src/guardloop/verifier.py +248 -0
- guardloop-0.3.0/tests/test_verifier.py +537 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/uv.lock +1 -1
- guardloop-0.2.0/README.md +0 -148
- guardloop-0.2.0/docs/pypi-publishing.md +0 -66
- guardloop-0.2.0/docs/roadmap.md +0 -27
- guardloop-0.2.0/src/guardloop/runtime.py +0 -190
- {guardloop-0.2.0 → guardloop-0.3.0}/.github/workflows/publish-pypi.yml +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/.gitignore +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/LICENSE +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/examples/live_anthropic_basic.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/examples/live_openai_basic.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/examples/runaway_cost_prevention.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/examples/tool_circuit_breaker.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/budget.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/circuit_breaker.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/pricing.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/providers/__init__.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/providers/anthropic.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/providers/openai.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/py.typed +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/telemetry/__init__.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/telemetry/tracer.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/tokenization.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/tools.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/__init__.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/fakes.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/test_budget.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/test_circuit_breaker.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/test_providers.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/test_runtime.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/test_telemetry.py +0 -0
guardloop-0.3.0/CHANGELOG.md

@@ -0,0 +1,82 @@
+# Changelog
+
+All notable changes to GuardLoop are documented here. The format is based on
+[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
+follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html) (pre-1.0:
+minor releases may include breaking changes).
+
+## [0.3.0] - 2026-05-10
+
+### Added
+
+- **Verifier retry loop (Pillar 3 / self-healing).** After an agent finishes,
+  GuardLoop can run a chain of verifiers against the output; on rejection it
+  appends the verifier's feedback to `RunContext.retry_feedback` and re-invokes
+  the agent, bounded by `VerifierConfig.max_retries`. All attempts share the
+  same budget (cost / tokens / time / tool calls) and the run's single
+  `asyncio.timeout`, so a verifier loop cannot bypass any guardrail.
+- New module `guardloop.verifier` with public exports: `Verifier` (callable
+  type alias — sync or async, returning `VerifierResult`, `bool`, or `None`),
+  `VerifierResult`, `VerifierContext`, `VerifierConfig`, and `VerifierChain`.
+- Built-in rule-based verifier factories: `non_empty()`, `matches_regex(...)`,
+  `is_json_object(required_keys=...)`.
+- `GuardLoop(verifiers=[...], verifier_config=VerifierConfig(...))` constructor
+  parameters and `GuardLoop.add_verifier(fn)`.
+- `RunResult` fields: `verification_passed: bool | None`,
+  `verification_attempts: int`, `verification_feedback: list[str]`.
+- `RunContext.retry_feedback: list[str]` and `RunContext.attempt: int`.
+- New exceptions `VerificationFailed` (`terminated_reason="verification_failed"`,
+  raised only in strict mode) and `VerifierExecutionError`
+  (`terminated_reason="verifier_error"`, raised when a verifier itself throws).
+- OpenTelemetry: `verifier_run <name>` child spans, `agent_run` attributes
+  `guardloop.verification.passed` / `guardloop.verification.attempts`, and
+  `guardloop.verification.failed` / `.retrying` / `.exhausted` span events.
+- No-key demo `examples/verifier_retry_loop.py`.
+
+### Changed
+
+- When verification ultimately fails (retries exhausted), `RunResult.success`
+  is `False` with `terminated_reason="verification_failed"`, but `output` still
+  holds the last attempt's text — consistent with how budget/timeout stops
+  report. Set `VerifierConfig(raise_on_failure=True)` for strict behavior
+  (surfaces a `VerificationFailed` with `output=None` and details in
+  `metadata`).
+- `pyproject.toml`: `Changelog` URL now points at this file.
+
+## [0.2.0] - 2026
+
+### Added
+
+- Per-tool circuit breakers with `closed` / `open` / `half_open` states, a
+  global default policy plus per-tool overrides, breaker state that persists on
+  the `GuardLoop` instance across runs, and `runtime.circuit_breaker_snapshots()`
+  / `runtime.reset_circuit_breakers()`.
+- `ctx.call_tool(...)` / `ctx.wrap_tool(...)` route tool calls through the
+  breaker before the tool-call budget is incremented.
+- `CircuitBreakerOpen` exception and circuit-breaker OpenTelemetry attributes
+  on tool spans.
+- No-key demo `examples/tool_circuit_breaker.py`.
+
+## [0.1.0] - 2026
+
+### Added
+
+- Async runtime wrapper: `GuardLoop.run(agent, ...)` returns a structured
+  `RunResult`; controlled stops become `success=False` with a
+  `terminated_reason` instead of raised exceptions.
+- Hard budget caps for cost (`Decimal`), tokens, wall-clock time, and tool
+  calls, enforced pre-flight before each LLM request.
+- Direct wrappers for `AsyncOpenAI.responses.create` and
+  `AsyncAnthropic.messages.create` with usage accounting and pricing.
+- OpenTelemetry spans for agent runs, LLM calls, and tool calls (core depends
+  only on `opentelemetry-api`; exporters via the `otel` extra).
+- Public exception hierarchy: `GuardLoopError`, `BudgetExceeded`,
+  `TokenLimitExceeded`, `ToolCallLimitExceeded`, `TimeLimitExceeded`,
+  `ModelPricingMissing`, `TokenLimitMissing`; `AgentRuntime` / `AgentRuntimeError`
+  compatibility aliases.
+- No-key demo `examples/runaway_cost_prevention.py`; packaged and published to
+  PyPI via GitHub Actions OIDC Trusted Publishing.
+
+[0.3.0]: https://github.com/awesome-pro/guardloop/releases/tag/v0.3.0
+[0.2.0]: https://github.com/awesome-pro/guardloop/releases/tag/v0.2.0
+[0.1.0]: https://github.com/awesome-pro/guardloop/releases/tag/v0.1.0
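The strict mode described in the 0.3.0 entry can be exercised roughly as in the sketch below. A minimal sketch, assuming `VerificationFailed` and `non_empty` are re-exported from the package root like the other public names, and that the exception exposes `terminated_reason` and `metadata` attributes as the entry implies; it is not code taken from the package.

```python
import asyncio

from guardloop import GuardLoop, RunContext, VerificationFailed, VerifierConfig, non_empty

# raise_on_failure=True: exhausted retries raise instead of returning success=False.
runtime = GuardLoop(
    verifiers=[non_empty()],
    verifier_config=VerifierConfig(max_retries=1, raise_on_failure=True),
)


async def agent(ctx: RunContext, task: str) -> str:
    # A deliberately bad agent that can never satisfy non_empty().
    return ""


async def main() -> None:
    try:
        await runtime.run(agent, "summarize the release")
    except VerificationFailed as exc:
        # terminated_reason should read "verification_failed"; attempt count and
        # verifier feedback are expected in exc.metadata (exact keys assumed).
        print(exc.terminated_reason, exc.metadata)


asyncio.run(main())
```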
{guardloop-0.2.0 → guardloop-0.3.0}/PKG-INFO

@@ -1,17 +1,17 @@
 Metadata-Version: 2.4
 Name: guardloop
-Version: 0.2.0
-Summary: A production runtime guardrail for AI agents: budget caps, timeouts, tool limits, and OpenTelemetry traces.
+Version: 0.3.0
+Summary: A production runtime guardrail for AI agents: budget caps, timeouts, tool limits, circuit breakers, verifier retries, and OpenTelemetry traces.
 Project-URL: Homepage, https://github.com/awesome-pro/guardloop
 Project-URL: Documentation, https://github.com/awesome-pro/guardloop#readme
 Project-URL: Repository, https://github.com/awesome-pro/guardloop
 Project-URL: Issues, https://github.com/awesome-pro/guardloop/issues
-Project-URL: Changelog, https://github.com/awesome-pro/guardloop/
+Project-URL: Changelog, https://github.com/awesome-pro/guardloop/blob/main/CHANGELOG.md
 Author-email: awesome-pro <147910430+awesome-pro@users.noreply.github.com>
 Maintainer-email: awesome-pro <147910430+awesome-pro@users.noreply.github.com>
 License-Expression: MIT
 License-File: LICENSE
-Keywords: agentic-ai,ai-agents,ai-safety,anthropic,circuit-breaker,llm,mlops,openai,opentelemetry,runtime-guardrails
+Keywords: agentic-ai,ai-agents,ai-safety,anthropic,circuit-breaker,llm,mlops,openai,opentelemetry,retry,runtime-guardrails,self-healing,verifier
 Classifier: Development Status :: 3 - Alpha
 Classifier: Framework :: AsyncIO
 Classifier: Intended Audience :: Developers

@@ -42,12 +42,15 @@ Description-Content-Type: text/markdown
 
 GuardLoop is a production runtime guardrail for AI agents. It wraps model
 clients and tools with hard budget caps, timeout control, tool-call limits, and
-per-tool circuit breakers,
-
-
+per-tool circuit breakers, re-runs an agent against verifiers until the output
+passes, and emits OpenTelemetry traces for every protected call. Runaway agent
+loops can be stopped before they burn through money, flaky tools can be cut off
+before an agent retries them into a bigger incident, and confidently-wrong
+answers get a second pass.
 
-The v0.
-agents**
+The v0.3 focus is intentionally sharp: **runtime guardrails for async Python
+agents** — direct OpenAI and Anthropic wrappers, protected tool calls, per-tool
+circuit breakers, and a verify-fix-retry loop.
 
 ```python
 from guardloop import (

@@ -56,6 +59,8 @@ from guardloop import (
     CircuitBreakerConfig,
     CircuitBreakerPolicy,
     RunContext,
+    VerifierConfig,
+    is_json_object,
 )
 
 runtime = GuardLoop(

@@ -71,13 +76,18 @@ runtime = GuardLoop(
             recovery_timeout_seconds=30,
         )
     ),
+    verifiers=[is_json_object(required_keys=["answer"])],
+    verifier_config=VerifierConfig(max_retries=2),
 )
 
 
 async def agent(ctx: RunContext, prompt: str) -> str:
+    instructions = prompt
+    if ctx.retry_feedback:
+        instructions += "\n\nFix the previous attempt: " + "; ".join(ctx.retry_feedback)
     response = await ctx.openai.responses.create(
         model="gpt-5.2",
-        input=prompt,
+        input=instructions,
         max_output_tokens=300,
     )
     return str(response.output_text)

@@ -98,16 +108,60 @@ flowchart LR
   U["User code"] --> R["GuardLoop"]
   R --> B["BudgetController"]
   R --> CB["CircuitBreakerRegistry"]
+  R --> V["VerifierChain"]
   R --> T["OpenTelemetry spans"]
   R --> C["RunContext"]
   C --> O["Wrapped OpenAI client"]
   C --> A["Wrapped Anthropic client"]
   C --> W["Wrapped tools"]
+  V -. "feedback on retry" .-> C
 ```
 
+## Verifier Retry Loop
+
+Agents can return confidently wrong answers. Attach verifiers — plain callables,
+sync or async — and GuardLoop runs them after the agent finishes. On rejection
+it feeds the verifier's feedback into `ctx.retry_feedback` and re-invokes the
+agent, up to `VerifierConfig.max_retries` times. Every attempt shares the same
+budget and the run's timeout, so the retry loop can never spend past a cap.
+
+```python
+from guardloop import GuardLoop, RunContext, VerifierConfig, VerifierContext, VerifierResult
+
+
+def no_todo(output: object, ctx: VerifierContext) -> VerifierResult:
+    if "TODO" in str(output):
+        return VerifierResult(passed=False, feedback="Replace the TODO placeholder.")
+    return VerifierResult(passed=True)
+
+
+runtime = GuardLoop(verifiers=[no_todo], verifier_config=VerifierConfig(max_retries=2))
+
+
+async def agent(ctx: RunContext, task: str) -> str:
+    # On a retry, ctx.retry_feedback holds the verifier's complaints — read it.
+    ...
+
+
+result = await runtime.run(agent, "draft the release notes")
+print(result.verification_passed, result.verification_attempts, result.verification_feedback)
+```
+
+Built-in rule-based verifiers ship in `guardloop`: `non_empty()`,
+`matches_regex(...)`, `is_json_object(required_keys=...)`. By default an output
+that fails every retry comes back as `success=False` with
+`terminated_reason="verification_failed"` but with `output` still populated;
+set `VerifierConfig(raise_on_failure=True)` for a hard stop.
+
+## Project Guide
+
+For a deeper walkthrough of what has been implemented, how the code is
+organized, and what the next roadmap goals are, read
+[docs/project-overview.md](docs/project-overview.md).
+
 ## Install
 
-
+Install from PyPI:
 
 ```bash
 pip install guardloop

@@ -147,6 +201,15 @@ uv run python examples/tool_circuit_breaker.py
 This demo uses a failing fake tool. GuardLoop allows the first failures,
 opens the circuit breaker, then rejects the next call without invoking the tool.
 
+```bash
+uv run python examples/verifier_retry_loop.py
+```
+
+This demo's agent first returns a bad answer (a `TODO` placeholder, then
+malformed JSON). A verifier chain rejects it with feedback, the agent reads
+`ctx.retry_feedback` and self-corrects, and the run ends with
+`verification_passed: true` after three attempts.
+
 ## Live Provider Smoke Tests
 
 ```bash

@@ -169,20 +232,27 @@ uv run ruff format --check .
 uv run pyright
 ```
 
-## v0.
+## v0.3 Scope
 
 - Async Python runtime with `src/` package layout.
 - Hard caps for cost, tokens, time, and tool calls.
-- Per-tool circuit breakers with closed, open, and half-open states
-
--
--
--
+- Per-tool circuit breakers with closed, open, and half-open states; global
+  default breaker policy plus per-tool overrides.
+- Verify-fix-retry loop: sync or async output verifiers, fail-fast chains,
+  built-in rule-based verifiers, feedback into `ctx.retry_feedback`, and an
+  opt-in strict mode — all attempts share one budget and the run timeout.
+- Direct wrappers for `AsyncOpenAI.responses.create` and
+  `AsyncAnthropic.messages.create`.
+- OpenTelemetry spans for agent runs, LLM calls, tools, and verifiers.
 - Fake-client tests and demos that do not require API keys.
 
 ## Roadmap
 
-- v0.2: per-tool circuit breakers.
-- v0.3:
+- v0.2: per-tool circuit breakers. ✅
+- v0.3: verify-fix-retry loop. ✅
 - v0.4: LangGraph and OpenAI Agents SDK adapters.
-- v0.5: Jaeger/Phoenix trace screenshots,
+- v0.5: Jaeger/Phoenix trace screenshots, demo video, and blog post.
+- v0.6: persistent breaker state, YAML/TOML policy, multi-model pricing, loop detection.
+- v1.0: stable API, changelog, docs site, release checklist.
+
+See [docs/roadmap.md](docs/roadmap.md) for details.
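For reference, the documented verifier factories and the sync-or-async callable contract above compose as in the sketch below. A sketch only: the factory names and the `(output, ctx)` verifier signature come from this diff, while the stand-in agent and the handling of a plain `bool` verdict are assumptions based on the stated normalization rule.

```python
import asyncio

from guardloop import (
    GuardLoop,
    RunContext,
    VerifierConfig,
    VerifierContext,
    is_json_object,
    non_empty,
)


async def cites_a_source(output: object, ctx: VerifierContext) -> bool:
    # Verifiers may be async and may return a plain bool; the chain normalizes it.
    return "http" in str(output)


runtime = GuardLoop(
    # Fail-fast chain: the first rejection stops the chain and triggers a retry.
    verifiers=[non_empty(), is_json_object(required_keys=["answer"]), cites_a_source],
    verifier_config=VerifierConfig(max_retries=2),
)


async def agent(ctx: RunContext, task: str) -> str:
    # Stand-in agent; a real one would call ctx.openai / ctx.anthropic.
    return '{"answer": "see https://github.com/awesome-pro/guardloop"}'


async def main() -> None:
    result = await runtime.run(agent, "answer as JSON and cite a source")
    print(result.verification_passed, result.verification_attempts)


asyncio.run(main())
```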
guardloop-0.3.0/README.md

@@ -0,0 +1,218 @@
+# GuardLoop
+
+GuardLoop is a production runtime guardrail for AI agents. It wraps model
+clients and tools with hard budget caps, timeout control, tool-call limits, and
+per-tool circuit breakers, re-runs an agent against verifiers until the output
+passes, and emits OpenTelemetry traces for every protected call. Runaway agent
+loops can be stopped before they burn through money, flaky tools can be cut off
+before an agent retries them into a bigger incident, and confidently-wrong
+answers get a second pass.
+
+The v0.3 focus is intentionally sharp: **runtime guardrails for async Python
+agents** — direct OpenAI and Anthropic wrappers, protected tool calls, per-tool
+circuit breakers, and a verify-fix-retry loop.
+
+```python
+from guardloop import (
+    GuardLoop,
+    BudgetConfig,
+    CircuitBreakerConfig,
+    CircuitBreakerPolicy,
+    RunContext,
+    VerifierConfig,
+    is_json_object,
+)
+
+runtime = GuardLoop(
+    budget=BudgetConfig(
+        cost_limit_usd="0.10",
+        token_limit=10_000,
+        time_limit_seconds=60,
+        tool_call_limit=20,
+    ),
+    circuit_breakers=CircuitBreakerConfig(
+        default=CircuitBreakerPolicy(
+            failure_threshold=3,
+            recovery_timeout_seconds=30,
+        )
+    ),
+    verifiers=[is_json_object(required_keys=["answer"])],
+    verifier_config=VerifierConfig(max_retries=2),
+)
+
+
+async def agent(ctx: RunContext, prompt: str) -> str:
+    instructions = prompt
+    if ctx.retry_feedback:
+        instructions += "\n\nFix the previous attempt: " + "; ".join(ctx.retry_feedback)
+    response = await ctx.openai.responses.create(
+        model="gpt-5.2",
+        input=instructions,
+        max_output_tokens=300,
+    )
+    return str(response.output_text)
+
+
+result = await runtime.run(agent, "research agent runtime safety")
+print(result.model_dump_json(indent=2))
+```
+
+## Why This Exists
+
+Agents are loops around probabilistic systems. When they go wrong, they can call
+the same model or tool repeatedly, spend unexpected money, and fail without a
+clear trace. GuardLoop puts an explicit execution layer around that loop:
+
+```mermaid
+flowchart LR
+  U["User code"] --> R["GuardLoop"]
+  R --> B["BudgetController"]
+  R --> CB["CircuitBreakerRegistry"]
+  R --> V["VerifierChain"]
+  R --> T["OpenTelemetry spans"]
+  R --> C["RunContext"]
+  C --> O["Wrapped OpenAI client"]
+  C --> A["Wrapped Anthropic client"]
+  C --> W["Wrapped tools"]
+  V -. "feedback on retry" .-> C
+```
+
+## Verifier Retry Loop
+
+Agents can return confidently wrong answers. Attach verifiers — plain callables,
+sync or async — and GuardLoop runs them after the agent finishes. On rejection
+it feeds the verifier's feedback into `ctx.retry_feedback` and re-invokes the
+agent, up to `VerifierConfig.max_retries` times. Every attempt shares the same
+budget and the run's timeout, so the retry loop can never spend past a cap.
+
+```python
+from guardloop import GuardLoop, RunContext, VerifierConfig, VerifierContext, VerifierResult
+
+
+def no_todo(output: object, ctx: VerifierContext) -> VerifierResult:
+    if "TODO" in str(output):
+        return VerifierResult(passed=False, feedback="Replace the TODO placeholder.")
+    return VerifierResult(passed=True)
+
+
+runtime = GuardLoop(verifiers=[no_todo], verifier_config=VerifierConfig(max_retries=2))
+
+
+async def agent(ctx: RunContext, task: str) -> str:
+    # On a retry, ctx.retry_feedback holds the verifier's complaints — read it.
+    ...
+
+
+result = await runtime.run(agent, "draft the release notes")
+print(result.verification_passed, result.verification_attempts, result.verification_feedback)
+```
+
+Built-in rule-based verifiers ship in `guardloop`: `non_empty()`,
+`matches_regex(...)`, `is_json_object(required_keys=...)`. By default an output
+that fails every retry comes back as `success=False` with
+`terminated_reason="verification_failed"` but with `output` still populated;
+set `VerifierConfig(raise_on_failure=True)` for a hard stop.
+
+## Project Guide
+
+For a deeper walkthrough of what has been implemented, how the code is
+organized, and what the next roadmap goals are, read
+[docs/project-overview.md](docs/project-overview.md).
+
+## Install
+
+Install from PyPI:
+
+```bash
+pip install guardloop
+```
+
+For local development:
+
+```bash
+uv sync
+```
+
+Optional OpenTelemetry exporters are available through the `otel` extra:
+
+```bash
+pip install "guardloop[otel]"
+```
+
+For local development with the extra:
+
+```bash
+uv sync --extra otel
+```
+
+## Try the No-Key Demo
+
+```bash
+uv run python examples/runaway_cost_prevention.py
+```
+
+The demo uses a fake OpenAI-compatible client and intentionally loops forever.
+GuardLoop stops it when the next model request would exceed the cost cap.
+
+```bash
+uv run python examples/tool_circuit_breaker.py
+```
+
+This demo uses a failing fake tool. GuardLoop allows the first failures,
+opens the circuit breaker, then rejects the next call without invoking the tool.
+
+```bash
+uv run python examples/verifier_retry_loop.py
+```
+
+This demo's agent first returns a bad answer (a `TODO` placeholder, then
+malformed JSON). A verifier chain rejects it with feedback, the agent reads
+`ctx.retry_feedback` and self-corrects, and the run ends with
+`verification_passed: true` after three attempts.
+
+## Live Provider Smoke Tests
+
+```bash
+export OPENAI_API_KEY="..."
+export ANTHROPIC_API_KEY="..."
+
+uv run python examples/live_openai_basic.py
+uv run python examples/live_anthropic_basic.py
+```
+
+Both live examples can be customized with `OPENAI_MODEL` or `ANTHROPIC_MODEL`.
+
+## Quality Gates
+
+```bash
+uv run pytest
+uv run pytest --cov=guardloop
+uv run ruff check .
+uv run ruff format --check .
+uv run pyright
+```
+
+## v0.3 Scope
+
+- Async Python runtime with `src/` package layout.
+- Hard caps for cost, tokens, time, and tool calls.
+- Per-tool circuit breakers with closed, open, and half-open states; global
+  default breaker policy plus per-tool overrides.
+- Verify-fix-retry loop: sync or async output verifiers, fail-fast chains,
+  built-in rule-based verifiers, feedback into `ctx.retry_feedback`, and an
+  opt-in strict mode — all attempts share one budget and the run timeout.
+- Direct wrappers for `AsyncOpenAI.responses.create` and
+  `AsyncAnthropic.messages.create`.
+- OpenTelemetry spans for agent runs, LLM calls, tools, and verifiers.
+- Fake-client tests and demos that do not require API keys.
+
+## Roadmap
+
+- v0.2: per-tool circuit breakers. ✅
+- v0.3: verify-fix-retry loop. ✅
+- v0.4: LangGraph and OpenAI Agents SDK adapters.
+- v0.5: Jaeger/Phoenix trace screenshots, demo video, and blog post.
+- v0.6: persistent breaker state, YAML/TOML policy, multi-model pricing, loop detection.
+- v1.0: stable API, changelog, docs site, release checklist.
+
+See [docs/roadmap.md](docs/roadmap.md) for details.
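The circuit-breaker path this README relies on (`ctx.call_tool(...)` plus `CircuitBreakerOpen`, per the 0.2.0 changelog entries) can be pictured as in the sketch below. A sketch under assumptions: the call shape `ctx.call_tool(name, callable, *args)` is not confirmed by the diff and the failing tool is invented for illustration; only `CircuitBreakerConfig`, `CircuitBreakerPolicy`, `CircuitBreakerOpen`, and `circuit_breaker_snapshots()` are named by the package docs.

```python
import asyncio

from guardloop import (
    CircuitBreakerConfig,
    CircuitBreakerOpen,
    CircuitBreakerPolicy,
    GuardLoop,
    RunContext,
)


async def flaky_search(query: str) -> str:
    # Stand-in tool that always fails, so the breaker trips quickly.
    raise RuntimeError("search backend unavailable")


runtime = GuardLoop(
    circuit_breakers=CircuitBreakerConfig(
        default=CircuitBreakerPolicy(failure_threshold=2, recovery_timeout_seconds=30)
    ),
)


async def agent(ctx: RunContext, task: str) -> str:
    for _ in range(4):
        try:
            # Assumed call shape: tool name, tool callable, then the tool's arguments.
            await ctx.call_tool("search", flaky_search, task)
        except CircuitBreakerOpen:
            return "search is down; answering from cached notes"
        except RuntimeError:
            continue  # counted by the breaker as a tool failure
    return "gave up"


async def main() -> None:
    result = await runtime.run(agent, "agent runtime safety")
    print(result.output, runtime.circuit_breaker_snapshots())


asyncio.run(main())
```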
{guardloop-0.2.0 → guardloop-0.3.0}/docs/design.md

@@ -1,4 +1,4 @@
-# GuardLoop
+# GuardLoop Design
 
 GuardLoop is a wrapper, not an agent framework. A user passes an async agent
 callable to `runtime.run()`. The runtime creates a `RunContext` containing

@@ -40,9 +40,38 @@ rejections do not count as tool failures.
 Built-in prices are defaults, not truth forever. Callers can pass
 `ModelPricing` entries to override or add models as providers update pricing.
 
+## Verifier Retry Loop
+
+Verifiers are stateless callables (sync or async) that judge an agent's output.
+A `VerifierChain` runs them in order, fail-fast: the first failing verdict wins.
+Anything not a `VerifierResult` is normalized (`True`/`None` -> passed,
+`False` -> failed). If a verifier itself raises, that is a verifier bug, not the
+agent's: the runtime surfaces it as `VerifierExecutionError`
+(`terminated_reason="verifier_error"`) and does not retry.
+
+The runtime owns the loop, not the agent. One `BudgetController` and one
+`RunContext` flow through every attempt; the only mutation between attempts is
+appending the failing verifier's feedback to `ctx.retry_feedback` (and bumping
+`ctx.attempt`). The agent is re-invoked with the same `*args`/`**kwargs` and is
+expected to read `ctx.retry_feedback` if it wants to self-correct. Because the
+budget is shared and the whole loop sits inside the run's single
+`asyncio.timeout()`, a verifier loop can never spend past a cap or outlive the
+time limit.
+
+When retries are exhausted: by default the runtime returns
+`RunResult(success=False, terminated_reason="verification_failed",
+verification_passed=False)` with `output` still set to the last attempt — the
+agent produced an answer, it just isn't trusted. With
+`VerifierConfig(raise_on_failure=True)` the runtime instead surfaces a
+`VerificationFailed` (same `terminated_reason`, `output=None`, attempt count and
+feedback in `metadata`).
+
 ## Telemetry
 
 Provider wrappers emit OpenTelemetry spans through a small conventions module.
 This keeps GenAI semantic convention names isolated while the standard evolves.
 Tool spans also include circuit breaker state, failure count, and whether a
-call was blocked.
+call was blocked. Each verifier runs in a `verifier_run <name>` child span; the
+root `agent_run` span carries `guardloop.verification.passed` /
+`guardloop.verification.attempts` plus `guardloop.verification.failed`,
+`.retrying`, and `.exhausted` events.