guardloop 0.2.0__tar.gz → 0.3.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- guardloop-0.3.0/CHANGELOG.md +82 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/PKG-INFO +90 -20
- guardloop-0.3.0/README.md +218 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/docs/design.md +31 -2
- guardloop-0.3.0/docs/project-overview.md +599 -0
- guardloop-0.3.0/docs/pypi-publishing.md +61 -0
- guardloop-0.3.0/docs/roadmap.md +59 -0
- guardloop-0.3.0/examples/verifier_retry_loop.py +54 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/pyproject.toml +6 -3
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/__init__.py +22 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/context.py +11 -1
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/exceptions.py +34 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/models.py +3 -0
- guardloop-0.3.0/src/guardloop/runtime.py +385 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/telemetry/conventions.py +30 -0
- guardloop-0.3.0/src/guardloop/verifier.py +248 -0
- guardloop-0.3.0/tests/test_verifier.py +537 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/uv.lock +1 -1
- guardloop-0.2.0/README.md +0 -148
- guardloop-0.2.0/docs/pypi-publishing.md +0 -66
- guardloop-0.2.0/docs/roadmap.md +0 -27
- guardloop-0.2.0/src/guardloop/runtime.py +0 -190
- {guardloop-0.2.0 → guardloop-0.3.0}/.github/workflows/publish-pypi.yml +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/.gitignore +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/LICENSE +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/examples/live_anthropic_basic.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/examples/live_openai_basic.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/examples/runaway_cost_prevention.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/examples/tool_circuit_breaker.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/budget.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/circuit_breaker.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/pricing.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/providers/__init__.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/providers/anthropic.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/providers/openai.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/py.typed +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/telemetry/__init__.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/telemetry/tracer.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/tokenization.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/src/guardloop/tools.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/__init__.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/fakes.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/test_budget.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/test_circuit_breaker.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/test_providers.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/test_runtime.py +0 -0
- {guardloop-0.2.0 → guardloop-0.3.0}/tests/test_telemetry.py +0 -0
guardloop-0.3.0/CHANGELOG.md

@@ -0,0 +1,82 @@
+# Changelog
+
+All notable changes to GuardLoop are documented here. The format is based on
+[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
+follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html) (pre-1.0:
+minor releases may include breaking changes).
+
+## [0.3.0] - 2026-05-10
+
+### Added
+
+- **Verifier retry loop (Pillar 3 / self-healing).** After an agent finishes,
+  GuardLoop can run a chain of verifiers against the output; on rejection it
+  appends the verifier's feedback to `RunContext.retry_feedback` and re-invokes
+  the agent, bounded by `VerifierConfig.max_retries`. All attempts share the
+  same budget (cost / tokens / time / tool calls) and the run's single
+  `asyncio.timeout`, so a verifier loop cannot bypass any guardrail.
+- New module `guardloop.verifier` with public exports: `Verifier` (callable
+  type alias — sync or async, returning `VerifierResult`, `bool`, or `None`),
+  `VerifierResult`, `VerifierContext`, `VerifierConfig`, and `VerifierChain`.
+- Built-in rule-based verifier factories: `non_empty()`, `matches_regex(...)`,
+  `is_json_object(required_keys=...)`.
+- `GuardLoop(verifiers=[...], verifier_config=VerifierConfig(...))` constructor
+  parameters and `GuardLoop.add_verifier(fn)`.
+- `RunResult` fields: `verification_passed: bool | None`,
+  `verification_attempts: int`, `verification_feedback: list[str]`.
+- `RunContext.retry_feedback: list[str]` and `RunContext.attempt: int`.
+- New exceptions `VerificationFailed` (`terminated_reason="verification_failed"`,
+  raised only in strict mode) and `VerifierExecutionError`
+  (`terminated_reason="verifier_error"`, raised when a verifier itself throws).
+- OpenTelemetry: `verifier_run <name>` child spans, `agent_run` attributes
+  `guardloop.verification.passed` / `guardloop.verification.attempts`, and
+  `guardloop.verification.failed` / `.retrying` / `.exhausted` span events.
+- No-key demo `examples/verifier_retry_loop.py`.
+
+### Changed
+
+- When verification ultimately fails (retries exhausted), `RunResult.success`
+  is `False` with `terminated_reason="verification_failed"`, but `output` still
+  holds the last attempt's text — consistent with how budget/timeout stops
+  report. Set `VerifierConfig(raise_on_failure=True)` for strict behavior
+  (surfaces a `VerificationFailed` with `output=None` and details in
+  `metadata`).
+- `pyproject.toml`: `Changelog` URL now points at this file.
+
+## [0.2.0] - 2026
+
+### Added
+
+- Per-tool circuit breakers with `closed` / `open` / `half_open` states, a
+  global default policy plus per-tool overrides, breaker state that persists on
+  the `GuardLoop` instance across runs, and `runtime.circuit_breaker_snapshots()`
+  / `runtime.reset_circuit_breakers()`.
+- `ctx.call_tool(...)` / `ctx.wrap_tool(...)` route tool calls through the
+  breaker before the tool-call budget is incremented.
+- `CircuitBreakerOpen` exception and circuit-breaker OpenTelemetry attributes
+  on tool spans.
+- No-key demo `examples/tool_circuit_breaker.py`.
+
+## [0.1.0] - 2026
+
+### Added
+
+- Async runtime wrapper: `GuardLoop.run(agent, ...)` returns a structured
+  `RunResult`; controlled stops become `success=False` with a
+  `terminated_reason` instead of raised exceptions.
+- Hard budget caps for cost (`Decimal`), tokens, wall-clock time, and tool
+  calls, enforced pre-flight before each LLM request.
+- Direct wrappers for `AsyncOpenAI.responses.create` and
+  `AsyncAnthropic.messages.create` with usage accounting and pricing.
+- OpenTelemetry spans for agent runs, LLM calls, and tool calls (core depends
+  only on `opentelemetry-api`; exporters via the `otel` extra).
+- Public exception hierarchy: `GuardLoopError`, `BudgetExceeded`,
+  `TokenLimitExceeded`, `ToolCallLimitExceeded`, `TimeLimitExceeded`,
+  `ModelPricingMissing`, `TokenLimitMissing`; `AgentRuntime` / `AgentRuntimeError`
+  compatibility aliases.
+- No-key demo `examples/runaway_cost_prevention.py`; packaged and published to
+  PyPI via GitHub Actions OIDC Trusted Publishing.
+
+[0.3.0]: https://github.com/awesome-pro/guardloop/releases/tag/v0.3.0
+[0.2.0]: https://github.com/awesome-pro/guardloop/releases/tag/v0.2.0
+[0.1.0]: https://github.com/awesome-pro/guardloop/releases/tag/v0.1.0
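The strict mode described in the 0.3.0 entry can be exercised roughly as in the sketch below. A minimal sketch, assuming `VerificationFailed` and `non_empty` are re-exported from the package root like the other public names, and that the exception exposes `terminated_reason` and `metadata` attributes as the entry implies; it is not code taken from the package.

```python
import asyncio

from guardloop import GuardLoop, RunContext, VerificationFailed, VerifierConfig, non_empty

# raise_on_failure=True: exhausted retries raise instead of returning success=False.
runtime = GuardLoop(
    verifiers=[non_empty()],
    verifier_config=VerifierConfig(max_retries=1, raise_on_failure=True),
)


async def agent(ctx: RunContext, task: str) -> str:
    # A deliberately bad agent that can never satisfy non_empty().
    return ""


async def main() -> None:
    try:
        await runtime.run(agent, "summarize the release")
    except VerificationFailed as exc:
        # terminated_reason should read "verification_failed"; attempt count and
        # verifier feedback are expected in exc.metadata (exact keys assumed).
        print(exc.terminated_reason, exc.metadata)


asyncio.run(main())
```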
{guardloop-0.2.0 → guardloop-0.3.0}/PKG-INFO

@@ -1,17 +1,17 @@
 Metadata-Version: 2.4
 Name: guardloop
-Version: 0.2.0
-Summary: A production runtime guardrail for AI agents: budget caps, timeouts, tool limits, and OpenTelemetry traces.
+Version: 0.3.0
+Summary: A production runtime guardrail for AI agents: budget caps, timeouts, tool limits, circuit breakers, verifier retries, and OpenTelemetry traces.
 Project-URL: Homepage, https://github.com/awesome-pro/guardloop
 Project-URL: Documentation, https://github.com/awesome-pro/guardloop#readme
 Project-URL: Repository, https://github.com/awesome-pro/guardloop
 Project-URL: Issues, https://github.com/awesome-pro/guardloop/issues
-Project-URL: Changelog, https://github.com/awesome-pro/guardloop/
+Project-URL: Changelog, https://github.com/awesome-pro/guardloop/blob/main/CHANGELOG.md
 Author-email: awesome-pro <147910430+awesome-pro@users.noreply.github.com>
 Maintainer-email: awesome-pro <147910430+awesome-pro@users.noreply.github.com>
 License-Expression: MIT
 License-File: LICENSE
-Keywords: agentic-ai,ai-agents,ai-safety,anthropic,circuit-breaker,llm,mlops,openai,opentelemetry,runtime-guardrails
+Keywords: agentic-ai,ai-agents,ai-safety,anthropic,circuit-breaker,llm,mlops,openai,opentelemetry,retry,runtime-guardrails,self-healing,verifier
 Classifier: Development Status :: 3 - Alpha
 Classifier: Framework :: AsyncIO
 Classifier: Intended Audience :: Developers

@@ -42,12 +42,15 @@ Description-Content-Type: text/markdown
 
 GuardLoop is a production runtime guardrail for AI agents. It wraps model
 clients and tools with hard budget caps, timeout control, tool-call limits, and
-per-tool circuit breakers,
-
-
+per-tool circuit breakers, re-runs an agent against verifiers until the output
+passes, and emits OpenTelemetry traces for every protected call. Runaway agent
+loops can be stopped before they burn through money, flaky tools can be cut off
+before an agent retries them into a bigger incident, and confidently-wrong
+answers get a second pass.
 
-The v0.
-agents**
+The v0.3 focus is intentionally sharp: **runtime guardrails for async Python
+agents** — direct OpenAI and Anthropic wrappers, protected tool calls, per-tool
+circuit breakers, and a verify-fix-retry loop.
 
 ```python
 from guardloop import (

@@ -56,6 +59,8 @@ from guardloop import (
     CircuitBreakerConfig,
     CircuitBreakerPolicy,
     RunContext,
+    VerifierConfig,
+    is_json_object,
 )
 
 runtime = GuardLoop(

@@ -71,13 +76,18 @@ runtime = GuardLoop(
             recovery_timeout_seconds=30,
         )
     ),
+    verifiers=[is_json_object(required_keys=["answer"])],
+    verifier_config=VerifierConfig(max_retries=2),
 )
 
 
 async def agent(ctx: RunContext, prompt: str) -> str:
+    instructions = prompt
+    if ctx.retry_feedback:
+        instructions += "\n\nFix the previous attempt: " + "; ".join(ctx.retry_feedback)
     response = await ctx.openai.responses.create(
         model="gpt-5.2",
-        input=prompt,
+        input=instructions,
         max_output_tokens=300,
     )
     return str(response.output_text)

@@ -98,16 +108,60 @@ flowchart LR
   U["User code"] --> R["GuardLoop"]
   R --> B["BudgetController"]
   R --> CB["CircuitBreakerRegistry"]
+  R --> V["VerifierChain"]
   R --> T["OpenTelemetry spans"]
   R --> C["RunContext"]
   C --> O["Wrapped OpenAI client"]
   C --> A["Wrapped Anthropic client"]
   C --> W["Wrapped tools"]
+  V -. "feedback on retry" .-> C
 ```
 
+## Verifier Retry Loop
+
+Agents can return confidently wrong answers. Attach verifiers — plain callables,
+sync or async — and GuardLoop runs them after the agent finishes. On rejection
+it feeds the verifier's feedback into `ctx.retry_feedback` and re-invokes the
+agent, up to `VerifierConfig.max_retries` times. Every attempt shares the same
+budget and the run's timeout, so the retry loop can never spend past a cap.
+
+```python
+from guardloop import GuardLoop, RunContext, VerifierConfig, VerifierContext, VerifierResult
+
+
+def no_todo(output: object, ctx: VerifierContext) -> VerifierResult:
+    if "TODO" in str(output):
+        return VerifierResult(passed=False, feedback="Replace the TODO placeholder.")
+    return VerifierResult(passed=True)
+
+
+runtime = GuardLoop(verifiers=[no_todo], verifier_config=VerifierConfig(max_retries=2))
+
+
+async def agent(ctx: RunContext, task: str) -> str:
+    # On a retry, ctx.retry_feedback holds the verifier's complaints — read it.
+    ...
+
+
+result = await runtime.run(agent, "draft the release notes")
+print(result.verification_passed, result.verification_attempts, result.verification_feedback)
+```
+
+Built-in rule-based verifiers ship in `guardloop`: `non_empty()`,
+`matches_regex(...)`, `is_json_object(required_keys=...)`. By default an output
+that fails every retry comes back as `success=False` with
+`terminated_reason="verification_failed"` but with `output` still populated;
+set `VerifierConfig(raise_on_failure=True)` for a hard stop.
+
+## Project Guide
+
+For a deeper walkthrough of what has been implemented, how the code is
+organized, and what the next roadmap goals are, read
+[docs/project-overview.md](docs/project-overview.md).
+
 ## Install
 
-
+Install from PyPI:
 
 ```bash
 pip install guardloop

@@ -147,6 +201,15 @@ uv run python examples/tool_circuit_breaker.py
 This demo uses a failing fake tool. GuardLoop allows the first failures,
 opens the circuit breaker, then rejects the next call without invoking the tool.
 
+```bash
+uv run python examples/verifier_retry_loop.py
+```
+
+This demo's agent first returns a bad answer (a `TODO` placeholder, then
+malformed JSON). A verifier chain rejects it with feedback, the agent reads
+`ctx.retry_feedback` and self-corrects, and the run ends with
+`verification_passed: true` after three attempts.
+
 ## Live Provider Smoke Tests
 
 ```bash

@@ -169,20 +232,27 @@ uv run ruff format --check .
 uv run pyright
 ```
 
-## v0.
+## v0.3 Scope
 
 - Async Python runtime with `src/` package layout.
 - Hard caps for cost, tokens, time, and tool calls.
-- Per-tool circuit breakers with closed, open, and half-open states
-
--
--
--
+- Per-tool circuit breakers with closed, open, and half-open states; global
+  default breaker policy plus per-tool overrides.
+- Verify-fix-retry loop: sync or async output verifiers, fail-fast chains,
+  built-in rule-based verifiers, feedback into `ctx.retry_feedback`, and an
+  opt-in strict mode — all attempts share one budget and the run timeout.
+- Direct wrappers for `AsyncOpenAI.responses.create` and
+  `AsyncAnthropic.messages.create`.
+- OpenTelemetry spans for agent runs, LLM calls, tools, and verifiers.
 - Fake-client tests and demos that do not require API keys.
 
 ## Roadmap
 
-- v0.2: per-tool circuit breakers.
-- v0.3:
+- v0.2: per-tool circuit breakers. ✅
+- v0.3: verify-fix-retry loop. ✅
 - v0.4: LangGraph and OpenAI Agents SDK adapters.
-- v0.5: Jaeger/Phoenix trace screenshots,
+- v0.5: Jaeger/Phoenix trace screenshots, demo video, and blog post.
+- v0.6: persistent breaker state, YAML/TOML policy, multi-model pricing, loop detection.
+- v1.0: stable API, changelog, docs site, release checklist.
+
+See [docs/roadmap.md](docs/roadmap.md) for details.
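For reference, the documented verifier factories and the sync-or-async callable contract above compose as in the sketch below. A sketch only: the factory names and the `(output, ctx)` verifier signature come from this diff, while the stand-in agent and the handling of a plain `bool` verdict are assumptions based on the stated normalization rule.

```python
import asyncio

from guardloop import (
    GuardLoop,
    RunContext,
    VerifierConfig,
    VerifierContext,
    is_json_object,
    non_empty,
)


async def cites_a_source(output: object, ctx: VerifierContext) -> bool:
    # Verifiers may be async and may return a plain bool; the chain normalizes it.
    return "http" in str(output)


runtime = GuardLoop(
    # Fail-fast chain: the first rejection stops the chain and triggers a retry.
    verifiers=[non_empty(), is_json_object(required_keys=["answer"]), cites_a_source],
    verifier_config=VerifierConfig(max_retries=2),
)


async def agent(ctx: RunContext, task: str) -> str:
    # Stand-in agent; a real one would call ctx.openai / ctx.anthropic.
    return '{"answer": "see https://github.com/awesome-pro/guardloop"}'


async def main() -> None:
    result = await runtime.run(agent, "answer as JSON and cite a source")
    print(result.verification_passed, result.verification_attempts)


asyncio.run(main())
```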
guardloop-0.3.0/README.md

@@ -0,0 +1,218 @@
+# GuardLoop
+
+GuardLoop is a production runtime guardrail for AI agents. It wraps model
+clients and tools with hard budget caps, timeout control, tool-call limits, and
+per-tool circuit breakers, re-runs an agent against verifiers until the output
+passes, and emits OpenTelemetry traces for every protected call. Runaway agent
+loops can be stopped before they burn through money, flaky tools can be cut off
+before an agent retries them into a bigger incident, and confidently-wrong
+answers get a second pass.
+
+The v0.3 focus is intentionally sharp: **runtime guardrails for async Python
+agents** — direct OpenAI and Anthropic wrappers, protected tool calls, per-tool
+circuit breakers, and a verify-fix-retry loop.
+
+```python
+from guardloop import (
+    GuardLoop,
+    BudgetConfig,
+    CircuitBreakerConfig,
+    CircuitBreakerPolicy,
+    RunContext,
+    VerifierConfig,
+    is_json_object,
+)
+
+runtime = GuardLoop(
+    budget=BudgetConfig(
+        cost_limit_usd="0.10",
+        token_limit=10_000,
+        time_limit_seconds=60,
+        tool_call_limit=20,
+    ),
+    circuit_breakers=CircuitBreakerConfig(
+        default=CircuitBreakerPolicy(
+            failure_threshold=3,
+            recovery_timeout_seconds=30,
+        )
+    ),
+    verifiers=[is_json_object(required_keys=["answer"])],
+    verifier_config=VerifierConfig(max_retries=2),
+)
+
+
+async def agent(ctx: RunContext, prompt: str) -> str:
+    instructions = prompt
+    if ctx.retry_feedback:
+        instructions += "\n\nFix the previous attempt: " + "; ".join(ctx.retry_feedback)
+    response = await ctx.openai.responses.create(
+        model="gpt-5.2",
+        input=instructions,
+        max_output_tokens=300,
+    )
+    return str(response.output_text)
+
+
+result = await runtime.run(agent, "research agent runtime safety")
+print(result.model_dump_json(indent=2))
+```
+
+## Why This Exists
+
+Agents are loops around probabilistic systems. When they go wrong, they can call
+the same model or tool repeatedly, spend unexpected money, and fail without a
+clear trace. GuardLoop puts an explicit execution layer around that loop:
+
+```mermaid
+flowchart LR
+  U["User code"] --> R["GuardLoop"]
+  R --> B["BudgetController"]
+  R --> CB["CircuitBreakerRegistry"]
+  R --> V["VerifierChain"]
+  R --> T["OpenTelemetry spans"]
+  R --> C["RunContext"]
+  C --> O["Wrapped OpenAI client"]
+  C --> A["Wrapped Anthropic client"]
+  C --> W["Wrapped tools"]
+  V -. "feedback on retry" .-> C
+```
+
+## Verifier Retry Loop
+
+Agents can return confidently wrong answers. Attach verifiers — plain callables,
+sync or async — and GuardLoop runs them after the agent finishes. On rejection
+it feeds the verifier's feedback into `ctx.retry_feedback` and re-invokes the
+agent, up to `VerifierConfig.max_retries` times. Every attempt shares the same
+budget and the run's timeout, so the retry loop can never spend past a cap.
+
+```python
+from guardloop import GuardLoop, RunContext, VerifierConfig, VerifierContext, VerifierResult
+
+
+def no_todo(output: object, ctx: VerifierContext) -> VerifierResult:
+    if "TODO" in str(output):
+        return VerifierResult(passed=False, feedback="Replace the TODO placeholder.")
+    return VerifierResult(passed=True)
+
+
+runtime = GuardLoop(verifiers=[no_todo], verifier_config=VerifierConfig(max_retries=2))
+
+
+async def agent(ctx: RunContext, task: str) -> str:
+    # On a retry, ctx.retry_feedback holds the verifier's complaints — read it.
+    ...
+
+
+result = await runtime.run(agent, "draft the release notes")
+print(result.verification_passed, result.verification_attempts, result.verification_feedback)
+```
+
+Built-in rule-based verifiers ship in `guardloop`: `non_empty()`,
+`matches_regex(...)`, `is_json_object(required_keys=...)`. By default an output
+that fails every retry comes back as `success=False` with
+`terminated_reason="verification_failed"` but with `output` still populated;
+set `VerifierConfig(raise_on_failure=True)` for a hard stop.
+
+## Project Guide
+
+For a deeper walkthrough of what has been implemented, how the code is
+organized, and what the next roadmap goals are, read
+[docs/project-overview.md](docs/project-overview.md).
+
+## Install
+
+Install from PyPI:
+
+```bash
+pip install guardloop
+```
+
+For local development:
+
+```bash
+uv sync
+```
+
+Optional OpenTelemetry exporters are available through the `otel` extra:
+
+```bash
+pip install "guardloop[otel]"
+```
+
+For local development with the extra:
+
+```bash
+uv sync --extra otel
+```
+
+## Try the No-Key Demo
+
+```bash
+uv run python examples/runaway_cost_prevention.py
+```
+
+The demo uses a fake OpenAI-compatible client and intentionally loops forever.
+GuardLoop stops it when the next model request would exceed the cost cap.
+
+```bash
+uv run python examples/tool_circuit_breaker.py
+```
+
+This demo uses a failing fake tool. GuardLoop allows the first failures,
+opens the circuit breaker, then rejects the next call without invoking the tool.
+
+```bash
+uv run python examples/verifier_retry_loop.py
+```
+
+This demo's agent first returns a bad answer (a `TODO` placeholder, then
+malformed JSON). A verifier chain rejects it with feedback, the agent reads
+`ctx.retry_feedback` and self-corrects, and the run ends with
+`verification_passed: true` after three attempts.
+
+## Live Provider Smoke Tests
+
+```bash
+export OPENAI_API_KEY="..."
+export ANTHROPIC_API_KEY="..."
+
+uv run python examples/live_openai_basic.py
+uv run python examples/live_anthropic_basic.py
+```
+
+Both live examples can be customized with `OPENAI_MODEL` or `ANTHROPIC_MODEL`.
+
+## Quality Gates
+
+```bash
+uv run pytest
+uv run pytest --cov=guardloop
+uv run ruff check .
+uv run ruff format --check .
+uv run pyright
+```
+
+## v0.3 Scope
+
+- Async Python runtime with `src/` package layout.
+- Hard caps for cost, tokens, time, and tool calls.
+- Per-tool circuit breakers with closed, open, and half-open states; global
+  default breaker policy plus per-tool overrides.
+- Verify-fix-retry loop: sync or async output verifiers, fail-fast chains,
+  built-in rule-based verifiers, feedback into `ctx.retry_feedback`, and an
+  opt-in strict mode — all attempts share one budget and the run timeout.
+- Direct wrappers for `AsyncOpenAI.responses.create` and
+  `AsyncAnthropic.messages.create`.
+- OpenTelemetry spans for agent runs, LLM calls, tools, and verifiers.
+- Fake-client tests and demos that do not require API keys.
+
+## Roadmap
+
+- v0.2: per-tool circuit breakers. ✅
+- v0.3: verify-fix-retry loop. ✅
+- v0.4: LangGraph and OpenAI Agents SDK adapters.
+- v0.5: Jaeger/Phoenix trace screenshots, demo video, and blog post.
+- v0.6: persistent breaker state, YAML/TOML policy, multi-model pricing, loop detection.
+- v1.0: stable API, changelog, docs site, release checklist.
+
+See [docs/roadmap.md](docs/roadmap.md) for details.
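The circuit-breaker path this README relies on (`ctx.call_tool(...)` plus `CircuitBreakerOpen`, per the 0.2.0 changelog entries) can be pictured as in the sketch below. A sketch under assumptions: the call shape `ctx.call_tool(name, callable, *args)` is not confirmed by the diff and the failing tool is invented for illustration; only `CircuitBreakerConfig`, `CircuitBreakerPolicy`, `CircuitBreakerOpen`, and `circuit_breaker_snapshots()` are named by the package docs.

```python
import asyncio

from guardloop import (
    CircuitBreakerConfig,
    CircuitBreakerOpen,
    CircuitBreakerPolicy,
    GuardLoop,
    RunContext,
)


async def flaky_search(query: str) -> str:
    # Stand-in tool that always fails, so the breaker trips quickly.
    raise RuntimeError("search backend unavailable")


runtime = GuardLoop(
    circuit_breakers=CircuitBreakerConfig(
        default=CircuitBreakerPolicy(failure_threshold=2, recovery_timeout_seconds=30)
    ),
)


async def agent(ctx: RunContext, task: str) -> str:
    for _ in range(4):
        try:
            # Assumed call shape: tool name, tool callable, then the tool's arguments.
            await ctx.call_tool("search", flaky_search, task)
        except CircuitBreakerOpen:
            return "search is down; answering from cached notes"
        except RuntimeError:
            continue  # counted by the breaker as a tool failure
    return "gave up"


async def main() -> None:
    result = await runtime.run(agent, "agent runtime safety")
    print(result.output, runtime.circuit_breaker_snapshots())


asyncio.run(main())
```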
{guardloop-0.2.0 → guardloop-0.3.0}/docs/design.md

@@ -1,4 +1,4 @@
-# GuardLoop
+# GuardLoop Design
 
 GuardLoop is a wrapper, not an agent framework. A user passes an async agent
 callable to `runtime.run()`. The runtime creates a `RunContext` containing

@@ -40,9 +40,38 @@ rejections do not count as tool failures.
 Built-in prices are defaults, not truth forever. Callers can pass
 `ModelPricing` entries to override or add models as providers update pricing.
 
+## Verifier Retry Loop
+
+Verifiers are stateless callables (sync or async) that judge an agent's output.
+A `VerifierChain` runs them in order, fail-fast: the first failing verdict wins.
+Anything not a `VerifierResult` is normalized (`True`/`None` -> passed,
+`False` -> failed). If a verifier itself raises, that is a verifier bug, not the
+agent's: the runtime surfaces it as `VerifierExecutionError`
+(`terminated_reason="verifier_error"`) and does not retry.
+
+The runtime owns the loop, not the agent. One `BudgetController` and one
+`RunContext` flow through every attempt; the only mutation between attempts is
+appending the failing verifier's feedback to `ctx.retry_feedback` (and bumping
+`ctx.attempt`). The agent is re-invoked with the same `*args`/`**kwargs` and is
+expected to read `ctx.retry_feedback` if it wants to self-correct. Because the
+budget is shared and the whole loop sits inside the run's single
+`asyncio.timeout()`, a verifier loop can never spend past a cap or outlive the
+time limit.
+
+When retries are exhausted: by default the runtime returns
+`RunResult(success=False, terminated_reason="verification_failed",
+verification_passed=False)` with `output` still set to the last attempt — the
+agent produced an answer, it just isn't trusted. With
+`VerifierConfig(raise_on_failure=True)` the runtime instead surfaces a
+`VerificationFailed` (same `terminated_reason`, `output=None`, attempt count and
+feedback in `metadata`).
+
 ## Telemetry
 
 Provider wrappers emit OpenTelemetry spans through a small conventions module.
 This keeps GenAI semantic convention names isolated while the standard evolves.
 Tool spans also include circuit breaker state, failure count, and whether a
-call was blocked.
+call was blocked. Each verifier runs in a `verifier_run <name>` child span; the
+root `agent_run` span carries `guardloop.verification.passed` /
+`guardloop.verification.attempts` plus `guardloop.verification.failed`,
+`.retrying`, and `.exhausted` events.