handoff-guard 0.1.0__tar.gz
- handoff_guard-0.1.0/.claude/skills/update-docs/SKILL.md +141 -0
- handoff_guard-0.1.0/.github/dependabot.yml +11 -0
- handoff_guard-0.1.0/.github/workflows/ci.yml +54 -0
- handoff_guard-0.1.0/.github/workflows/publish.yml +24 -0
- handoff_guard-0.1.0/.gitignore +10 -0
- handoff_guard-0.1.0/.pre-commit-config.yaml +7 -0
- handoff_guard-0.1.0/AGENTS.md +177 -0
- handoff_guard-0.1.0/LICENSE +21 -0
- handoff_guard-0.1.0/PKG-INFO +233 -0
- handoff_guard-0.1.0/README.md +203 -0
- handoff_guard-0.1.0/examples/llm_demo/README.md +60 -0
- handoff_guard-0.1.0/examples/llm_demo/__init__.py +1 -0
- handoff_guard-0.1.0/examples/llm_demo/agents.py +189 -0
- handoff_guard-0.1.0/examples/llm_demo/run_demo.py +192 -0
- handoff_guard-0.1.0/examples/llm_demo/schemas.py +26 -0
- handoff_guard-0.1.0/examples/rag_demo/README.md +73 -0
- handoff_guard-0.1.0/examples/rag_demo/__init__.py +1 -0
- handoff_guard-0.1.0/examples/rag_demo/pipeline.py +178 -0
- handoff_guard-0.1.0/examples/rag_demo/run_demo.py +162 -0
- handoff_guard-0.1.0/examples/rag_demo/schemas.py +71 -0
- handoff_guard-0.1.0/pyproject.toml +38 -0
- handoff_guard-0.1.0/src/handoff/__init__.py +19 -0
- handoff_guard-0.1.0/src/handoff/core.py +95 -0
- handoff_guard-0.1.0/src/handoff/guard.py +469 -0
- handoff_guard-0.1.0/src/handoff/langgraph.py +76 -0
- handoff_guard-0.1.0/src/handoff/retry.py +145 -0
- handoff_guard-0.1.0/src/handoff/testing.py +39 -0
- handoff_guard-0.1.0/src/handoff/utils.py +49 -0
- handoff_guard-0.1.0/tests/test_guard.py +236 -0
- handoff_guard-0.1.0/tests/test_retry.py +344 -0
- handoff_guard-0.1.0/tests/test_testing.py +23 -0
- handoff_guard-0.1.0/tests/test_utils.py +30 -0
|
@@ -0,0 +1,141 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: update-docs
|
|
3
|
+
description: Update project documentation (README.md, AGENTS.md, example READMEs) to reflect current codebase state.
|
|
4
|
+
user-invocable: true
|
|
5
|
+
allowed-tools: Read, Glob, Grep, Edit, Bash
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Update Documentation
|
|
9
|
+
|
|
10
|
+
This skill updates the project's documentation files to reflect the current state of the codebase.
|
|
11
|
+
|
|
12
|
+
## Files Updated
|
|
13
|
+
|
|
14
|
+
1. **`README.md`** (project root) — Public-facing README with API docs and examples
|
|
15
|
+
2. **`AGENTS.md`** (project root) — Guide for AI coding agents
|
|
16
|
+
3. **`examples/llm_demo/README.md`** — LLM demo documentation
|
|
17
|
+
4. **`examples/rag_demo/README.md`** — RAG demo documentation
|
|
18
|
+
|
|
19
|
+
## Process
|
|
20
|
+
|
|
21
|
+
### Step 1: Scan for Changes
|
|
22
|
+
|
|
23
|
+
1. **Library source:**
|
|
24
|
+
```
|
|
25
|
+
Glob: src/handoff/**/*.py
|
|
26
|
+
```
|
|
27
|
+
Check for new/removed files, public API changes, new exports in `__init__.py`.
|
|
28
|
+
|
|
29
|
+
2. **Public API:**
|
|
30
|
+
```
|
|
31
|
+
Read: src/handoff/__init__.py
|
|
32
|
+
```
|
|
33
|
+
Verify `__all__` matches what's documented in README and AGENTS.md.
|
|
34
|
+
|
|
35
|
+
3. **Guard decorator:**
|
|
36
|
+
```
|
|
37
|
+
Read: src/handoff/guard.py
|
|
38
|
+
```
|
|
39
|
+
Check for new parameters, changed defaults, new on_fail modes.
|
|
40
|
+
|
|
41
|
+
4. **Retry system:**
|
|
42
|
+
```
|
|
43
|
+
Read: src/handoff/retry.py
|
|
44
|
+
```
|
|
45
|
+
Check for new RetryState properties, new Diagnostic fields, proxy changes.
|
|
46
|
+
|
|
47
|
+
5. **Utilities:**
|
|
48
|
+
```
|
|
49
|
+
Read: src/handoff/utils.py
|
|
50
|
+
```
|
|
51
|
+
Check for new utility functions or changed behavior.
|
|
52
|
+
|
|
53
|
+
6. **Examples:**
|
|
54
|
+
```
|
|
55
|
+
Glob: examples/**/*.py
|
|
56
|
+
```
|
|
57
|
+
Check for new demos, changed CLI flags, new agent patterns.
|
|
58
|
+
|
|
59
|
+
7. **Tests:**
|
|
60
|
+
```
|
|
61
|
+
Glob: tests/test_*.py
|
|
62
|
+
```
|
|
63
|
+
Check for new test files or significantly expanded coverage areas.
|
|
64
|
+
|
|
65
|
+
8. **Package config:**
|
|
66
|
+
```
|
|
67
|
+
Read: pyproject.toml
|
|
68
|
+
```
|
|
69
|
+
Check version, dependencies, optional extras.
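The `__all__` check in item 2 can be partly automated. The following is a minimal sketch, not part of this package: the helper names `exported_names` and `undocumented` are hypothetical, and the inline strings stand in for the real `src/handoff/__init__.py` and `README.md` contents.

```python
import ast
import re


def exported_names(init_source: str) -> set[str]:
    """Return the names listed in a module-level __all__ assignment."""
    tree = ast.parse(init_source)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if "__all__" in targets:
                # literal_eval accepts an AST node holding a list/tuple literal
                return set(ast.literal_eval(node.value))
    return set()


def undocumented(init_source: str, doc_text: str) -> set[str]:
    """Exported names that never appear as a whole word in the docs."""
    return {
        name
        for name in exported_names(init_source)
        if not re.search(rf"\b{re.escape(name)}\b", doc_text)
    }


# Stand-in inputs (assumptions, not the real file contents)
init_source = '__all__ = ["guard", "retry", "parse_json", "ParseError"]\n'
readme = "Use `guard`, `retry` and `parse_json` together."
missing = undocumented(init_source, readme)
```

Here `missing` holds the exported names the README text never mentions. A whole-word match is a coarse test, so treat a hit as a prompt to read the section rather than as proof the name is documented.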
|
|
70
|
+
|
|
71
|
+
### Step 2: Update README.md
|
|
72
|
+
|
|
73
|
+
- **Hero example**: Verify it uses current API correctly (`guard`, `retry`, `parse_json`)
|
|
74
|
+
- **Quick Start**: Verify install command and demo commands work
|
|
75
|
+
- **Features list**: Add any new features, remove deprecated ones
|
|
76
|
+
- **API section**: Sync `@guard` params, `retry` proxy properties, `parse_json` behavior, `HandoffViolation` attributes with actual source
|
|
77
|
+
- **Handle Failures**: Verify on_fail modes match implementation
|
|
78
|
+
- **Examples table**: Verify both demo links and descriptions are accurate
|
|
79
|
+
- **LangGraph section**: Verify `guarded_node` API is current
|
|
80
|
+
- **Comparison table**: Update if positioning has changed
|
|
81
|
+
- **Roadmap**: Check off completed items, add new planned items
|
|
82
|
+
|
|
83
|
+
### Step 3: Update AGENTS.md
|
|
84
|
+
|
|
85
|
+
- **Repository Structure**: Match current file tree exactly — no deleted files, no missing new files
|
|
86
|
+
- **Public API**: Sync imports with `__init__.py` `__all__`
|
|
87
|
+
- **@guard decorator**: Sync all parameters and their defaults with `guard.py`
|
|
88
|
+
- **retry proxy**: Sync all properties with `_RetryProxy` class in `retry.py`
|
|
89
|
+
- **HandoffViolation**: Sync attributes with `core.py`
|
|
90
|
+
- **Architecture Decisions**: Add any new decisions, remove outdated ones
|
|
91
|
+
- **Development Commands**: Verify all commands work, especially demo commands with current CLI flags
|
|
92
|
+
- **Testing section**: List all test files with accurate descriptions of what they cover
|
|
93
|
+
|
|
94
|
+
### Step 4: Update Example READMEs
|
|
95
|
+
|
|
96
|
+
For each example README (`examples/llm_demo/README.md`, `examples/rag_demo/README.md`):
|
|
97
|
+
|
|
98
|
+
- **Quick Start commands**: Verify all CLI flags match `argparse` in `run_demo.py`
|
|
99
|
+
- **Pipeline diagram**: Verify stages match actual agent/function names
|
|
100
|
+
- **Key Patterns code**: Verify code snippet matches actual implementation
|
|
101
|
+
- **Schemas table**: Verify schema names and validation rules match `schemas.py`
|
|
102
|
+
- **Requirements**: Verify install commands and dependencies
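The CLI flag check above can be approximated with a text comparison. This is a rough sketch under stated assumptions: `flags_in` is a hypothetical helper, the `add_argument` lines are stand-ins for the real `run_demo.py`, and `--verbose` is an invented stale flag used only to show a mismatch.

```python
import re


def flags_in(text: str) -> set[str]:
    """Collect long-form CLI flags such as --api or --failure-demo."""
    return set(re.findall(r"(?<![\w-])--[a-z][a-z0-9-]*", text))


# Stand-in for the argparse setup in a demo runner (assumption)
runner_source = '''
parser.add_argument("--pipeline", action="store_true")
parser.add_argument("--api", action="store_true")
'''

# Stand-in for a command line quoted in an example README (assumption)
readme_text = "python -m examples.llm_demo.run_demo --pipeline --api --verbose"

# Flags the README mentions that the runner does not define
stale = flags_in(readme_text) - flags_in(runner_source)
```

The reverse difference, `flags_in(runner_source) - flags_in(readme_text)`, lists flags the runner defines but the README never shows.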
|
|
103
|
+
|
|
104
|
+
### Step 5: Report Summary
|
|
105
|
+
|
|
106
|
+
Output a summary of changes:
|
|
107
|
+
```
|
|
108
|
+
## Documentation Update Summary
|
|
109
|
+
|
|
110
|
+
### README.md
|
|
111
|
+
- Updated: API section (new guard parameter X)
|
|
112
|
+
- Updated: Features list (added Y)
|
|
113
|
+
|
|
114
|
+
### AGENTS.md
|
|
115
|
+
- Updated: Repository Structure (new file Z)
|
|
116
|
+
- Updated: Testing section (new test file)
|
|
117
|
+
|
|
118
|
+
### examples/llm_demo/README.md
|
|
119
|
+
- No changes needed
|
|
120
|
+
|
|
121
|
+
### examples/rag_demo/README.md
|
|
122
|
+
- Updated: Schemas table (new field in RAGOutput)
|
|
123
|
+
```
|
|
124
|
+
|
|
125
|
+
## Important Notes
|
|
126
|
+
|
|
127
|
+
- Keep README.md concise and user-facing — it's for people evaluating the library
|
|
128
|
+
- Keep AGENTS.md comprehensive — it's for AI agents working on the code
|
|
129
|
+
- Don't remove documentation for features that still exist
|
|
130
|
+
- Verify code examples in README actually work (correct imports, correct API)
|
|
131
|
+
- Keep the roadmap in README.md up to date — check off shipped features
|
|
132
|
+
- Example READMEs should match the actual CLI interface exactly
|
|
133
|
+
|
|
134
|
+
## When to Run
|
|
135
|
+
|
|
136
|
+
- After adding new public API (new exports, new guard parameters, new retry properties)
|
|
137
|
+
- After adding or removing source files
|
|
138
|
+
- After changing CLI flags in demo runners
|
|
139
|
+
- After adding new test files
|
|
140
|
+
- After changing package version or dependencies
|
|
141
|
+
- Before releasing a new version
|
|
@@ -0,0 +1,11 @@
|
|
|
1
|
+
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file

version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/" # Location of package manifests
    schedule:
      interval: "weekly"
|
|
@@ -0,0 +1,54 @@
|
|
|
1
|
+
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: read

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Run tests
        run: pytest tests/ -v

      - name: Run demo (no LLM)
        run: python -m examples.rag_demo.run_demo

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install ruff

      - name: Lint with ruff
        run: ruff check src/ tests/ examples/
|
|
@@ -0,0 +1,24 @@
|
|
|
1
|
+
name: Publish to PyPI

on:
  release:
    types: [published]

jobs:
  publish:
    runs-on: ubuntu-latest
    environment: pypi
    permissions:
      id-token: write  # For trusted publishing

    steps:
      - uses: actions/checkout@v4

      - name: Set up uv
        uses: astral-sh/setup-uv@v5

      - name: Build package
        run: uv build

      - name: Publish to PyPI
        run: uv publish
|
|
@@ -0,0 +1,177 @@
|
|
|
1
|
+
# AGENTS.md
|
|
2
|
+
|
|
3
|
+
Guide for AI coding agents working on this codebase.
|
|
4
|
+
|
|
5
|
+
## Project Overview
|
|
6
|
+
|
|
7
|
+
**handoff-guard** is a lightweight Python library that validates data at agent/pipeline boundaries using Pydantic schemas. It wraps functions with a `@guard` decorator that validates output, retries with feedback on failure, and raises rich `HandoffViolation` exceptions identifying the exact node, field, and suggested fix.
|
|
8
|
+
|
|
9
|
+
- **Package name (PyPI):** `handoff-guard`
|
|
10
|
+
- **Import name:** `handoff`
|
|
11
|
+
- **Python:** >= 3.10
|
|
12
|
+
- **Core dependency:** Pydantic >= 2.0
|
|
13
|
+
|
|
14
|
+
## Repository Structure
|
|
15
|
+
|
|
16
|
+
```
src/handoff/              # Library source code
  __init__.py             # Public API exports
  core.py                 # ViolationContext dataclass, HandoffViolation exception
  guard.py                # @guard decorator, retry loop, validation logic
  retry.py                # RetryState, Diagnostic, AttemptRecord, _RetryProxy (retry singleton)
  utils.py                # parse_json(), ParseError
  testing.py              # mock_retry() context manager for tests
  langgraph.py            # LangGraph adapter: guarded_node, validate_state

tests/
  test_guard.py           # Guard decorator tests (sync, async, on_fail modes)
  test_retry.py           # Retry loop, proxy, history, parse error retry tests
  test_testing.py         # mock_retry tests
  test_utils.py           # parse_json tests

examples/
  llm_demo/               # Multi-agent pipeline (Planner -> Researcher -> Writer)
    __init__.py
    schemas.py            # PlannerOutput, ResearcherOutput, WriterOutput
    agents.py             # Guarded agents with retry, module-level _mock_responses
    run_demo.py           # Entry point: python -m examples.llm_demo.run_demo
    README.md
  rag_demo/               # RAG pipeline (Parser -> Retriever -> Reranker -> Generator)
    __init__.py
    schemas.py            # ParsedQuery, RetrievedDocs, RankedDocs, RAGOutput, etc.
    pipeline.py           # Guarded pipeline stages with retry on generator
    run_demo.py           # Entry point: python -m examples.rag_demo.run_demo
    README.md

.github/workflows/
  ci.yml                  # Tests on Python 3.10/3.11/3.12 + ruff lint
  publish.yml             # Auto-publish to PyPI on GitHub release
```
|
|
50
|
+
|
|
51
|
+
## Key Concepts
|
|
52
|
+
|
|
53
|
+
### Public API
|
|
54
|
+
|
|
55
|
+
```python
|
|
56
|
+
from handoff import guard, GuardConfig, HandoffViolation, ViolationContext
|
|
57
|
+
from handoff import retry, RetryState, Diagnostic, AttemptRecord
|
|
58
|
+
from handoff import parse_json, ParseError
|
|
59
|
+
from handoff.langgraph import guarded_node, validate_state
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
### `@guard` decorator
|
|
63
|
+
|
|
64
|
+
```python
@guard(
    input=Schema,                      # Pydantic model for input validation
    output=Schema,                     # Pydantic model for output validation
    node_name="...",                   # Identifies the node in errors (default: function name)
    max_attempts=3,                    # Retry up to N times (default: 1, no retry)
    retry_on=("validation", "parse"),  # Error types that trigger retry
    on_fail="raise",                   # "raise" | "return_none" | "return_input" | callable
)
```
|
|
74
|
+
|
|
75
|
+
- Input validation happens once, outside the retry loop
|
|
76
|
+
- Output validation happens inside the retry loop
|
|
77
|
+
- Parse errors (`ParseError`, `json.JSONDecodeError`, `KeyError`, `TypeError`) are retried when `"parse"` is in `retry_on` and `max_attempts` is greater than 1
|
|
78
|
+
- If the function has a `retry` parameter, `RetryState` is auto-injected
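The ordering described in the bullets above (input checked once, output checked on every attempt, parse errors retried only when configured) can be sketched with the standard library alone. This is an illustration, not the code in `guard.py`: `simple_guard`, `OutputInvalid` and the checker functions are hypothetical, and the attempt number is passed as a plain argument here instead of an injected `RetryState`.

```python
import functools
import json


class OutputInvalid(Exception):
    """Hypothetical stand-in for a schema validation failure."""


def simple_guard(check_input, check_output, max_attempts=1,
                 retry_on=("validation", "parse")):
    """Illustrative decorator: validate input once, retry on output failures."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(state):
            check_input(state)  # runs once, outside the retry loop
            last_error = None
            for attempt in range(1, max_attempts + 1):
                try:
                    result = func(state, attempt)
                    check_output(result)  # runs on every attempt
                    return result
                except OutputInvalid as exc:
                    kind, last_error = "validation", exc
                except (json.JSONDecodeError, KeyError, TypeError) as exc:
                    kind, last_error = "parse", exc
                if kind not in retry_on:
                    break  # this error type is not configured for retry
            raise last_error
        return wrapper
    return decorator


def require_topic(state):
    if "topic" not in state:
        raise ValueError("input needs a 'topic'")


def require_title(result):
    if "title" not in result:
        raise OutputInvalid("missing 'title'")


@simple_guard(require_topic, require_title, max_attempts=3)
def writer(state, attempt):
    # Attempt 1 is malformed JSON, attempt 2 lacks a field, attempt 3 is valid.
    raw = ["not json", '{"draft": "x"}', '{"title": "ok"}'][attempt - 1]
    return json.loads(raw)


result = writer({"topic": "handoffs"})
```

With `retry_on=("validation",)` the same function would stop at the first malformed response, because parse errors would no longer qualify for a retry.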
|
|
79
|
+
|
|
80
|
+
### `retry` proxy
|
|
81
|
+
|
|
82
|
+
Module-level singleton that reads its state from a ContextVar, so guarded code can use it without passing retry state through function signatures:
|
|
83
|
+
|
|
84
|
+
```python
|
|
85
|
+
from handoff import retry
|
|
86
|
+
|
|
87
|
+
retry.is_retry # True if attempt > 1
|
|
88
|
+
retry.attempt # Current attempt number (1-based)
|
|
89
|
+
retry.max_attempts # Total allowed attempts
|
|
90
|
+
retry.remaining # Attempts left
|
|
91
|
+
retry.is_final_attempt # True if no more retries
|
|
92
|
+
retry.feedback() # Formatted error string from last attempt, or None
|
|
93
|
+
retry.last_error # Diagnostic object, or None
|
|
94
|
+
retry.history # List of AttemptRecord
|
|
95
|
+
```
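A proxy of this kind can be built from `contextvars` in a few lines. The sketch below is an assumption about the general pattern, not the `_RetryProxy` class in `retry.py`: `AttemptState`, `attempt_info` and `run_with_state` are hypothetical names, and `remaining` is computed here as `max_attempts - attempt`, which may not match the library.

```python
from contextvars import ContextVar
from dataclasses import dataclass


@dataclass
class AttemptState:
    """Hypothetical state object; the library's RetryState may differ."""
    attempt: int = 1
    max_attempts: int = 1


# Each thread or asyncio task sees its own value of this variable.
_current: ContextVar[AttemptState] = ContextVar(
    "current_attempt", default=AttemptState()
)


class _Proxy:
    """Forwards attribute reads to whatever state is active in this context."""

    @property
    def attempt(self) -> int:
        return _current.get().attempt

    @property
    def is_retry(self) -> bool:
        return _current.get().attempt > 1

    @property
    def remaining(self) -> int:
        state = _current.get()
        return state.max_attempts - state.attempt


attempt_info = _Proxy()  # module-level singleton


def run_with_state(state, func):
    """Activate `state` for the duration of `func`, then restore the old value."""
    token = _current.set(state)
    try:
        return func()
    finally:
        _current.reset(token)


outside = attempt_info.is_retry
inside = run_with_state(
    AttemptState(attempt=2, max_attempts=3),
    lambda: (attempt_info.is_retry, attempt_info.remaining),
)
```

Because the proxy holds no state of its own, reads outside any active context fall back to the default, and the previous value is restored once `run_with_state` returns.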
|
|
96
|
+
|
|
97
|
+
### `parse_json`
|
|
98
|
+
|
|
99
|
+
Parses JSON from LLM text output, stripping markdown code fences and BOM. Raises `ParseError` (retryable by `@guard`) on failure.
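The behavior described above can be approximated as follows. This is a sketch of the technique only, not the code in `utils.py`: `parse_llm_json` and `JsonParseError` are hypothetical names, and raising the same error for non-string input is an assumption.

```python
import json
import re


class JsonParseError(ValueError):
    """Hypothetical stand-in for the library's ParseError."""


FENCE = "`" * 3  # a markdown code fence, built here to keep this example readable
_FENCED = re.compile(rf"^{FENCE}[\w-]*\s*\n(.*?)\n?{FENCE}\s*$", re.DOTALL)


def parse_llm_json(text):
    """Parse JSON from model output, tolerating a BOM and one surrounding fence."""
    if not isinstance(text, str):
        raise JsonParseError(f"expected str, got {type(text).__name__}")
    cleaned = text.lstrip("\ufeff").strip()  # drop a leading byte-order mark
    match = _FENCED.match(cleaned)
    if match:
        cleaned = match.group(1)  # keep only the fenced body
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError as exc:
        raise JsonParseError(f"invalid JSON: {exc.msg}") from exc


parsed = parse_llm_json("\ufeff" + FENCE + 'json\n{"key": "value"}\n' + FENCE)
```

Raising one dedicated exception type for every failure gives a retry loop a single thing to catch when deciding whether an attempt failed at the parse stage.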
|
|
100
|
+
|
|
101
|
+
### `HandoffViolation`
|
|
102
|
+
|
|
103
|
+
Exception with:
|
|
104
|
+
- `.context` — `ViolationContext` (node_name, contract_type, field_path, expected, received, suggestion)
|
|
105
|
+
- `.history` — `list[AttemptRecord]` (empty if no retries)
|
|
106
|
+
- `.node_name`, `.field_path` — shortcuts
|
|
107
|
+
- `.total_attempts` — `len(history)` or 1
|
|
108
|
+
- `.to_dict()` — serializable for logging
|
|
109
|
+
|
|
110
|
+
### on_fail modes
|
|
111
|
+
|
|
112
|
+
- `"raise"` — Raise HandoffViolation (default)
|
|
113
|
+
- `"return_none"` — Return None
|
|
114
|
+
- `"return_input"` — Return the original input unchanged
|
|
115
|
+
- `callable` — Call with the HandoffViolation, return its result
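The four modes above amount to a small dispatch. A minimal sketch, assuming a hypothetical helper named `resolve_failure` and using `RuntimeError` as a stand-in for `HandoffViolation`:

```python
def resolve_failure(on_fail, violation, original_input):
    """Decide what a guarded call yields once validation has finally failed."""
    if on_fail == "raise":
        raise violation
    if on_fail == "return_none":
        return None
    if on_fail == "return_input":
        return original_input
    if callable(on_fail):
        return on_fail(violation)  # the handler's result becomes the return value
    raise ValueError(f"unknown on_fail mode: {on_fail!r}")


violation = RuntimeError("output contract failed")  # stand-in exception
state = {"topic": "handoffs"}

fallback = resolve_failure("return_input", violation, state)
handled = resolve_failure(lambda err: {"error": str(err)}, violation, state)
```

A callable handler is the only mode that can turn a failure into a structured value, which is useful when a downstream node expects a dict in every case.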
|
|
116
|
+
|
|
117
|
+
### Suggestion generation
|
|
118
|
+
|
|
119
|
+
Auto-generated in `guard.py:_generate_suggestion` based on Pydantic error types: `missing`, `string_type`, `int_type`, `string_too_short`, `string_too_long`, `too_short`, `too_long`, `greater_than_equal`, `less_than_equal`, `string_pattern_mismatch`. Add new error types there.
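One way to structure such a mapping is a lookup table keyed by the Pydantic error type. The sketch below is not `_generate_suggestion` itself: `suggest` and `SUGGESTIONS` are hypothetical, and every message except the `string_too_short` wording (which matches the README's sample output) is invented for illustration. It reads the `type` and `loc` keys that Pydantic v2 error dicts carry.

```python
from typing import Optional

# Error type -> message template (wording is an assumption, see lead-in)
SUGGESTIONS = {
    "missing": "Add the required field '{field}'",
    "string_type": "Convert '{field}' to a string",
    "int_type": "Convert '{field}' to an integer",
    "string_too_short": "Increase the length of '{field}'",
    "string_too_long": "Shorten '{field}'",
}


def suggest(error: dict) -> Optional[str]:
    """Build a fix suggestion from one error dict, or None for unknown types."""
    field = ".".join(str(part) for part in error.get("loc", ()))
    template = SUGGESTIONS.get(error.get("type"))
    return template.format(field=field) if template else None


hint = suggest({"type": "string_too_short", "loc": ("draft",)})
nested = suggest({"type": "missing", "loc": ("docs", 0, "id")})
unknown = suggest({"type": "custom_rule", "loc": ("tone",)})
```

Returning `None` for unmapped types keeps new Pydantic error types from breaking the error path before a template is added for them.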
|
|
120
|
+
|
|
121
|
+
## Development Commands
|
|
122
|
+
|
|
123
|
+
```bash
|
|
124
|
+
# Install in editable mode with dev dependencies
|
|
125
|
+
pip install -e ".[dev]"
|
|
126
|
+
|
|
127
|
+
# Run tests
|
|
128
|
+
pytest tests/ -v
|
|
129
|
+
|
|
130
|
+
# Run LLM demo (no API key needed)
|
|
131
|
+
python -m examples.llm_demo.run_demo # retry demo (default)
|
|
132
|
+
python -m examples.llm_demo.run_demo --failure-demo # exhausted retries
|
|
133
|
+
python -m examples.llm_demo.run_demo --pipeline # mock pipeline
|
|
134
|
+
|
|
135
|
+
# Run RAG demo (no API key needed)
|
|
136
|
+
python -m examples.rag_demo.run_demo # pipeline + hallucination check
|
|
137
|
+
|
|
138
|
+
# Run with real LLM calls (needs OPENROUTER_API_KEY)
|
|
139
|
+
python -m examples.llm_demo.run_demo --pipeline --api
|
|
140
|
+
python -m examples.rag_demo.run_demo --api
|
|
141
|
+
|
|
142
|
+
# Lint
|
|
143
|
+
ruff check src/ tests/ examples/
|
|
144
|
+
|
|
145
|
+
# Build package
|
|
146
|
+
python -m build
|
|
147
|
+
```
|
|
148
|
+
|
|
149
|
+
## Architecture Decisions
|
|
150
|
+
|
|
151
|
+
- **Pydantic v2 only** — Uses `model_validate`, `model_dump`, not v1 API
|
|
152
|
+
- **No runtime deps beyond Pydantic** — LangGraph/httpx are optional
|
|
153
|
+
- **Sync and async** — `@guard` detects `async def` and wraps accordingly
|
|
154
|
+
- **First violation wins** — Only the first validation error is raised (not all)
|
|
155
|
+
- **Dict-oriented** — Agents typically pass dicts; the decorator validates dicts against Pydantic models without requiring the function to use models directly
|
|
156
|
+
- **ContextVar for retry state** — The `retry` proxy uses a ContextVar so it's async-safe and doesn't require passing state through function signatures
|
|
157
|
+
- **Input validation outside retry loop** — Input doesn't change between retries, so it's validated once upfront
|
|
158
|
+
- **Parse errors retryable** — `ParseError`, `json.JSONDecodeError`, `KeyError`, `TypeError` are caught and retried when `max_attempts > 1`
|
|
159
|
+
|
|
160
|
+
## Adding New Features
|
|
161
|
+
|
|
162
|
+
When adding a new error type suggestion, edit `_generate_suggestion` in `src/handoff/guard.py`.
|
|
163
|
+
|
|
164
|
+
When adding a new framework adapter (e.g., CrewAI), create `src/handoff/crewai.py` following the pattern in `langgraph.py` and export from `__init__.py`.
|
|
165
|
+
|
|
166
|
+
When adding a new demo, create `examples/<name>_demo/` with `__init__.py`, `schemas.py`, `run_demo.py` and a `README.md`. Run it with `python -m examples.<name>_demo.run_demo`.
|
|
167
|
+
|
|
168
|
+
## Testing
|
|
169
|
+
|
|
170
|
+
Tests are split across four files:
|
|
171
|
+
|
|
172
|
+
- `test_guard.py` — Guard decorator: valid passthrough, invalid input/output raises, on_fail modes, custom node_name, input/output-only, async support, violation context and serialization
|
|
173
|
+
- `test_retry.py` — Retry loop: succeeds on later attempt, exhausts max_attempts, RetryState injection, proxy behavior, feedback text, violation history, parse error retry, retry_on filtering, on_fail after retry, input validation skips retry, async retry
|
|
174
|
+
- `test_testing.py` — `mock_retry()` context manager sets context and proxy works
|
|
175
|
+
- `test_utils.py` — `parse_json`: valid JSON, code fence stripping, invalid raises ParseError, non-string raises, BOM stripping
|
|
176
|
+
|
|
177
|
+
Run with: `pytest tests/ -v`
|
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2025 Arnold
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
|
@@ -0,0 +1,233 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: handoff-guard
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Lightweight validation at agent boundaries. Know what broke and where.
|
|
5
|
+
Project-URL: Homepage, https://github.com/acartag7/handoff-guard
|
|
6
|
+
Project-URL: Repository, https://github.com/acartag7/handoff-guard
|
|
7
|
+
Author-email: Arnold <cartagena.arnold@gmail.com>
|
|
8
|
+
License: MIT
|
|
9
|
+
License-File: LICENSE
|
|
10
|
+
Keywords: agents,handoff,langgraph,multi-agent,validation
|
|
11
|
+
Classifier: Development Status :: 3 - Alpha
|
|
12
|
+
Classifier: Intended Audience :: Developers
|
|
13
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
14
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
17
|
+
Requires-Python: >=3.10
|
|
18
|
+
Requires-Dist: pydantic>=2.0.0
|
|
19
|
+
Provides-Extra: dev
|
|
20
|
+
Requires-Dist: pre-commit>=3.0.0; extra == 'dev'
|
|
21
|
+
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
|
|
22
|
+
Requires-Dist: pytest>=8.0.0; extra == 'dev'
|
|
23
|
+
Requires-Dist: ruff>=0.1.0; extra == 'dev'
|
|
24
|
+
Provides-Extra: langgraph
|
|
25
|
+
Requires-Dist: langgraph>=0.2.0; extra == 'langgraph'
|
|
26
|
+
Provides-Extra: llm
|
|
27
|
+
Requires-Dist: httpx>=0.25.0; extra == 'llm'
|
|
28
|
+
Requires-Dist: langgraph>=0.2.0; extra == 'llm'
|
|
29
|
+
Description-Content-Type: text/markdown
|
|
30
|
+
|
|
31
|
+
# handoff-guard
|
|
32
|
+
|
|
33
|
+
> Validation for LLM agents that retries with feedback.
|
|
34
|
+
|
|
35
|
+
[PyPI version](https://badge.fury.io/py/handoff-guard)
|
|
36
|
+
[License: MIT](https://opensource.org/licenses/MIT)
|
|
37
|
+
|
|
38
|
+
## The Problem
|
|
39
|
+
|
|
40
|
+
When an LLM agent returns bad output, you get a generic error and no recovery path:
|
|
41
|
+
|
|
42
|
+
```
|
|
43
|
+
ValidationError: 1 validation error for State
|
|
44
|
+
field required (type=value_error.missing)
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
Which node? Which field? What was passed? Can the agent fix it?
|
|
48
|
+
|
|
49
|
+
## The Solution
|
|
50
|
+
|
|
51
|
+
```python
from handoff import guard, retry, parse_json
from pydantic import BaseModel, Field

class WriterOutput(BaseModel):
    draft: str = Field(min_length=100)
    word_count: int = Field(ge=50)
    tone: str
    title: str

@guard(output=WriterOutput, node_name="writer", max_attempts=3)
def writer_agent(state: dict) -> dict:
    prompt = "Write a JSON response with: draft, word_count, tone, title."

    if retry.is_retry:
        prompt += f"\n\nYour previous attempt failed:\n{retry.feedback()}"

    response = call_llm(prompt)  # call_llm stands in for your own LLM client
    return parse_json(response)
```
|
|
71
|
+
|
|
72
|
+
When validation fails, the agent retries with feedback about what went wrong. After all attempts are exhausted:
|
|
73
|
+
|
|
74
|
+
```
|
|
75
|
+
HandoffViolation in 'writer' (attempt 3/3):
|
|
76
|
+
Contract: output
|
|
77
|
+
Field: draft
|
|
78
|
+
Expected: String should have at least 100 characters
|
|
79
|
+
Suggestion: Increase the length of 'draft'
|
|
80
|
+
History: 3 failed attempts
|
|
81
|
+
```
|
|
82
|
+
|
|
83
|
+
## Quick Start
|
|
84
|
+
|
|
85
|
+
```bash
|
|
86
|
+
pip install handoff-guard
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
```bash
|
|
90
|
+
# See retry-with-feedback in action (no API key needed)
|
|
91
|
+
python -m examples.llm_demo.run_demo
|
|
92
|
+
|
|
93
|
+
# Run with real LLM calls
|
|
94
|
+
export OPENROUTER_API_KEY=your_key
|
|
95
|
+
python -m examples.llm_demo.run_demo --pipeline --api
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
## Features
|
|
99
|
+
|
|
100
|
+
- **Retry with feedback** — Failed outputs are fed back to the agent as context
|
|
101
|
+
- **Know which node failed** — No more guessing from stack traces
|
|
102
|
+
- **Know which field failed** — Exact path to the problem
|
|
103
|
+
- **Get fix suggestions** — Actionable error messages
|
|
104
|
+
- **`parse_json`** — Strips code fences, handles BOM, raises `ParseError` on failure
|
|
105
|
+
- **Framework agnostic** — Works with LangGraph, CrewAI, or plain Python
|
|
106
|
+
- **Lightweight** — Just Pydantic, no Docker, no telemetry servers
|
|
107
|
+
|
|
108
|
+
## API
|
|
109
|
+
|
|
110
|
+
### `@guard` decorator
|
|
111
|
+
|
|
112
|
+
```python
@guard(
    input=InputSchema,                 # Pydantic model for input validation
    output=OutputSchema,               # Pydantic model for output validation
    node_name="my_node",               # Identifies the node in errors (default: function name)
    max_attempts=3,                    # Retry up to 3 times (default: 1, no retry)
    retry_on=("validation", "parse"),  # What errors trigger retry (default)
    on_fail="raise",                   # "raise" | "return_none" | "return_input" | callable
)
```
|
|
122
|
+
|
|
123
|
+
### `retry` proxy
|
|
124
|
+
|
|
125
|
+
Access retry state inside any guarded function:
|
|
126
|
+
|
|
127
|
+
```python
|
|
128
|
+
from handoff import retry
|
|
129
|
+
|
|
130
|
+
retry.is_retry # True if attempt > 1
|
|
131
|
+
retry.attempt # Current attempt number
|
|
132
|
+
retry.max_attempts # Total allowed attempts
|
|
133
|
+
retry.remaining # Attempts left
|
|
134
|
+
retry.is_final_attempt
|
|
135
|
+
retry.feedback() # Formatted string describing last error, or None
|
|
136
|
+
retry.last_error # Diagnostic object, or None
|
|
137
|
+
retry.history # List of AttemptRecord objects
|
|
138
|
+
```
|
|
139
|
+
|
|
140
|
+
### `parse_json`
|
|
141
|
+
|
|
142
|
+
```python
|
|
143
|
+
from handoff import parse_json
|
|
144
|
+
|
|
145
|
+
data = parse_json('```json\n{"key": "value"}\n```')
|
|
146
|
+
# Returns: {"key": "value"}
|
|
147
|
+
# Raises ParseError on failure (retryable by @guard)
|
|
148
|
+
```
|
|
149
|
+
|
|
150
|
+
### `HandoffViolation`
|
|
151
|
+
|
|
152
|
+
Raised when all retry attempts are exhausted:
|
|
153
|
+
|
|
154
|
+
```python
from handoff import HandoffViolation

try:
    result = my_agent(state)
except HandoffViolation as e:
    print(e.node_name)       # "writer"
    print(e.total_attempts)  # 3
    print(e.history)         # List of AttemptRecord with diagnostics
    print(e.to_dict())       # Serializable for logging
```
|
|
165
|
+
|
|
166
|
+
### Handle Failures
|
|
167
|
+
|
|
168
|
+
```python
|
|
169
|
+
@guard(output=Schema, on_fail="raise") # Raise exception (default)
|
|
170
|
+
@guard(output=Schema, on_fail="return_none") # Return None on failure
|
|
171
|
+
@guard(output=Schema, on_fail="return_input") # Return input unchanged
|
|
172
|
+
@guard(output=Schema, on_fail=my_handler) # Custom handler
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
## Examples
|
|
176
|
+
|
|
177
|
+
| Demo | What it shows |
|
|
178
|
+
|------|---------------|
|
|
179
|
+
| [`examples/llm_demo`](examples/llm_demo/) | Retry-with-feedback: writer fails, gets feedback, self-corrects |
|
|
180
|
+
| [`examples/rag_demo`](examples/rag_demo/) | Multi-stage pipeline validation + hallucinated citation detection |
|
|
181
|
+
|
|
182
|
+
Both demos support `--api` for real LLM calls and run with mock data by default.
|
|
183
|
+
|
|
184
|
+
## With LangGraph
|
|
185
|
+
|
|
186
|
+
```python
from handoff.langgraph import guarded_node
from pydantic import BaseModel, Field

class RouterOutput(BaseModel):
    next_agent: str = Field(pattern="^(writer|reviewer|done)$")
    messages: list

@guarded_node(output=RouterOutput)
def router(state: dict) -> dict:
    return {
        "next_agent": "writer",
        "messages": state["messages"]
    }
```
|
|
201
|
+
|
|
202
|
+
## Why not just use Pydantic directly?
|
|
203
|
+
|
|
204
|
+
You should! Handoff uses Pydantic under the hood.
|
|
205
|
+
|
|
206
|
+
The difference:
|
|
207
|
+
|
|
208
|
+
| Pydantic alone | Handoff |
|
|
209
|
+
|----------------|---------|
|
|
210
|
+
| `ValidationError: 1 validation error` | `HandoffViolation in 'router_node'` |
|
|
211
|
+
| Generic stack trace | Exact node + field + suggestion |
|
|
212
|
+
| You wire up validation manually | One decorator |
|
|
213
|
+
| No retry | Automatic retry with feedback |
|
|
214
|
+
| Errors are for developers | Errors are actionable for agents |
|
|
215
|
+
|
|
216
|
+
## Roadmap
|
|
217
|
+
|
|
218
|
+
- [ ] Invariant contracts (input/output relationships)
|
|
219
|
+
- [ ] CrewAI adapter
|
|
220
|
+
- [x] Retry with feedback loop
|
|
221
|
+
- [ ] VS Code extension for violation inspection
|
|
222
|
+
|
|
223
|
+
## Contributing
|
|
224
|
+
|
|
225
|
+
Contributions welcome! Please open an issue first to discuss what you'd like to change.
|
|
226
|
+
|
|
227
|
+
## License
|
|
228
|
+
|
|
229
|
+
MIT
|
|
230
|
+
|
|
231
|
+
---
|
|
232
|
+
|
|
233
|
+
Built for developers who are tired of debugging agent handoffs.
|