verifyloop 0.1.0__tar.gz

@@ -0,0 +1,15 @@
1
+ __pycache__/
2
+ *.pyc
3
+ *.pyo
4
+ *.egg-info/
5
+ dist/
6
+ build/
7
+ .eggs/
8
+ *.egg
9
+ .pytest_cache/
10
+ .mypy_cache/
11
+ .ruff_cache/
12
+ .venv/
13
+ venv/
14
+ *.so
15
+ .env
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 FableForge Contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,383 @@
1
+ Metadata-Version: 2.4
2
+ Name: verifyloop
3
+ Version: 0.1.0
4
+ Summary: Agent framework implementing Plan → Execute → Verify → Recover with trained verification
5
+ Project-URL: Homepage, https://github.com/fableforge/verifyloop
6
+ Project-URL: Repository, https://github.com/fableforge/verifyloop
7
+ Author-email: FableForge <dev@fableforge.ai>
8
+ License-Expression: MIT
9
+ License-File: LICENSE
10
+ Keywords: agent,autonomous,llm,loop,verification
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: License :: OSI Approved :: MIT License
14
+ Classifier: Programming Language :: Python :: 3.11
15
+ Classifier: Programming Language :: Python :: 3.12
16
+ Classifier: Topic :: Software Development :: Libraries
17
+ Requires-Python: >=3.11
18
+ Requires-Dist: aiofiles>=23.0
19
+ Requires-Dist: click>=8.0
20
+ Requires-Dist: httpx>=0.27
21
+ Requires-Dist: litellm>=1.40
22
+ Requires-Dist: pydantic>=2.5
23
+ Requires-Dist: rich>=13.0
24
+ Requires-Dist: tree-sitter>=0.21
25
+ Provides-Extra: dev
26
+ Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
27
+ Requires-Dist: pytest-cov>=5.0; extra == 'dev'
28
+ Requires-Dist: pytest>=8.0; extra == 'dev'
29
+ Requires-Dist: ruff>=0.4; extra == 'dev'
30
+ Provides-Extra: docker
31
+ Requires-Dist: docker>=7.0; extra == 'docker'
32
+ Description-Content-Type: text/markdown
33
+
34
+ # VerifyLoop
35
+
36
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) [![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/) [![Tests](https://img.shields.io/badge/tests-65-green.svg)](tests/)
37
+
38
+
39
+ > **The Instagram moment for agents.** Plan → Execute → Verify → Recover.
40
+
41
+ VerifyLoop is an agent framework where the **verify** step uses a trained model rather than a prompt. Most other agent frameworks verify with the same LLM that generated the code, which is like asking the person who wrote the bug to confirm there's no bug.
42
+
43
+ ## Architecture
44
+
45
+ ```
46
+ ┌─────────────────────────────────────────────────────────┐
47
+ │                      AgentPipeline                      │
48
+ │                                                         │
49
+ │  ┌─────────┐    ┌──────────┐    ┌─────────┐    ┌──────┐ │
50
+ │  │  PLAN   │───▶│ EXECUTE  │───▶│ VERIFY  │───▶│ DONE │ │
51
+ │  │         │    │          │    │         │    │  ✓   │ │
52
+ │  └─────────┘    └──────────┘    └────┬────┘    └──────┘ │
53
+ │                                      │                  │
54
+ │                               ┌──────▼──────┐           │
55
+ │                               │ Confidence  │           │
56
+ │                               │   < 0.8 ?   │           │
57
+ │                               └──────┬──────┘           │
58
+ │                                      │ Yes              │
59
+ │                               ┌──────▼──────┐           │
60
+ │                               │   RECOVER   │           │
61
+ │                               │ Fix errors  │           │
62
+ │                               └──────┬──────┘           │
63
+ │                                      │                  │
64
+ │                            Loop back to EXECUTE         │
65
+ └─────────────────────────────────────────────────────────┘
66
+ ```
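
The control flow in the diagram can be sketched in plain Python. This is an illustrative sketch only, not the package's implementation: `Verdict`, `run_loop`, and the three callables are hypothetical stand-ins, and the 0.8 threshold and 5-iteration bound simply mirror the defaults the README documents.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Hypothetical verifier result: a numeric score plus a failure note."""
    confidence: float  # 0.0-1.0 score from the verify step
    error: str = ""    # what went wrong, if anything


def run_loop(
    execute: Callable[[str], str],
    verify: Callable[[str], Verdict],
    recover: Callable[[str, str], str],
    plan: str,
    threshold: float = 0.8,
    max_iterations: int = 5,
) -> tuple[str, int]:
    """Return (status, iterations_used) for a bounded execute/verify/recover loop."""
    for iteration in range(1, max_iterations + 1):
        output = execute(plan)
        verdict = verify(output)
        if verdict.confidence >= threshold:
            return "completed", iteration
        # Confidence below threshold: revise the plan, then loop back to execute.
        plan = recover(plan, verdict.error)
    return "failed", max_iterations


# Toy stand-ins: the first attempt scores low, the recovered plan scores high.
status, used = run_loop(
    execute=lambda p: p.upper(),
    verify=lambda out: Verdict(0.9) if "FIXED" in out else Verdict(0.4, "bug"),
    recover=lambda p, err: p + " fixed",
    plan="edit app.py",
)
print(status, used)  # prints: completed 2
```

Bounding the loop with `max_iterations` is what keeps a persistently low confidence score from turning into an infinite retry cycle.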
67
+
68
+ ### Why VerifyLoop is different
69
+
70
+ | Feature | Other Agents | VerifyLoop |
71
+ |---------|-------------|------------|
72
+ | Verification | LLM prompt (same model) | Trained ReasonCritic model |
73
+ | Error recovery | Retry or re-prompt | Pattern-matched recovery strategies |
74
+ | Confidence scoring | None or vibes | Numeric confidence threshold |
75
+ | Recovery loop | None or ad-hoc | Structured Plan→Exec→Verify→Recover |
76
+ | Token tracking | Best-effort | Built-in per-phase tracking |
77
+
78
+ ## Quick Start
79
+
80
+ ### Install
81
+
82
+ ```bash
83
+ pip install verifyloop
84
+ ```
85
+
86
+ ### CLI
87
+
88
+ ```bash
89
+ # Run a task
90
+ vl run "add authentication to app.py"
91
+
92
+ # Run from a task file
93
+ vl run --task-file tasks/fix_bug.json
94
+
95
+ # Interactive mode (confirm each step)
96
+ vl run --interactive "refactor the database layer"
97
+
98
+ # Specify models
99
+ vl run --model gpt-4o --verify-model reason-critic-7b "write tests"
100
+
101
+ # Dry run (plan only, don't execute)
102
+ vl run --dry-run "create a REST API"
103
+
104
+ # Limit iterations
105
+ vl run --max-iterations 3 "fix the flaky test"
106
+
107
+ # Docker sandbox for bash commands
108
+ vl run --sandbox "install dependencies and run tests"
109
+ ```
110
+
111
+ ### Python API
112
+
113
+ ```python
114
+ import asyncio
115
+ from verifyloop import AgentPipeline, PipelineConfig
116
+
117
+ async def main():
118
+ config = PipelineConfig(
119
+ model="gpt-4o",
120
+ verify_model="reason-critic-7b",
121
+ max_iterations=5,
122
+ confidence_threshold=0.8,
123
+ )
124
+
125
+ pipeline = AgentPipeline(config)
126
+
127
+ # Stream events
128
+ async def on_event(event, data):
129
+ print(f"[{event}] {data}")
130
+
131
+ pipeline.on_event(on_event)
132
+
133
+ result = await pipeline.run(
134
+ task="Add a hello() function to app.py",
135
+ context="Python project with a Flask web app",
136
+ )
137
+
138
+ print(f"Status: {result.status}")
139
+ print(f"Steps: {len(result.steps)}")
140
+ print(f"Duration: {result.duration_seconds:.2f}s")
141
+
142
+ asyncio.run(main())
143
+ ```
144
+
145
+ ### Individual Components
146
+
147
+ ```python
148
+ from verifyloop import PlanGenerator, Executor, Verifier, VerifierConfig, Recoverer
149
+
150
+ # Use components individually (run these awaits inside an async function, e.g. via asyncio.run)
151
+ planner = PlanGenerator(model="gpt-4o")
152
+ plan = await planner.generate_plan("Fix the login bug in auth.py")
153
+
154
+ executor = Executor(working_dir=".")
155
+ step = await executor.bash("pytest tests/")
156
+
157
+ verifier = Verifier(VerifierConfig(verify_model="reason-critic-7b"))
158
+ result = await verifier.verify_file_state("auth.py", expected_content="def login()")
159
+
160
+ recoverer = Recoverer(model="gpt-4o")
161
+ recovery = await recoverer.recover("FileNotFoundError: auth.py not found")
162
+ ```
163
+
164
+ ## API Reference
165
+
166
+ ### `PipelineConfig`
167
+
168
+ | Field | Type | Default | Description |
169
+ |-------|------|---------|-------------|
170
+ | `model` | `str` | `"gpt-4o"` | LLM model for planning/recovery |
171
+ | `verify_model` | `str` | `"reason-critic-7b"` | Trained verification model |
172
+ | `max_iterations` | `int` | `5` | Max Plan→Execute→Verify loops |
173
+ | `confidence_threshold` | `float` | `0.8` | Minimum confidence to accept result |
174
+ | `max_recovery_attempts` | `int` | `3` | Max recovery attempts per iteration |
175
+ | `working_dir` | `str` | `"."` | Working directory for file ops |
176
+ | `dry_run` | `bool` | `False` | Plan only, don't execute |
177
+ | `interactive` | `bool` | `False` | Confirm each step before execution |
178
+ | `sandbox` | `bool` | `False` | Run bash in Docker container |
179
+ | `sandbox_image` | `str` | `"python:3.11-slim"` | Docker image for sandbox |
180
+
181
+ ### `AgentPipeline`
182
+
183
+ ```python
184
+ pipeline = AgentPipeline(config)
185
+
186
+ # Run a task
187
+ result: AgentRun = await pipeline.run(task, context, max_iterations)
188
+
189
+ # Register event callbacks
190
+ pipeline.on_event(callback) # async def callback(event: str, data: dict)
191
+
192
+ # Access token usage
193
+ print(pipeline.token_usage)
194
+ ```
195
+
196
+ ### `AgentRun`
197
+
198
+ | Field | Type | Description |
199
+ |-------|------|-------------|
200
+ | `task` | `str` | Original task description |
201
+ | `steps` | `list[Step]` | All plan/execute/verify/recover steps |
202
+ | `status` | `RunStatus` | `pending` / `planning` / `executing` / `verifying` / `recovering` / `completed` / `failed` |
203
+ | `token_usage` | `TokenUsage` | Prompt + completion token counts |
204
+ | `duration_seconds` | `float` | Total wall-clock time |
205
+ | `iteration` | `int` | Which iteration completed |
206
+ | `metadata` | `dict` | Additional metadata |
207
+
208
+ ### `Executor`
209
+
210
+ ```python
211
+ executor = Executor(working_dir=".", sandbox=False)
212
+
213
+ # Tools
214
+ result = await executor.bash("ls -la")
215
+ result = await executor.read("app.py")
216
+ result = await executor.write("new_file.py", content)
217
+ result = await executor.edit("app.py", old_content, new_content)
218
+ result = await executor.web_search("python requests library")
219
+ result = await executor.web_fetch("https://example.com/docs")
220
+
221
+ # File history and rollback
222
+ history = executor.get_file_history("app.py")
223
+ executor.rollback_file("app.py")
224
+ ```
225
+
226
+ ### `Verifier`
227
+
228
+ ```python
229
+ verifier = Verifier(VerifierConfig(
230
+ verify_model="reason-critic-7b",
231
+ confidence_threshold=0.8,
232
+ prefer_trained_model=True,
233
+ ))
234
+
235
+ # Verification methods
236
+ result = await verifier.verify_code_edits(plan, execute_steps)
237
+ result = await verifier.verify_bash_output("pytest", output, expected="passed")
238
+ result = await verifier.verify_file_state("app.py", expected_content="def hello")
239
+ result = await verifier.verify_tests("pytest tests/", working_dir=".")
240
+ ```
241
+
242
+ ### `Recoverer`
243
+
244
+ ```python
245
+ recoverer = Recoverer(model="gpt-4o", max_recovery_attempts=3)
246
+
247
+ # Recovery with pattern matching
248
+ recovery = await recoverer.recover(
249
+ error="SyntaxError: invalid syntax",
250
+ context="File: app.py, Line 42",
251
+ attempt=1,
252
+ )
253
+
254
+ # Pattern types: edit, create, retry, simplify, analyze
255
+ print(recovery.recovery_type) # "edit"
256
+ print(recovery.recovery_attempt) # "Fix syntax error in the file"
257
+ print(recovery.exhausted) # False
258
+
259
+ # Check if retry is worthwhile
260
+ should_retry = recoverer.should_retry("TimeoutError", attempt=2) # True
261
+ ```
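
One way to picture pattern-matched recovery is a table of regular expressions mapped to the five recovery types named above. The sketch below is an assumption-laden illustration: the pattern table, `classify_error`, and the retry rule are hypothetical and are not taken from the package source. Only the recovery type names and the `SyntaxError` → `edit` pairing come from the README.

```python
import re

# Hypothetical pattern table; the package's real patterns are not shown in the README.
RECOVERY_PATTERNS = [
    (re.compile(r"SyntaxError|IndentationError"), "edit"),
    (re.compile(r"FileNotFoundError|No such file"), "create"),
    (re.compile(r"TimeoutError|ConnectionError"), "retry"),
    (re.compile(r"MemoryError|RecursionError"), "simplify"),
]


def classify_error(error: str) -> str:
    """Return the first matching recovery type, or 'analyze' when nothing matches."""
    for pattern, recovery_type in RECOVERY_PATTERNS:
        if pattern.search(error):
            return recovery_type
    return "analyze"  # assumed catch-all when no pattern applies


def should_retry(attempt: int, max_recovery_attempts: int = 3) -> bool:
    """Assumed rule: keep retrying while attempts remain."""
    return attempt < max_recovery_attempts


print(classify_error("SyntaxError: invalid syntax"))  # prints: edit
print(should_retry(attempt=2))                        # prints: True
```

Ordering matters in a first-match table like this, so more specific patterns would need to sit above broader ones.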
262
+
263
+ ### `InMemoryStore` / `FileStore`
264
+
265
+ ```python
266
+ from verifyloop import InMemoryStore, FileStore
267
+
268
+ # In-memory (default)
269
+ memory = InMemoryStore()
270
+ await memory.store("key", {"data": "value"})
271
+ result = await memory.retrieve("key")
272
+ results = await memory.search("value")
273
+
274
+ # Persistent file storage
275
+ memory = FileStore(base_dir=".verifyloop_memory")
276
+ await memory.store("key", {"data": "value"}, namespace="project1")
277
+ ```
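
To make the store/retrieve/search contract concrete, here is a minimal in-memory sketch using only the standard library. `TinyStore` is a hypothetical class written for illustration. Its namespace default and substring-based search are assumptions, since the README does not say how `InMemoryStore.search` matches values.

```python
import asyncio
import json


class TinyStore:
    """Hypothetical async key-value store keyed by (namespace, key)."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], object] = {}

    async def store(self, key: str, value: object, namespace: str = "default") -> None:
        self._data[(namespace, key)] = value

    async def retrieve(self, key: str, namespace: str = "default") -> object:
        return self._data.get((namespace, key))

    async def search(self, query: str, namespace: str = "default") -> list[str]:
        # Assumed behaviour: substring match against the JSON-serialised value.
        return [
            key
            for (ns, key), value in self._data.items()
            if ns == namespace and query in json.dumps(value)
        ]


async def demo() -> tuple[object, list[str]]:
    store = TinyStore()
    await store.store("key", {"data": "value"})
    await store.store("other", {"data": "nothing"}, namespace="project1")
    return await store.retrieve("key"), await store.search("value")


found, hits = asyncio.run(demo())
print(found, hits)  # prints: {'data': 'value'} ['key']
```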
278
+
279
+ ### `ConversationContext`
280
+
281
+ ```python
282
+ from verifyloop.memory import ConversationContext
283
+
284
+ ctx = ConversationContext()
285
+ ctx.add_message("user", "Fix the bug in main.py")
286
+ ctx.add_file_context("main.py", "def broken():\n return 1/0")
287
+
288
+ # Build context string for LLM
289
+ context = ctx.build_context_string()
290
+ ```
291
+
292
+ ## Configuration
293
+
294
+ ### Environment Variables
295
+
296
+ | Variable | Description |
297
+ |----------|-------------|
298
+ | `OPENAI_API_KEY` | OpenAI API key (for GPT models) |
299
+ | `ANTHROPIC_API_KEY` | Anthropic API key (for Claude models) |
300
+ | `VERIFYLOOP_VERIFY_MODEL` | Override the verification model |
301
+ | `VERIFYLOOP_CONFIDENCE` | Override confidence threshold (0.0-1.0) |
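
As a rough illustration of how the two `VERIFYLOOP_*` overrides might be resolved, the sketch below reads them from the environment and validates the documented 0.0-1.0 range. `resolve_verifier_settings` is a hypothetical helper, and rejecting out-of-range values (rather than clamping them) is an assumption, not documented behaviour.

```python
import os
from typing import Mapping, Optional


def resolve_verifier_settings(
    default_model: str = "reason-critic-7b",
    default_threshold: float = 0.8,
    env: Optional[Mapping[str, str]] = None,
) -> tuple[str, float]:
    """Return (verify_model, confidence_threshold) after applying env overrides."""
    env = os.environ if env is None else env
    model = env.get("VERIFYLOOP_VERIFY_MODEL") or default_model

    raw = env.get("VERIFYLOOP_CONFIDENCE")
    if not raw:
        return model, default_threshold

    threshold = float(raw)  # raises ValueError on non-numeric input
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"VERIFYLOOP_CONFIDENCE must be in [0.0, 1.0], got {raw!r}")
    return model, threshold


print(resolve_verifier_settings(env={"VERIFYLOOP_CONFIDENCE": "0.65"}))
# prints: ('reason-critic-7b', 0.65)
```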
302
+
303
+ ### Task File Format
304
+
305
+ ```json
306
+ {
307
+ "task": "Add authentication to app.py",
308
+ "context": "Flask application with a login route",
309
+ "model": "gpt-4o",
310
+ "verify_model": "reason-critic-7b",
311
+ "max_iterations": 3
312
+ }
313
+ ```
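
A task file in this shape can be checked before it reaches the pipeline. The following sketch uses only the standard library. `TaskSpec` and `parse_task` are hypothetical names, and the sketch assumes that `task` is the only required field and that unknown keys are ignored, neither of which the README states.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class TaskSpec:
    """Hypothetical mirror of the task-file fields, with README defaults."""
    task: str
    context: str = ""
    model: str = "gpt-4o"
    verify_model: str = "reason-critic-7b"
    max_iterations: int = 5


def parse_task(text: str) -> TaskSpec:
    """Parse task-file JSON, requiring a non-empty 'task' string."""
    data = json.loads(text)
    if not isinstance(data, dict) or not data.get("task"):
        raise ValueError("task file needs a non-empty 'task' field")
    known = {f.name: data[f.name] for f in fields(TaskSpec) if f.name in data}
    return TaskSpec(**known)


spec = parse_task('{"task": "Add authentication to app.py", "max_iterations": 3}')
print(spec.task, spec.model, spec.max_iterations)
# prints: Add authentication to app.py gpt-4o 3
```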
314
+
315
+ ## Comparison with Other Agent Frameworks
316
+
317
+ ### vs. AutoGPT / BabyAGI
318
+
319
+ | Aspect | AutoGPT | VerifyLoop |
320
+ |--------|---------|------------|
321
+ | Planning | Single prompt | Decomposed substeps with tool estimation |
322
+ | Verification | None | Trained model with confidence scoring |
323
+ | Recovery | Basic retry | Pattern-matched strategies (5 types) |
324
+ | Loop control | Infinite loop risk | Bounded iterations + convergence check |
325
+
326
+ ### vs. LangChain Agents
327
+
328
+ | Aspect | LangChain | VerifyLoop |
329
+ |--------|-----------|------------|
330
+ | Verification | LLM-as-judge (same model) | Dedicated trained verification model |
331
+ | Structured output | Optional | Enforced via Pydantic models |
332
+ | Recovery | Chain retries | Typed recovery with strategy selection |
333
+ | Token tracking | Callback-based | Built-in per-phase tracking |
334
+
335
+ ### vs. Claude Code / Cursor
336
+
337
+ | Aspect | Claude Code | VerifyLoop |
338
+ |--------|-------------|------------|
339
+ | Verification | Same model self-review | Dedicated ReasonCritic model |
340
+ | Recovery | Re-prompt | Pattern-matched with LLM fallback |
341
+ | Programmatic | Limited CLI | Full Python API + CLI |
342
+ | Extensibility | Plugin system | Tool interface + plugin system |
343
+
344
+ ## Verification Model: ReasonCritic
345
+
346
+ **ReasonCritic** is the key differentiator: a model trained specifically for verification.
347
+
348
+ 1. **Not a prompt** — It's a model fine-tuned on verification tasks (code review, test analysis, output comparison)
349
+ 2. **Falls back gracefully** — If ReasonCritic is unavailable, falls back to a general LLM with structured verification prompts
350
+ 3. **Confidence scoring** — Numeric 0-1 confidence score, not binary pass/fail
351
+ 4. **Actionable failures** — Every failure comes with fix suggestions, not just "it broke"
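
The fallback and confidence-scoring behaviour described in the list above might look roughly like the sketch below. Everything here is hypothetical: `verify_with_fallback` and the two scorer callables stand in for the trained model and the general LLM, and the returned dictionary shape is an assumption rather than the package's result type.

```python
from typing import Callable, Optional

Scorer = Callable[[str], float]  # maps an output string to a 0-1 confidence


def verify_with_fallback(
    output: str,
    trained: Optional[Scorer],
    fallback: Scorer,
    threshold: float = 0.8,
) -> dict:
    """Score with the trained verifier; use the fallback scorer if it is unavailable."""
    source = "trained"
    try:
        if trained is None:
            raise RuntimeError("trained verifier not configured")
        confidence = trained(output)
    except Exception:
        # Any failure of the trained verifier routes to the general-purpose scorer.
        source = "fallback"
        confidence = fallback(output)

    confidence = min(1.0, max(0.0, confidence))  # keep the score inside [0, 1]
    return {"passed": confidence >= threshold, "confidence": confidence, "source": source}


print(verify_with_fallback("tests passed", None, lambda out: 0.9))
# prints: {'passed': True, 'confidence': 0.9, 'source': 'fallback'}
```

Recording which scorer produced the result lets a caller treat fallback verdicts with more caution than trained-model verdicts.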
352
+
353
+ ## License
354
+
355
+ MIT
356
+
357
+ ## Ecosystem
358
+
359
+ Part of the [FableForge](../) ecosystem — 21 open-source projects built from 210K real agent traces:
360
+
361
+ | Project | Description |
362
+ | --- | --- |
363
+ | **[Anvil](../anvil)** | Self-verified coding agent |
364
+ | **[VerifyLoop](../verifyloop)** | Plan→Execute→Verify→Recover framework |
365
+ | **[ErrorRecovery](../error-recovery)** | Self-healing middleware (3,725 error patterns) |
366
+ | **[FableForge-14B](../fableforge-14b)** | The fine-tuned 14B model (4-stage training) |
367
+ | **[ShellWhisperer](../shell-whisperer)** | 1.5B edge agent (phone/RPi, 50ms) |
368
+ | **[ReasonCritic](../reason-critic)** | Verification model (130 benchmark tasks) |
369
+ | **[TraceCompiler](../trace-compiler)** | Compile traces → LoRA skills |
370
+ | **[AgentRuntime](../agent-runtime)** | Persistent agent daemon (systemd for AI) |
371
+ | **[AgentSwarm](../agent-swarm)** | Multi-agent from real trace transitions |
372
+ | **[AgentTelemetry](../agent-telemetry)** | Datadog for agents (token tracking, costs) |
373
+ | **[BenchAgent](../bench-agent)** | HumanEval for tool-use (107 tasks) |
374
+ | **[AgentDev](../agent-dev)** | VSCode extension with verification |
375
+ | **[TraceViz](../trace-viz)** | Trace replay visualizer (Next.js) |
376
+ | **[AgentSkills](../agent-skills)** | npm for agent behaviors |
377
+ | **[AgentCurriculum](../agent-curriculum)** | 5-stage progressive training |
378
+ | **[AgentFuzzer](../agent-fuzzer)** | Adversarial testing for agents |
379
+ | **[AgentConstitution](../agent-constitution)** | Safety guardrails from traces |
380
+ | **[CostOptimizer](../cost-optimizer)** | Token cost reduction (50-80%) |
381
+ | **[AgentProfiler](../agent-profiler)** | Behavioral fingerprinting |
382
+ | **[TrajectoryDistiller](../trajectory-distiller)** | Trace→training data pipeline |
383
+ | **[Fable5-Dataset](../fable5-dataset)** | HuggingFace dataset release |