PyPI - verifyloop - Versions diffs - 0.1.0__tar.gz - Mend

verifyloop 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

verifyloop-0.1.0/.gitignore +15 -0
verifyloop-0.1.0/LICENSE +21 -0
verifyloop-0.1.0/PKG-INFO +383 -0
verifyloop-0.1.0/README.md +350 -0
verifyloop-0.1.0/examples/basic_usage.py +37 -0
verifyloop-0.1.0/examples/coding_agent.py +50 -0
verifyloop-0.1.0/examples/debug_agent.py +51 -0
verifyloop-0.1.0/pyproject.toml +62 -0
verifyloop-0.1.0/src/verifyloop/__init__.py +41 -0
verifyloop-0.1.0/src/verifyloop/cli.py +186 -0
verifyloop-0.1.0/src/verifyloop/executor.py +330 -0
verifyloop-0.1.0/src/verifyloop/memory.py +197 -0
verifyloop-0.1.0/src/verifyloop/models.py +146 -0
verifyloop-0.1.0/src/verifyloop/pipeline.py +246 -0
verifyloop-0.1.0/src/verifyloop/planner.py +190 -0
verifyloop-0.1.0/src/verifyloop/recoverer.py +204 -0
verifyloop-0.1.0/src/verifyloop/verifier.py +390 -0
verifyloop-0.1.0/tests/test_pipeline.py +385 -0
verifyloop-0.1.0/tests/test_verifier.py +220 -0

verifyloop-0.1.0/.gitignore ADDED

@@ -0,0 +1,15 @@
+__pycache__/
+*.pyc
+*.pyo
+*.egg-info/
+dist/
+build/
+.eggs/
+*.egg
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+.venv/
+venv/
+*.so
+.env

verifyloop-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 FableForge Contributors
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

verifyloop-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,383 @@
+Metadata-Version: 2.4
+Name: verifyloop
+Version: 0.1.0
+Summary: Agent framework implementing Plan → Execute → Verify → Recover with trained verification
+Project-URL: Homepage, https://github.com/fableforge/verifyloop
+Project-URL: Repository, https://github.com/fableforge/verifyloop
+Author-email: FableForge <dev@fableforge.ai>
+License-Expression: MIT
+License-File: LICENSE
+Keywords: agent,autonomous,llm,loop,verification
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Software Development :: Libraries
+Requires-Python: >=3.11
+Requires-Dist: aiofiles>=23.0
+Requires-Dist: click>=8.0
+Requires-Dist: httpx>=0.27
+Requires-Dist: litellm>=1.40
+Requires-Dist: pydantic>=2.5
+Requires-Dist: rich>=13.0
+Requires-Dist: tree-sitter>=0.21
+Provides-Extra: dev
+Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
+Requires-Dist: pytest-cov>=5.0; extra == 'dev'
+Requires-Dist: pytest>=8.0; extra == 'dev'
+Requires-Dist: ruff>=0.4; extra == 'dev'
+Provides-Extra: docker
+Requires-Dist: docker>=7.0; extra == 'docker'
+Description-Content-Type: text/markdown
+# VerifyLoop
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) [![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/) [![Tests](https://img.shields.io/badge/tests-65-green.svg)](tests/)
+> **The Instagram moment for agents.** Plan → Execute → Verify → Recover.
+VerifyLoop is an agent framework where the **verify** step uses a trained model — not a prompt. Every other agent framework verifies with the same LLM that generated the code. That's like asking the person who wrote the bug to confirm there's no bug.
+## Architecture
+```
+┌─────────────────────────────────────────────────────────┐
+│                     AgentPipeline                        │
+│                                                          │
+│  ┌─────────┐    ┌──────────┐    ┌─────────┐    ┌──────┐ │
+│  │  PLAN    │───▶│ EXECUTE  │───▶│ VERIFY  │───▶│ DONE │ │
+│  │         │    │          │    │         │    │  ✓   │ │
+│  └─────────┘    └──────────┘    └────┬────┘    └──────┘ │
+│                                      │                    │
+│                               ┌──────▼──────┐            │
+│                               │  Confidence  │            │
+│                               │   < 0.8 ?    │            │
+│                               └──────┬──────┘            │
+│                                      │ Yes               │
+│                               ┌──────▼──────┐            │
+│                               │  RECOVER    │            │
+│                               │  Fix errors │            │
+│                               └──────┬──────┘            │
+│                                      │                    │
+│                              Loop back to EXECUTE         │
+└─────────────────────────────────────────────────────────┘
+```
+### Why VerifyLoop is different
+| Feature | Other Agents | VerifyLoop |
+|---------|-------------|------------|
+| Verification | LLM prompt (same model) | Trained ReasonCritic model |
+| Error recovery | Retry or re-prompt | Pattern-matched recovery strategies |
+| Confidence scoring | None or vibes | Numeric confidence threshold |
+| Recovery loop | None or ad-hoc | Structured Plan→Exec→Verify→Recover |
+| Token tracking | Best-effort | Built-in per-phase tracking |
+## Quick Start
+### Install
+```bash
+pip install verifyloop
+```
+### CLI
+```bash
+# Run a task
+vl run "add authentication to app.py"
+# Run from a task file
+vl run --task-file tasks/fix_bug.json
+# Interactive mode (confirm each step)
+vl run --interactive "refactor the database layer"
+# Specify models
+vl run --model gpt-4o --verify-model reason-critic-7b "write tests"
+# Dry run (plan only, don't execute)
+vl run --dry-run "create a REST API"
+# Limit iterations
+vl run --max-iterations 3 "fix the flaky test"
+# Docker sandbox for bash commands
+vl run --sandbox "install dependencies and run tests"
+```
+### Python API
+```python
+import asyncio
+from verifyloop import AgentPipeline, PipelineConfig
+async def main():
+    config = PipelineConfig(
+        model="gpt-4o",
+        verify_model="reason-critic-7b",
+        max_iterations=5,
+        confidence_threshold=0.8,
+    )
+    pipeline = AgentPipeline(config)
+    # Stream events
+    async def on_event(event, data):
+        print(f"[{event}] {data}")
+    pipeline.on_event(on_event)
+    result = await pipeline.run(
+        task="Add a hello() function to app.py",
+        context="Python project with a Flask web app",
+    )
+    print(f"Status: {result.status}")
+    print(f"Steps: {len(result.steps)}")
+    print(f"Duration: {result.duration_seconds:.2f}s")
+asyncio.run(main())
+```
+### Individual Components
+```python
+from verifyloop import PlanGenerator, Executor, Verifier, VerifierConfig, Recoverer
+# Use components individually
+planner = PlanGenerator(model="gpt-4o")
+plan = await planner.generate_plan("Fix the login bug in auth.py")
+executor = Executor(working_dir=".")
+step = await executor.bash("pytest tests/")
+verifier = Verifier(VerifierConfig(verify_model="reason-critic-7b"))
+result = await verifier.verify_file_state("auth.py", expected_content="def login()")
+recoverer = Recoverer(model="gpt-4o")
+recovery = await recoverer.recover("FileNotFoundError: auth.py not found")
+```
+## API Reference
+### `PipelineConfig`
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `model` | `str` | `"gpt-4o"` | LLM model for planning/recovery |
+| `verify_model` | `str` | `"reason-critic-7b"` | Trained verification model |
+| `max_iterations` | `int` | `5` | Max Plan→Execute→Verify loops |
+| `confidence_threshold` | `float` | `0.8` | Minimum confidence to accept result |
+| `max_recovery_attempts` | `int` | `3` | Max recovery attempts per iteration |
+| `working_dir` | `str` | `"."` | Working directory for file ops |
+| `dry_run` | `bool` | `False` | Plan only, don't execute |
+| `interactive` | `bool` | `False` | Confirm each step before execution |
+| `sandbox` | `bool` | `False` | Run bash in Docker container |
+| `sandbox_image` | `str` | `"python:3.11-slim"` | Docker image for sandbox |
+### `AgentPipeline`
+```python
+pipeline = AgentPipeline(config)
+# Run a task
+result: AgentRun = await pipeline.run(task, context, max_iterations)
+# Register event callbacks
+pipeline.on_event(callback)  # async def callback(event: str, data: dict)
+# Access token usage
+print(pipeline.token_usage)
+```
+### `AgentRun`
+| Field | Type | Description |
+|-------|------|-------------|
+| `task` | `str` | Original task description |
+| `steps` | `list[Step]` | All plan/execute/verify/recover steps |
+| `status` | `RunStatus` | `pending` / `planning` / `executing` / `verifying` / `recovering` / `completed` / `failed` |
+| `token_usage` | `TokenUsage` | Prompt + completion token counts |
+| `duration_seconds` | `float` | Total wall-clock time |
+| `iteration` | `int` | Which iteration completed |
+| `metadata` | `dict` | Additional metadata |
+### `Executor`
+```python
+executor = Executor(working_dir=".", sandbox=False)
+# Tools
+result = await executor.bash("ls -la")
+result = await executor.read("app.py")
+result = await executor.write("new_file.py", content)
+result = await executor.edit("app.py", old_content, new_content)
+result = await executor.web_search("python requests library")
+result = await executor.web_fetch("https://example.com/docs")
+# File history and rollback
+history = executor.get_file_history("app.py")
+executor.rollback_file("app.py")
+```
+### `Verifier`
+```python
+verifier = Verifier(VerifierConfig(
+    verify_model="reason-critic-7b",
+    confidence_threshold=0.8,
+    prefer_trained_model=True,
+))
+# Verification methods
+result = await verifier.verify_code_edits(plan, execute_steps)
+result = await verifier.verify_bash_output("pytest", output, expected="passed")
+result = await verifier.verify_file_state("app.py", expected_content="def hello")
+result = await verifier.verify_tests("pytest tests/", working_dir=".")
+```
+### `Recoverer`
+```python
+recoverer = Recoverer(model="gpt-4o", max_recovery_attempts=3)
+# Recovery with pattern matching
+recovery = await recoverer.recover(
+    error="SyntaxError: invalid syntax",
+    context="File: app.py, Line 42",
+    attempt=1,
+)
+# Pattern types: edit, create, retry, simplify, analyze
+print(recovery.recovery_type)   # "edit"
+print(recovery.recovery_attempt) # "Fix syntax error in the file"
+print(recovery.exhausted)        # False
+# Check if retry is worthwhile
+should_retry = recoverer.should_retry("TimeoutError", attempt=2)  # True
+```
+### `InMemoryStore` / `FileStore`
+```python
+from verifyloop import InMemoryStore, FileStore
+# In-memory (default)
+memory = InMemoryStore()
+await memory.store("key", {"data": "value"})
+result = await memory.retrieve("key")
+results = await memory.search("value")
+# Persistent file storage
+memory = FileStore(base_dir=".verifyloop_memory")
+await memory.store("key", {"data": "value"}, namespace="project1")
+```
+### `ConversationContext`
+```python
+from verifyloop.memory import ConversationContext
+ctx = ConversationContext()
+ctx.add_message("user", "Fix the bug in main.py")
+ctx.add_file_context("main.py", "def broken():\n    return 1/0")
+# Build context string for LLM
+context = ctx.build_context_string()
+```
+## Configuration
+### Environment Variables
+| Variable | Description |
+|----------|-------------|
+| `OPENAI_API_KEY` | OpenAI API key (for GPT models) |
+| `ANTHROPIC_API_KEY` | Anthropic API key (for Claude models) |
+| `VERIFYLOOP_VERIFY_MODEL` | Override the verification model |
+| `VERIFYLOOP_CONFIDENCE` | Override confidence threshold (0.0-1.0) |
+### Task File Format
+```json
+{
+  "task": "Add authentication to app.py",
+  "context": "Flask application with a login route",
+  "model": "gpt-4o",
+  "verify_model": "reason-critic-7b",
+  "max_iterations": 3
+}
+```
+## Comparison with Other Agent Frameworks
+### vs. AutoGPT / BabyAGI
+| Aspect | AutoGPT | VerifyLoop |
+|--------|---------|------------|
+| Planning | Single prompt | Decomposed substeps with tool estimation |
+| Verification | None | Trained model with confidence scoring |
+| Recovery | Basic retry | Pattern-matched strategies (5 types) |
+| Loop control | Infinite loop risk | Bounded iterations + convergence check |
+### vs. LangChain Agents
+| Aspect | LangChain | VerifyLoop |
+|--------|-----------|------------|
+| Verification | LLM-as-judge (same model) | Dedicated trained verification model |
+| Structured output | Optional | Enforced via Pydantic models |
+| Recovery | Chain retries | Typed recovery with strategy selection |
+| Token tracking | Callback-based | Built-in per-phase tracking |
+### vs. Claude Code / Cursor
+| Aspect | Claude Code | VerifyLoop |
+|--------|-------------|------------|
+| Verification | Same model self-review | Dedicated ReasonCritic model |
+| Recovery | Re-prompt | Pattern-matched with LLM fallback |
+| Programmatic | Limited CLI | Full Python API + CLI |
+| Extensibility | Plugin system | Tool interface + plugin system |
+## Verification Model: ReasonCritic
+The key differentiator. VerifyLoop uses **ReasonCritic**, a trained model specifically for verification:
+1. **Not a prompt** — It's a model fine-tuned on verification tasks (code review, test analysis, output comparison)
+2. **Falls back gracefully** — If ReasonCritic is unavailable, falls back to a general LLM with structured verification prompts
+3. **Confidence scoring** — Numeric 0-1 confidence score, not binary pass/fail
+4. **Actionable failures** — Every failure comes with fix suggestions, not just "it broke"
+## License
+MIT
+## Ecosystem
+Part of the [FableForge](../) ecosystem — 21 open-source projects built from 210K real agent traces:
+| Project | Description |
+| --- | --- |
+| **[Anvil](../anvil)** | Self-verified coding agent |
+| **[VerifyLoop](../verifyloop)** | Plan→Execute→Verify→Recover framework |
+| **[ErrorRecovery](../error-recovery)** | Self-healing middleware (3,725 error patterns) |
+| **[FableForge-14B](../fableforge-14b)** | The fine-tuned 14B model (4-stage training) |
+| **[ShellWhisperer](../shell-whisperer)** | 1.5B edge agent (phone/RPi, 50ms) |
+| **[ReasonCritic](../reason-critic)** | Verification model (130 benchmark tasks) |
+| **[TraceCompiler](../trace-compiler)** | Compile traces → LoRA skills |
+| **[AgentRuntime](../agent-runtime)** | Persistent agent daemon (systemd for AI) |
+| **[AgentSwarm](../agent-swarm)** | Multi-agent from real trace transitions |
+| **[AgentTelemetry](../agent-telemetry)** | Datadog for agents (token tracking, costs) |
+| **[BenchAgent](../bench-agent)** | HumanEval for tool-use (107 tasks) |
+| **[AgentDev](../agent-dev)** | VSCode extension with verification |
+| **[TraceViz](../trace-viz)** | Trace replay visualizer (Next.js) |
+| **[AgentSkills](../agent-skills)** | npm for agent behaviors |
+| **[AgentCurriculum](../agent-curriculum)** | 5-stage progressive training |
+| **[AgentFuzzer](../agent-fuzzer)** | Adversarial testing for agents |
+| **[AgentConstitution](../agent-constitution)** | Safety guardrails from traces |
+| **[CostOptimizer](../cost-optimizer)** | Token cost reduction (50-80%) |
+| **[AgentProfiler](../agent-profiler)** | Behavioral fingerprinting |
+| **[TrajectoryDistiller](../trajectory-distiller)** | Trace→training data pipeline |
+| **[Fable5-Dataset](../fable5-dataset)** | HuggingFace dataset release |
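
Reviewer note on the PKG-INFO hunk above: the README's architecture diagram describes a bounded loop in which a result is accepted once the verifier's confidence reaches the threshold (0.8 by default), and otherwise a recovery step runs before looping back to EXECUTE. The sketch below illustrates that control flow only. It does not import or reproduce verifyloop's code: `run_loop`, `LoopResult`, and the four callables are hypothetical names chosen for illustration, and the real `AgentPipeline` may differ in every detail.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    """Hypothetical result record; not verifyloop's AgentRun."""
    status: str                      # "completed" or "failed"
    iterations: int                  # execute/verify rounds actually run
    confidences: list[float] = field(default_factory=list)


def run_loop(
    plan: Callable[[str], list[str]],
    execute: Callable[[list[str]], str],
    verify: Callable[[str], float],
    recover: Callable[[list[str], str], list[str]],
    task: str,
    max_iterations: int = 5,
    confidence_threshold: float = 0.8,
) -> LoopResult:
    """Plan once, then execute/verify until confident or out of iterations."""
    steps = plan(task)
    result = LoopResult(status="failed", iterations=0)
    for i in range(1, max_iterations + 1):
        output = execute(steps)
        confidence = verify(output)
        result.iterations = i
        result.confidences.append(confidence)
        if confidence >= confidence_threshold:
            # Confidence is not below the threshold: accept and stop.
            result.status = "completed"
            break
        # Confidence below threshold: revise the steps and loop back to execute.
        steps = recover(steps, output)
    return result


# Stub callables standing in for the planner, executor, verifier and recoverer.
scores = iter([0.4, 0.6, 0.9])
demo = run_loop(
    plan=lambda task: [f"do: {task}"],
    execute=lambda steps: " | ".join(steps),
    verify=lambda output: next(scores),
    recover=lambda steps, output: steps + ["apply fix"],
    task="add hello() to app.py",
)
print(demo.status, demo.iterations)  # completed 3
```

Keeping the verifier as a separate callable mirrors the README's central claim that verification should not be done by the component that produced the output. The iteration bound is what prevents the loop from running indefinitely when confidence never reaches the threshold.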