pyfix-agent 1.0.0__tar.gz

Files changed (55)

pyfix_agent-1.0.0/PKG-INFO ADDED

@@ -0,0 +1,175 @@
+Metadata-Version: 2.4
+Name: pyfix-agent
+Version: 1.0.0
+Summary: An autonomous, multi-turn AI debugging agent built from scratch using AST surgery.
+Author: Jaswin Reddy
+License: MIT
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Environment :: Console
+Classifier: Topic :: Software Development :: Debuggers
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+Requires-Dist: huggingface_hub>=0.20.0
+# 🚀 PyFix Agent: Autonomous ReAct Debugging Loop
+An autonomous, multi-turn AI debugging agent built entirely from scratch in Python.
+Unlike standard wrappers that simply ask an LLM to "fix this code," **PyFix Agent** implements a custom **ReAct (Reasoning and Acting)** state machine and uses **Abstract Syntax Tree (AST)** manipulation to patch Python files at the function level. It evaluates its own fixes by executing the code in a separate subprocess with a timeout, iterating until the script passes or the maximum iteration limit is reached.
+---
+## 🧠 Core Architecture
+This project deliberately avoids high-level agentic abstractions (such as LangChain or LlamaIndex) to build the core agentic loop from first principles.
+```mermaid
+graph TD
+    A[Start] --> B[Execute target script via Subprocess]
+    B --> C{Execution successful?}
+    C -- Yes --> D[Stop: Bug Fixed 🎉]
+    C -- No --> E[Extract Stack Trace & Error Message]
+    E --> F[Parse Stack Trace for last function name]
+    F --> G[Construct LLM Prompt with Context Memory]
+    G --> H[Query LLM for patch]
+    H --> I[Clean LLM markdown and parse AST]
+    I --> J{Function-level error?}
+    J -- Yes --> K[Use AST surgery to replace target function node]
+    J -- No --> L[Fallback: Replace entire file]
+    K --> M[Write patched script to disk]
+    L --> M
+    M --> N{Max iterations reached?}
+    N -- Yes --> O[Stop: Max iterations reached ❌]
+    N -- No --> B
+```
+### Key Architectural Pillars
+1. **Execution Engine**: Runs the target script via Python subprocesses, capturing standard outputs, standard errors, and stack traces with safety timeout thresholds.
+2. **Context Memory**: Maintains a chronological conversation history array, allowing the LLM to learn from its previously failed patching attempts without losing the original code context.
+3. **AST Surgery**: Parses the LLM's response and uses Python's native `ast.NodeTransformer` to swap out broken function nodes with the corrected logic, leaving the rest of the file entirely untouched.
+---
+## ⚖️ Design Choices & Trade-offs
+Building an autonomous agent requires balancing safety, context window limits, and real-world unpredictability.
+### 1. AST Function Surgery vs. Full File Overwrites
+* **The Problem**: Asking an LLM to rewrite an entire 1,000-line script to fix a single typo is slow and expensive, and it risks the model truncating or needlessly rewriting code that already works.
+* **The Solution**: The agent extracts the specific `function_name` from the traceback. It prompts the LLM only for the corrected function. The `PythonSurgery` class (inheriting from `ast.NodeTransformer`) then traverses the syntax tree, finds the broken `ast.FunctionDef`, and seamlessly swaps it with the new node.
+* **The Trade-off**: While this guarantees perfect preservation of unrelated code, it requires specialized routing logic for errors that occur at the top-level `<module>` scope, which bypass the AST function surgery and require full-file patching.
+### 2. Execution-Based Evaluation vs. Exact String Matching
+* **The Problem**: How do we benchmark whether the agent successfully fixed a bug? Exact string matching against a reference fix fails because the LLM may produce an equivalent fix written differently (e.g., `x += 1` instead of `x = x + 1`, or different variable names), resulting in false negatives.
+* **The Solution**: The evaluation suite uses **Execution-Based Benchmarking**. The benchmark dynamically runs automated unit tests or validation scripts containing assertion statements against the patched files. If the patched script exits with code 0, it is marked as a success.
+---
+## 📊 Evaluation Benchmark
+The agent is evaluated against a curated dataset of scripts spanning 5 distinct error categories:
+* **NameError**: Undefined variables, scope issues, and missing imports.
+* **IndexError**: Off-by-one loop conditions and bounds checking.
+* **TypeError**: Data type mismatches and unsupported operations.
+* **AttributeError**: Typographical errors in object methods or calling methods on `NoneType`.
+* **Logic Bugs**: Silent errors that require execution-based assertions to detect.
+*(Currently evaluated against an automated function-level testing benchmark suite inside `eval_dataset/`)*
+---
+## 🛠️ Installation & Usage
+### Option 1: Standard Pip Installation (Recommended)
+To install PyFix Agent from a local clone in editable mode (which registers the `pyfix-agent` command in the active Python environment):
+```bash
+# Clone the repository
+git clone https://github.com/yourusername/agent-debugging-loop.git
+cd agent-debugging-loop
+# Install package in editable mode
+pip install -e .
+```
+The `pyfix-agent` command can then be run from any directory while that environment is active.
+### Option 2: Run as a Python Script
+If you prefer to run it without installing the package:
+```bash
+pip install huggingface_hub
+python pyfix_agent.py --script <path_to_script>
+```
+### Configuration
+Set your Hugging Face Hub token in the `HF_TOKEN` environment variable so the agent can authenticate with the API:
+**Bash (Linux/macOS):**
+```bash
+export HF_TOKEN="your_huggingface_token_here"
+```
+**PowerShell (Windows):**
+```powershell
+$env:HF_TOKEN="your_huggingface_token_here"
+```
+---
+## 🚀 CLI Usage Guide
+Point the agent at any broken Python script. Use the `--verbose` flag to watch the ReAct state machine's internal thought process.
+```bash
+# Run with default Qwen model
+pyfix-agent --script my_broken_code.py --verbose --max_iter 5
+# Run using a specific Hugging Face model
+pyfix-agent --script my_broken_code.py --model "mistralai/Mixtral-8x7B-Instruct-v0.1" --max_iter 3
+```
+### CLI Command Options
+| Argument | Type | Default | Description |
+|---|---|---|---|
+| `--script` | `str` | *Required* | Path to the broken Python script to debug |
+| `--max_iter` | `int` | `5` | Maximum number of debugging iterations |
+| `--verbose` | `flag` | `False` | Enable logging of reasoning, tracebacks, and raw LLM responses |
+| `--model` | `str` | `Qwen/Qwen2.5-72B-Instruct:cheapest` | Model endpoint ID on Hugging Face Serverless API |
+### Running the Evaluation Benchmark
+To run the full evaluation suite against the benchmark:
+```bash
+python benchmark.py
+```
+---
+## 📦 Packaging & Release Recommendations
+For releasing version 1.0.0 of **PyFix Agent** as a CLI tool:
+### 1. Direct Python Package (Recommended for Python users)
+Distribute PyFix Agent as a Python package via PyPI.
+* **Build tool**: Use `build` (`pip install build`) to compile source distribution `.tar.gz` and wheel `.whl` files.
+* **Upload tool**: Use `twine` to publish the artifacts to PyPI.
+* **Install**: Users can install it directly with `pip install pyfix-agent` or run isolated using `pipx run pyfix-agent`.
+### 2. Standalone Binary Executable (Recommended for Non-Python users)
+Compile the script into a standalone executable using `PyInstaller`.
+* **Compile**:
+  ```bash
+  pip install pyinstaller
+  pyinstaller --onefile --name pyfix-agent pyfix_agent.py
+  ```
+* **Release Artifact**: Upload the compiled executable (`dist/pyfix-agent` or `dist/pyfix-agent.exe`) directly as a release asset in your GitHub Releases.
+* *Note: The target environment still needs a Python interpreter installed to execute target scripts via `sys.executable`.*
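
The Execution Engine pillar in the description above (run the target script in a subprocess, capture stdout, stderr, and the stack trace, enforce a timeout) can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the function name `run_script`, the 10-second default timeout, and the `-1` exit code used for timeouts are all assumptions made for this example.

```python
import subprocess
import sys


def run_script(path: str, timeout: float = 10.0) -> tuple[int, str, str]:
    """Run a Python script in a child process and capture its output.

    Returns (exit_code, stdout, stderr). The name, the default timeout, and
    the -1 exit code for timeouts are hypothetical choices for this sketch.
    """
    try:
        result = subprocess.run(
            [sys.executable, path],  # use the interpreter running this code
            capture_output=True,     # collect stdout and stderr
            text=True,               # decode bytes to str
            timeout=timeout,         # stop runaway scripts
        )
    except subprocess.TimeoutExpired:
        return -1, "", f"Timed out after {timeout} seconds"
    # An uncaught exception prints its traceback to stderr and
    # produces a non-zero exit code.
    return result.returncode, result.stdout, result.stderr
```

A non-zero exit code together with the captured stderr is what a debugging loop would hand to the next step. Note that a child process with a timeout only guards against hangs; it does not restrict file or network access, so it is not a security sandbox.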

pyfix_agent-1.0.0/README.md ADDED

@@ -0,0 +1,160 @@
+# 🚀 PyFix Agent: Autonomous ReAct Debugging Loop
+An autonomous, multi-turn AI debugging agent built entirely from scratch in Python.
+Unlike standard wrappers that simply ask an LLM to "fix this code," **PyFix Agent** implements a custom **ReAct (Reasoning and Acting)** state machine and uses **Abstract Syntax Tree (AST)** manipulation to patch Python files at the function level. It evaluates its own fixes by executing the code in a separate subprocess with a timeout, iterating until the script passes or the maximum iteration limit is reached.
+---
+## 🧠 Core Architecture
+This project deliberately avoids high-level agentic abstractions (such as LangChain or LlamaIndex) to build the core agentic loop from first principles.
+```mermaid
+graph TD
+    A[Start] --> B[Execute target script via Subprocess]
+    B --> C{Execution successful?}
+    C -- Yes --> D[Stop: Bug Fixed 🎉]
+    C -- No --> E[Extract Stack Trace & Error Message]
+    E --> F[Parse Stack Trace for last function name]
+    F --> G[Construct LLM Prompt with Context Memory]
+    G --> H[Query LLM for patch]
+    H --> I[Clean LLM markdown and parse AST]
+    I --> J{Function-level error?}
+    J -- Yes --> K[Use AST surgery to replace target function node]
+    J -- No --> L[Fallback: Replace entire file]
+    K --> M[Write patched script to disk]
+    L --> M
+    M --> N{Max iterations reached?}
+    N -- Yes --> O[Stop: Max iterations reached ❌]
+    N -- No --> B
+```
+### Key Architectural Pillars
+1. **Execution Engine**: Runs the target script via Python subprocesses, capturing standard outputs, standard errors, and stack traces with safety timeout thresholds.
+2. **Context Memory**: Maintains a chronological conversation history array, allowing the LLM to learn from its previously failed patching attempts without losing the original code context.
+3. **AST Surgery**: Parses the LLM's response and uses Python's native `ast.NodeTransformer` to swap out broken function nodes with the corrected logic, leaving the rest of the file entirely untouched.
+---
+## ⚖️ Design Choices & Trade-offs
+Building an autonomous agent requires balancing safety, context window limits, and real-world unpredictability.
+### 1. AST Function Surgery vs. Full File Overwrites
+* **The Problem**: Asking an LLM to rewrite an entire 1,000-line script to fix a single typo is slow and expensive, and it risks the model truncating or needlessly rewriting code that already works.
+* **The Solution**: The agent extracts the specific `function_name` from the traceback. It prompts the LLM only for the corrected function. The `PythonSurgery` class (inheriting from `ast.NodeTransformer`) then traverses the syntax tree, finds the broken `ast.FunctionDef`, and seamlessly swaps it with the new node.
+* **The Trade-off**: While this guarantees perfect preservation of unrelated code, it requires specialized routing logic for errors that occur at the top-level `<module>` scope, which bypass the AST function surgery and require full-file patching.
+### 2. Execution-Based Evaluation vs. Exact String Matching
+* **The Problem**: How do we benchmark whether the agent successfully fixed a bug? Exact string matching against a reference fix fails because the LLM may produce an equivalent fix written differently (e.g., `x += 1` instead of `x = x + 1`, or different variable names), resulting in false negatives.
+* **The Solution**: The evaluation suite uses **Execution-Based Benchmarking**. The benchmark dynamically runs automated unit tests or validation scripts containing assertion statements against the patched files. If the patched script exits with code 0, it is marked as a success.
+---
+## 📊 Evaluation Benchmark
+The agent is evaluated against a curated dataset of scripts spanning 5 distinct error categories:
+* **NameError**: Undefined variables, scope issues, and missing imports.
+* **IndexError**: Off-by-one loop conditions and bounds checking.
+* **TypeError**: Data type mismatches and unsupported operations.
+* **AttributeError**: Typographical errors in object methods or calling methods on `NoneType`.
+* **Logic Bugs**: Silent errors that require execution-based assertions to detect.
+*(Currently evaluated against an automated function-level testing benchmark suite inside `eval_dataset/`)*
+---
+## 🛠️ Installation & Usage
+### Option 1: Standard Pip Installation (Recommended)
+To install PyFix Agent from a local clone in editable mode (which registers the `pyfix-agent` command in the active Python environment):
+```bash
+# Clone the repository
+git clone https://github.com/yourusername/agent-debugging-loop.git
+cd agent-debugging-loop
+# Install package in editable mode
+pip install -e .
+```
+The `pyfix-agent` command can then be run from any directory while that environment is active.
+### Option 2: Run as a Python Script
+If you prefer to run it without installing the package:
+```bash
+pip install huggingface_hub
+python pyfix_agent.py --script <path_to_script>
+```
+### Configuration
+Set your Hugging Face Hub token in the `HF_TOKEN` environment variable so the agent can authenticate with the API:
+**Bash (Linux/macOS):**
+```bash
+export HF_TOKEN="your_huggingface_token_here"
+```
+**PowerShell (Windows):**
+```powershell
+$env:HF_TOKEN="your_huggingface_token_here"
+```
+---
+## 🚀 CLI Usage Guide
+Point the agent at any broken Python script. Use the `--verbose` flag to watch the ReAct state machine's internal thought process.
+```bash
+# Run with default Qwen model
+pyfix-agent --script my_broken_code.py --verbose --max_iter 5
+# Run using a specific Hugging Face model
+pyfix-agent --script my_broken_code.py --model "mistralai/Mixtral-8x7B-Instruct-v0.1" --max_iter 3
+```
+### CLI Command Options
+| Argument | Type | Default | Description |
+|---|---|---|---|
+| `--script` | `str` | *Required* | Path to the broken Python script to debug |
+| `--max_iter` | `int` | `5` | Maximum number of debugging iterations |
+| `--verbose` | `flag` | `False` | Enable logging of reasoning, tracebacks, and raw LLM responses |
+| `--model` | `str` | `Qwen/Qwen2.5-72B-Instruct:cheapest` | Model endpoint ID on Hugging Face Serverless API |
+### Running the Evaluation Benchmark
+To run the full evaluation suite against the benchmark:
+```bash
+python benchmark.py
+```
+---
+## 📦 Packaging & Release Recommendations
+For releasing version 1.0.0 of **PyFix Agent** as a CLI tool:
+### 1. Direct Python Package (Recommended for Python users)
+Distribute PyFix Agent as a Python package via PyPI.
+* **Build tool**: Use `build` (`pip install build`) to compile source distribution `.tar.gz` and wheel `.whl` files.
+* **Upload tool**: Use `twine` to publish the artifacts to PyPI.
+* **Install**: Users can install it directly with `pip install pyfix-agent` or run isolated using `pipx run pyfix-agent`.
+### 2. Standalone Binary Executable (Recommended for Non-Python users)
+Compile the script into a standalone executable using `PyInstaller`.
+* **Compile**:
+  ```bash
+  pip install pyinstaller
+  pyinstaller --onefile --name pyfix-agent pyfix_agent.py
+  ```
+* **Release Artifact**: Upload the compiled executable (`dist/pyfix-agent` or `dist/pyfix-agent.exe`) directly as a release asset in your GitHub Releases.
+* *Note: The target environment still needs a Python interpreter installed to execute target scripts via `sys.executable`.*
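
The AST Surgery design choice in the README above (find the broken `ast.FunctionDef` by name and swap in the corrected node) can be illustrated with a short sketch. The README names a `PythonSurgery` class but this diff excerpt does not show it, so the class `FunctionReplacer` and the helper `replace_function` below are hypothetical stand-ins written for this example, assuming the patched tree is turned back into source with `ast.unparse`.

```python
import ast


class FunctionReplacer(ast.NodeTransformer):
    """Swap the function named `target_name` for `new_node` (hypothetical)."""

    def __init__(self, target_name: str, new_node: ast.FunctionDef) -> None:
        self.target_name = target_name
        self.new_node = new_node
        self.replaced = False

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        if node.name == self.target_name:
            self.replaced = True
            return self.new_node
        # Keep walking so nested functions are also considered.
        return self.generic_visit(node)


def replace_function(source: str, target_name: str, fixed_source: str) -> str:
    """Return `source` with one function replaced by its fixed version."""
    tree = ast.parse(source)
    fixed_tree = ast.parse(fixed_source)
    # Pick the matching function out of the patch text.
    new_node = next(
        (n for n in fixed_tree.body
         if isinstance(n, ast.FunctionDef) and n.name == target_name),
        None,
    )
    if new_node is None:
        raise ValueError(f"patch does not define {target_name!r}")
    replacer = FunctionReplacer(target_name, new_node)
    patched = replacer.visit(tree)
    if not replacer.replaced:
        raise ValueError(f"{target_name!r} not found in the original source")
    ast.fix_missing_locations(patched)
    return ast.unparse(patched)


if __name__ == "__main__":
    broken = (
        "def buy_item(inventory, item):\n"
        "    inventory.appned(item)\n"
        "\n"
        "inventory = [1, 2, 3]\n"
        "buy_item(inventory, 4)\n"
    )
    fix = "def buy_item(inventory, item):\n    inventory.append(item)\n"
    print(replace_function(broken, "buy_item", fix))
```

One caveat of this approach: `ast.unparse` regenerates the whole file from the tree, so comments are dropped and formatting such as quote style and spacing is normalized, even in functions that were not replaced.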

pyfix_agent-1.0.0/eval_dataset/AttributeError/patched_script1.py ADDED

@@ -0,0 +1,5 @@
+def buy_item(inventory, item):
+    inventory.append(item)
+inventory = [1, 2, 3]
+item = 4
+buy_item(inventory, item)

pyfix_agent-1.0.0/eval_dataset/AttributeError/patched_script2.py ADDED

@@ -0,0 +1,5 @@
+words_list = ['python', 'programming', 'is', 'fun']
+def capitalize_all_words(words_list):
+    return [word.capitalize() for word in words_list]
+print(capitalize_all_words(words_list))

pyfix_agent-1.0.0/eval_dataset/AttributeError/patched_script3.py ADDED

@@ -0,0 +1,6 @@
+arr = [4, 2, 1, 3]
+def decreasing_order_list(arr):
+    arr.sort(reverse=True)
+    return arr
+print(decreasing_order_list(arr))

pyfix_agent-1.0.0/eval_dataset/AttributeError/script1.py ADDED

@@ -0,0 +1,6 @@
+def buy_item(inventory, item):
+    inventory.appned(item)
+inventory = [1, 2, 3]
+item = 4
+buy_item(inventory, item)

pyfix_agent-1.0.0/eval_dataset/AttributeError/script2.py ADDED

@@ -0,0 +1,6 @@
+words_list = ["python", "programming", "is", "fun"]
+def capitalize_all_words(words_list):
+    return words_list.capitalize()
+print(capitalize_all_words(words_list))

pyfix_agent-1.0.0/eval_dataset/AttributeError/script3.py ADDED

@@ -0,0 +1,8 @@
+arr = [4,2,1,3]
+def decreasing_order_list(arr):
+    sorted_arr = arr.sort()
+    sorted_arr.reverse()
+    return sorted_arr
+print(decreasing_order_list(arr))

pyfix_agent-1.0.0/eval_dataset/AttributeError/test_script1.py ADDED

@@ -0,0 +1,9 @@
+from patched_script1 import buy_item
+inventory = [1,3,4]
+buy_item(inventory, 2)
+assert inventory == [1, 3, 4, 2]
+inventory = []
+buy_item(inventory, 5)
+assert inventory == [5]

pyfix_agent-1.0.0/eval_dataset/AttributeError/test_script2.py ADDED

@@ -0,0 +1,5 @@
+from patched_script2 import capitalize_all_words
+assert capitalize_all_words(["hello", "world"]) == ["Hello", "World"]
+assert capitalize_all_words([]) == []
+assert capitalize_all_words(["python"]) == ["Python"]

pyfix_agent-1.0.0/eval_dataset/AttributeError/test_script3.py ADDED

@@ -0,0 +1,5 @@
+from patched_script3 import decreasing_order_list
+assert decreasing_order_list([4,2,1,3]) == [4, 3, 2, 1]
+assert decreasing_order_list([]) == []
+assert decreasing_order_list([1]) == [1]

pyfix_agent-1.0.0/eval_dataset/IndexError/patched_script1.py ADDED

@@ -0,0 +1,7 @@
+def get_last_element(arr):
+    if len(arr) == 0:
+        return None
+    return arr[len(arr) - 1]
+arr = [1, 2, 3, 4, 5]
+print(get_last_element(arr))
+print(get_last_element([]))

pyfix_agent-1.0.0/eval_dataset/IndexError/patched_script2.py ADDED

@@ -0,0 +1,7 @@
+def sum_array_elements(arr):
+    total = 0
+    for i in range(len(arr)):
+        total += arr[i]
+    return total
+arr = [10, 20, 30, 40]
+print(sum_array_elements(arr))

pyfix_agent-1.0.0/eval_dataset/IndexError/patched_script3.py ADDED

@@ -0,0 +1,5 @@
+def get_tail(items):
+    if not items:
+        return None
+    return items[-1]
+print(get_tail([]))

pyfix_agent-1.0.0/eval_dataset/IndexError/script1.py ADDED

@@ -0,0 +1,5 @@
+def get_last_element(arr):
+    return arr[len(arr)]
+arr = [1,2,3,4,5]
+print(get_last_element(arr))
+print(get_last_element([]))

pyfix_agent-1.0.0/eval_dataset/IndexError/script2.py ADDED

@@ -0,0 +1,9 @@
+def sum_array_elements(arr):
+    total = 0
+    for i in range(len(arr) + 1):
+        total += arr[i]
+    return total
+arr = [10, 20, 30, 40]
+print(sum_array_elements(arr))

pyfix_agent-1.0.0/eval_dataset/IndexError/script3.py ADDED

@@ -0,0 +1,4 @@
+def get_tail(items):
+    return items[-1]
+print(get_tail([]))

pyfix_agent-1.0.0/eval_dataset/IndexError/test_script1.py ADDED

@@ -0,0 +1,5 @@
+from patched_script1 import get_last_element
+assert get_last_element([1, 2, 3]) == 3
+assert get_last_element([1]) == 1
+assert get_last_element([]) is None

pyfix_agent-1.0.0/eval_dataset/IndexError/test_script2.py ADDED

@@ -0,0 +1,5 @@
+from patched_script2 import sum_array_elements
+assert sum_array_elements([1, 2, 3, 4]) == 10
+assert sum_array_elements([]) == 0
+assert sum_array_elements([1]) == 1

pyfix_agent-1.0.0/eval_dataset/IndexError/test_script3.py ADDED

@@ -0,0 +1,5 @@
+from patched_script3 import get_tail
+assert get_tail([1, 2, 3]) == 3
+assert get_tail([]) is None
+assert get_tail([1]) == 1

pyfix_agent-1.0.0/eval_dataset/LogicBugs/patched_script1.py ADDED

@@ -0,0 +1,6 @@
+def factorial(n):
+    if n == 0:
+        return 1
+    return n * factorial(n - 1)
+print(factorial(5))

pyfix_agent-1.0.0/eval_dataset/LogicBugs/patched_script2.py ADDED

@@ -0,0 +1,4 @@
+def is_even(n):
+    return n % 2 == 0
+print(is_even(10))

pyfix_agent-1.0.0/eval_dataset/LogicBugs/patched_script3.py ADDED

@@ -0,0 +1,6 @@
+def find_max(a, b):
+    if a < b:
+        return b
+    return a
+print(find_max(10, 5))

pyfix_agent-1.0.0/eval_dataset/LogicBugs/script1.py ADDED

@@ -0,0 +1,6 @@
+def factorial(n):
+    if n == 0:
+        return 0
+    return n * factorial(n - 1)
+print(factorial(5))

pyfix_agent-1.0.0/eval_dataset/LogicBugs/script2.py ADDED

@@ -0,0 +1,4 @@
+def is_even(n):
+    return n % 2 == 1
+print(is_even(10))

pyfix_agent-1.0.0/eval_dataset/LogicBugs/script3.py ADDED

@@ -0,0 +1,6 @@
+def find_max(a, b):
+    if a < b:
+        return a
+    return b
+print(find_max(10, 5))

pyfix_agent-1.0.0/eval_dataset/LogicBugs/test_script1.py ADDED

@@ -0,0 +1,5 @@
+from patched_script1 import factorial
+assert factorial(5) == 120
+assert factorial(0) == 1
+assert factorial(1) == 1

pyfix_agent-1.0.0/eval_dataset/LogicBugs/test_script2.py ADDED

@@ -0,0 +1,4 @@
+from patched_script2 import is_even
+assert is_even(10) is True
+assert is_even(9) is False

pyfix_agent-1.0.0/eval_dataset/LogicBugs/test_script3.py ADDED

@@ -0,0 +1,5 @@
+from patched_script3 import find_max
+assert find_max(5, 9) == 9
+assert find_max(-10, -5) == -5
+assert find_max(7, 3) == 7

pyfix_agent-1.0.0/eval_dataset/NameError/patched_script1.py ADDED

@@ -0,0 +1,4 @@
+def calculate_circle_area(radius):
+    pi_value = 3.14159
+    return pi_value * radius ** 2
+print(calculate_circle_area(5))

pyfix_agent-1.0.0/eval_dataset/NameError/patched_script2.py ADDED

@@ -0,0 +1,4 @@
+def get_square_root(n):
+    import math
+    return math.sqrt(n)
+print(get_square_root(25))

pyfix_agent-1.0.0/eval_dataset/NameError/patched_script3.py ADDED

@@ -0,0 +1,7 @@
+n1 = 'David'
+n2 = 'Smith'
+def format_greeting(first_name, last_name):
+    full_name = first_name + ' ' + last_name
+    return f'Hello, {full_name}!'
+print(format_greeting(n1, n2))

pyfix_agent-1.0.0/eval_dataset/NameError/script1.py ADDED

@@ -0,0 +1,5 @@
+def calculate_circle_area(radius):
+    pi_value = 3.14159
+    return p_value * (radius ** 2)
+print(calculate_circle_area(5))

pyfix_agent-1.0.0/eval_dataset/NameError/script2.py ADDED

@@ -0,0 +1,4 @@
+def get_square_root(n):
+    return math.sqrt(n)
+print(get_square_root(25))

pyfix_agent-1.0.0/eval_dataset/NameError/script3.py ADDED

@@ -0,0 +1,8 @@
+n1 = "David"
+n2 = "Smith"
+def format_greeting(first_name, last_name):
+    full_name = first_name + " " + last_name
+    return f"Hello, {fullname}!"
+print(format_greeting(n1, n2))

pyfix_agent-1.0.0/eval_dataset/NameError/test_script1.py ADDED

@@ -0,0 +1,5 @@
+from patched_script1 import calculate_circle_area
+assert calculate_circle_area(5) == 3.14159 * 25
+assert calculate_circle_area(0) == 0
+assert calculate_circle_area(1) == 3.14159

pyfix_agent-1.0.0/eval_dataset/NameError/test_script2.py ADDED

@@ -0,0 +1,4 @@
+from patched_script2 import get_square_root
+assert get_square_root(4) == 2
+assert get_square_root(9) == 3

pyfix_agent-1.0.0/eval_dataset/NameError/test_script3.py ADDED

@@ -0,0 +1,4 @@
+from patched_script3 import format_greeting
+assert format_greeting("David", "Smith") == "Hello, David Smith!"
+assert format_greeting("John", "Doe") == "Hello, John Doe!"

pyfix_agent-1.0.0/eval_dataset/TypeError/patched_script1.py ADDED

@@ -0,0 +1,3 @@
+def message(age):
+    return 'You are ' + str(age) + ' years old.'
+print(message(10))

pyfix_agent-1.0.0/eval_dataset/TypeError/patched_script2.py ADDED

@@ -0,0 +1,6 @@
+def multiply_three_numbers(a, b, c):
+    return a * b * c
+def calculate(a):
+    return multiply_three_numbers(5, 10, a)
+print(calculate(1))

pyfix_agent-1.0.0/eval_dataset/TypeError/patched_script3.py ADDED

@@ -0,0 +1,3 @@
+def append_value(arr, val):
+    return arr + [val]
+print(append_value([1, 2, 3], 4))

pyfix_agent-1.0.0/eval_dataset/TypeError/script1.py ADDED

@@ -0,0 +1,5 @@
+def message(age):
+    return "You are " + age + " years old."
+print(message(10))

pyfix_agent-1.0.0/eval_dataset/TypeError/script2.py ADDED

@@ -0,0 +1,7 @@
+def multiply_three_numbers(a, b, c):
+    return a * b * c
+def calculate(a):
+    return multiply_three_numbers(5, 10)
+print(calculate(1))

pyfix_agent-1.0.0/eval_dataset/TypeError/script3.py ADDED

@@ -0,0 +1,4 @@
+def append_value(arr, val):
+    return arr + val
+print(append_value([1, 2, 3], 4))

pyfix_agent-1.0.0/eval_dataset/TypeError/test_script1.py ADDED

@@ -0,0 +1,4 @@
+from patched_script1 import message
+assert message(10) == "You are 10 years old."
+assert message(0) == "You are 0 years old."

pyfix_agent-1.0.0/eval_dataset/TypeError/test_script2.py ADDED

@@ -0,0 +1,5 @@
+from patched_script2 import calculate
+assert calculate(1) == 50
+assert calculate(2) == 100
+assert calculate(3) == 150

pyfix_agent-1.0.0/eval_dataset/TypeError/test_script3.py ADDED

@@ -0,0 +1,5 @@
+from patched_script3 import append_value
+assert append_value([1, 2, 3], 4) == [1, 2, 3, 4]
+assert append_value([], 1) == [1]
+assert append_value([1], 2) == [1, 2]
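
Each `test_scriptN.py` in the dataset above imports from the sibling `patched_scriptN.py` and relies on bare `assert` statements, so a failing assertion ends the process with a non-zero exit code. A harness in the spirit of the README's execution-based benchmarking could look like the sketch below. The package's `benchmark.py` is not shown in this excerpt, so the function names `run_test` and `run_benchmark`, the glob pattern, and the timeout are assumptions for illustration only.

```python
import subprocess
import sys
from pathlib import Path


def run_test(test_path: Path, timeout: float = 10.0) -> bool:
    """Return True when the test script exits with code 0 (hypothetical helper)."""
    try:
        result = subprocess.run(
            [sys.executable, test_path.name],
            # Run inside the category folder so that
            # `from patched_script1 import ...` resolves.
            cwd=test_path.parent,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def run_benchmark(dataset_dir: Path) -> dict[str, bool]:
    """Run every <category>/test_script*.py and map its path to pass/fail."""
    results = {}
    for test_path in sorted(dataset_dir.glob("*/test_script*.py")):
        key = f"{test_path.parent.name}/{test_path.name}"
        results[key] = run_test(test_path)
    return results


if __name__ == "__main__":
    outcome = run_benchmark(Path("eval_dataset"))
    passed = sum(outcome.values())
    print(f"{passed}/{len(outcome)} tests passed")
```

Judging by exit code rather than by comparing source text means any behaviorally correct patch passes, regardless of how the model phrased it.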

pyfix_agent-1.0.0/pyfix_agent.egg-info/PKG-INFO ADDED

@@ -0,0 +1,175 @@
+Metadata-Version: 2.4
+Name: pyfix-agent
+Version: 1.0.0
+Summary: An autonomous, multi-turn AI debugging agent built from scratch using AST surgery.
+Author: Jaswin Reddy
+License: MIT
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Environment :: Console
+Classifier: Topic :: Software Development :: Debuggers
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+Requires-Dist: huggingface_hub>=0.20.0
+# 🚀 PyFix Agent: Autonomous ReAct Debugging Loop
+An autonomous, multi-turn AI debugging agent built entirely from scratch in Python.
+Unlike standard wrappers that simply ask an LLM to "fix this code," **PyFix Agent** implements a custom **ReAct (Reasoning and Acting)** state machine and uses **Abstract Syntax Tree (AST)** manipulation to patch Python files at the function level. It evaluates its own fixes by executing the code in a separate subprocess with a timeout, iterating until the script passes or the maximum iteration limit is reached.
+---
+## 🧠 Core Architecture
+This project deliberately avoids high-level agentic abstractions (such as LangChain or LlamaIndex) to build the core agentic loop from first principles.
+```mermaid
+graph TD
+    A[Start] --> B[Execute target script via Subprocess]
+    B --> C{Execution successful?}
+    C -- Yes --> D[Stop: Bug Fixed 🎉]
+    C -- No --> E[Extract Stack Trace & Error Message]
+    E --> F[Parse Stack Trace for last function name]
+    F --> G[Construct LLM Prompt with Context Memory]
+    G --> H[Query LLM for patch]
+    H --> I[Clean LLM markdown and parse AST]
+    I --> J{Function-level error?}
+    J -- Yes --> K[Use AST surgery to replace target function node]
+    J -- No --> L[Fallback: Replace entire file]
+    K --> M[Write patched script to disk]
+    L --> M
+    M --> N{Max iterations reached?}
+    N -- Yes --> O[Stop: Max iterations reached ❌]
+    N -- No --> B
+```
+### Key Architectural Pillars
+1. **Execution Engine**: Runs the target script via Python subprocesses, capturing standard outputs, standard errors, and stack traces with safety timeout thresholds.
+2. **Context Memory**: Maintains a chronological conversation history array, allowing the LLM to learn from its previously failed patching attempts without losing the original code context.
+3. **AST Surgery**: Parses the LLM's response and uses Python's native `ast.NodeTransformer` to swap out broken function nodes with the corrected logic, leaving the rest of the file entirely untouched.
+---
+## ⚖️ Design Choices & Trade-offs
+Building an autonomous agent requires balancing safety, context window limits, and real-world unpredictability.
+### 1. AST Function Surgery vs. Full File Overwrites
+* **The Problem**: Asking an LLM to rewrite an entire 1,000-line script to fix a single typo is slow and expensive, and it risks the model truncating or needlessly rewriting code that already works.
+* **The Solution**: The agent extracts the specific `function_name` from the traceback. It prompts the LLM only for the corrected function. The `PythonSurgery` class (inheriting from `ast.NodeTransformer`) then traverses the syntax tree, finds the broken `ast.FunctionDef`, and seamlessly swaps it with the new node.
+* **The Trade-off**: While this guarantees perfect preservation of unrelated code, it requires specialized routing logic for errors that occur at the top-level `<module>` scope, which bypass the AST function surgery and require full-file patching.
+### 2. Execution-Based Evaluation vs. Exact String Matching
+* **The Problem**: How do we benchmark whether the agent successfully fixed a bug? Exact string matching against a reference fix fails because the LLM may produce an equivalent fix written differently (e.g., `x += 1` instead of `x = x + 1`, or different variable names), resulting in false negatives.
+* **The Solution**: The evaluation suite uses **Execution-Based Benchmarking**. The benchmark dynamically runs automated unit tests or validation scripts containing assertion statements against the patched files. If the patched script exits with code 0, it is marked as a success.
+---
+## 📊 Evaluation Benchmark
+The agent is evaluated against a curated dataset of scripts spanning 5 distinct error categories:
+* **NameError**: Undefined variables, scope issues, and missing imports.
+* **IndexError**: Off-by-one loop conditions and bounds checking.
+* **TypeError**: Data type mismatches and unsupported operations.
+* **AttributeError**: Typographical errors in object methods or calling methods on `NoneType`.
+* **Logic Bugs**: Silent errors that require execution-based assertions to detect.
+*(Currently evaluated against an automated function-level testing benchmark suite inside `eval_dataset/`)*
+---
+## 🛠️ Installation & Usage
+### Option 1: Standard Pip Installation (Recommended)
+To install PyFix Agent from a local clone in editable mode (which registers the `pyfix-agent` command in the active Python environment):
+```bash
+# Clone the repository
+git clone https://github.com/yourusername/agent-debugging-loop.git
+cd agent-debugging-loop
+# Install package in editable mode
+pip install -e .
+```
+The `pyfix-agent` command can then be run from any directory while that environment is active.
+### Option 2: Run as a Python Script
+If you prefer to run it without installing the package:
+```bash
+pip install huggingface_hub
+python pyfix_agent.py --script <path_to_script>
+```
+### Configuration
+Set your Hugging Face Hub token in the `HF_TOKEN` environment variable so the agent can authenticate with the API:
+**Bash (Linux/macOS):**
+```bash
+export HF_TOKEN="your_huggingface_token_here"
+```
+**PowerShell (Windows):**
+```powershell
+$env:HF_TOKEN="your_huggingface_token_here"
+```
+---
+## 🚀 CLI Usage Guide
+Point the agent at any broken Python script. Use the `--verbose` flag to watch the ReAct state machine's internal thought process.
+```bash
+# Run with default Qwen model
+pyfix-agent --script my_broken_code.py --verbose --max_iter 5
+# Run using a specific Hugging Face model
+pyfix-agent --script my_broken_code.py --model "mistralai/Mixtral-8x7B-Instruct-v0.1" --max_iter 3
+```
+### CLI Command Options
+| Argument | Type | Default | Description |
+|---|---|---|---|
+| `--script` | `str` | *Required* | Path to the broken Python script to debug |
+| `--max_iter` | `int` | `5` | Maximum number of debugging iterations |
+| `--verbose` | `flag` | `False` | Enable logging of reasoning, tracebacks, and raw LLM responses |
+| `--model` | `str` | `Qwen/Qwen2.5-72B-Instruct:cheapest` | Model endpoint ID on Hugging Face Serverless API |
+### Running the Evaluation Benchmark
+To run the full evaluation suite:
+```bash
+python benchmark.py
+```
+---
+## 📦 Packaging & Release Recommendations
+For releasing version 1.0.0 of **PyFix Agent** as a CLI tool:
+### 1. Direct Python Package (Recommended for Python users)
+Distribute PyFix Agent as a Python package via PyPI.
+* **Build tool**: Use `build` (`pip install build`) to produce the source distribution (`.tar.gz`) and wheel (`.whl`).
+* **Upload tool**: Use `twine` to publish the artifacts to PyPI.
+* **Install**: Users can install it with `pip install pyfix-agent` or run it in an isolated environment with `pipx run pyfix-agent`.
+### 2. Standalone Binary Executable (Recommended for Non-Python users)
+Compile the script into a standalone executable using `PyInstaller`.
+* **Compile**:
+  ```bash
+  pip install pyinstaller
+  pyinstaller --onefile --name pyfix-agent pyfix_agent.py
+  ```
+* **Release Artifact**: Upload the compiled executable (`dist/pyfix-agent` or `dist/pyfix-agent.exe`) directly as a release asset in your GitHub Releases.
+* *Note: The target environment still needs a Python interpreter installed to execute target scripts via `sys.executable`.*
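
The CLI options table in the README above maps naturally onto an `argparse` parser, and the README describes executing the target script in a subprocess. The following is a minimal sketch under those assumptions, not the package's actual code: the function names `build_parser` and `run_script`, the help strings, and the 10-second timeout are hypothetical; only the flag names, types, and defaults come from the table.

```python
import argparse
import subprocess
import sys

# Default taken from the README's options table.
DEFAULT_MODEL = "Qwen/Qwen2.5-72B-Instruct:cheapest"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the README's options table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="pyfix-agent")
    parser.add_argument("--script", type=str, required=True,
                        help="Path to the broken Python script to debug")
    parser.add_argument("--max_iter", type=int, default=5,
                        help="Maximum number of debugging iterations")
    parser.add_argument("--verbose", action="store_true",
                        help="Log reasoning, tracebacks, and raw LLM responses")
    parser.add_argument("--model", type=str, default=DEFAULT_MODEL,
                        help="Model ID to query")
    return parser


def run_script(path: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run the target script in a child interpreter and capture its output.

    The timeout value is an assumption for this sketch; a non-zero return
    code together with the captured stderr is what a debugging loop would
    inspect for a traceback.
    """
    return subprocess.run(
        [sys.executable, path],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

A caller would typically parse the arguments with `build_parser().parse_args()`, pass `args.script` to `run_script`, and treat `result.returncode == 0` as success.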

pyfix_agent-1.0.0/pyfix_agent.egg-info/SOURCES.txt ADDED

@@ -0,0 +1,53 @@
+README.md
+pyproject.toml
+eval_dataset/AttributeError/patched_script1.py
+eval_dataset/AttributeError/patched_script2.py
+eval_dataset/AttributeError/patched_script3.py
+eval_dataset/AttributeError/script1.py
+eval_dataset/AttributeError/script2.py
+eval_dataset/AttributeError/script3.py
+eval_dataset/AttributeError/test_script1.py
+eval_dataset/AttributeError/test_script2.py
+eval_dataset/AttributeError/test_script3.py
+eval_dataset/IndexError/patched_script1.py
+eval_dataset/IndexError/patched_script2.py
+eval_dataset/IndexError/patched_script3.py
+eval_dataset/IndexError/script1.py
+eval_dataset/IndexError/script2.py
+eval_dataset/IndexError/script3.py
+eval_dataset/IndexError/test_script1.py
+eval_dataset/IndexError/test_script2.py
+eval_dataset/IndexError/test_script3.py
+eval_dataset/LogicBugs/patched_script1.py
+eval_dataset/LogicBugs/patched_script2.py
+eval_dataset/LogicBugs/patched_script3.py
+eval_dataset/LogicBugs/script1.py
+eval_dataset/LogicBugs/script2.py
+eval_dataset/LogicBugs/script3.py
+eval_dataset/LogicBugs/test_script1.py
+eval_dataset/LogicBugs/test_script2.py
+eval_dataset/LogicBugs/test_script3.py
+eval_dataset/NameError/patched_script1.py
+eval_dataset/NameError/patched_script2.py
+eval_dataset/NameError/patched_script3.py
+eval_dataset/NameError/script1.py
+eval_dataset/NameError/script2.py
+eval_dataset/NameError/script3.py
+eval_dataset/NameError/test_script1.py
+eval_dataset/NameError/test_script2.py
+eval_dataset/NameError/test_script3.py
+eval_dataset/TypeError/patched_script1.py
+eval_dataset/TypeError/patched_script2.py
+eval_dataset/TypeError/patched_script3.py
+eval_dataset/TypeError/script1.py
+eval_dataset/TypeError/script2.py
+eval_dataset/TypeError/script3.py
+eval_dataset/TypeError/test_script1.py
+eval_dataset/TypeError/test_script2.py
+eval_dataset/TypeError/test_script3.py
+pyfix_agent.egg-info/PKG-INFO
+pyfix_agent.egg-info/SOURCES.txt
+pyfix_agent.egg-info/dependency_links.txt
+pyfix_agent.egg-info/entry_points.txt
+pyfix_agent.egg-info/requires.txt
+pyfix_agent.egg-info/top_level.txt

pyfix_agent-1.0.0/pyfix_agent.egg-info/dependency_links.txt ADDED

@@ -0,0 +1 @@
+

pyfix_agent-1.0.0/pyfix_agent.egg-info/entry_points.txt ADDED

@@ -0,0 +1,2 @@
+[console_scripts]
+pyfix-agent = pyfix_agent:main
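
The `[console_scripts]` entry above uses the `name = module:attribute` form: the installer generates a `pyfix-agent` command that imports the module and calls the named attribute. The sketch below shows how such a reference resolves using the standard library's `importlib.metadata.EntryPoint`. It deliberately uses `json:dumps` as a stand-in target (an assumption for illustration) so that it runs without `pyfix_agent` installed.

```python
from importlib.metadata import EntryPoint

# Stand-in target: "json:dumps" replaces "pyfix_agent:main" so the sketch
# runs without the package installed. The name "demo-cli" is hypothetical.
entry = EntryPoint(name="demo-cli", value="json:dumps", group="console_scripts")

# The value is split at the colon into a module path and an attribute name.
print(entry.module)
print(entry.attr)

# load() imports the module and returns the referenced attribute.
func = entry.load()
print(func({"ok": True}))
```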

pyfix_agent-1.0.0/pyfix_agent.egg-info/requires.txt ADDED

@@ -0,0 +1 @@
+huggingface_hub>=0.20.0

pyfix_agent-1.0.0/pyfix_agent.egg-info/top_level.txt ADDED

@@ -0,0 +1 @@
+eval_dataset
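
The metadata above lists `eval_dataset` as the only top-level name, while the console script in `entry_points.txt` targets `pyfix_agent`. A small check like the following makes that comparison explicit. It is a sketch with values copied from the hunks in this diff; the helper name `entry_point_module` is hypothetical.

```python
def entry_point_module(value: str) -> str:
    """Return the top-level module name from a 'module:attr' entry point value."""
    module_path = value.split(":", 1)[0].strip()
    return module_path.split(".", 1)[0]


# Values copied from top_level.txt and entry_points.txt in this diff.
top_level = ["eval_dataset"]
entry_value = "pyfix_agent:main"

module = entry_point_module(entry_value)
is_listed = module in top_level
print(f"{module!r} listed in top_level.txt: {is_listed}")
```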

pyfix_agent-1.0.0/pyproject.toml ADDED

@@ -0,0 +1,27 @@
+[build-system]
+requires = ["setuptools>=61.0.0"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "pyfix-agent"
+version = "1.0.0"
+description = "An autonomous, multi-turn AI debugging agent built from scratch using AST surgery."
+readme = "README.md"
+requires-python = ">=3.10"
+license = {text = "MIT"}
+authors = [
+    {name = "Jaswin Reddy"}
+]
+classifiers = [
+    "Programming Language :: Python :: 3",
+    "License :: OSI Approved :: MIT License",
+    "Operating System :: OS Independent",
+    "Environment :: Console",
+    "Topic :: Software Development :: Debuggers",
+]
+dependencies = [
+    "huggingface_hub>=0.20.0",
+]
+[project.scripts]
+pyfix-agent = "pyfix_agent:main"

pyfix_agent-1.0.0/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0