npm - open-agents-ai - Versions diffs - 0.15.3 → 0.15.5 - Mend

open-agents-ai 0.15.3 → 0.15.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/README.md CHANGED

@@ -1,8 +1,42 @@
+<p align="center">
+  <img src="https://img.shields.io/npm/v/open-agents-ai?color=7C3AED&style=flat-square" alt="npm version" />
+  <img src="https://img.shields.io/npm/dm/open-agents-ai?color=06B6D4&style=flat-square" alt="npm downloads" />
+  <img src="https://img.shields.io/badge/license-MIT-10B981?style=flat-square" alt="license" />
+  <img src="https://img.shields.io/badge/node-%3E%3D20-F59E0B?style=flat-square" alt="node version" />
+  <img src="https://img.shields.io/badge/models-open--weight-EC4899?style=flat-square" alt="open-weight models" />
+</p>
+<p align="center">
+  <code style="color:#5fafff">freedom of information</code> · <code style="color:#5fd7ff">freedom of patterns</code> · <code style="color:#5fffff">creating freely</code> · <code style="color:#5fffaf">open-weights</code><br>
+  <code style="color:#ffaf00">libertad de informacion</code> · <code style="color:#ff8700">crear libremente</code> · <code style="color:#d7afff">creer librement</code> · <code style="color:#d7d7ff">liberte d'expression</code><br>
+  <code style="color:#5fd75f">Freiheit der Muster</code> · <code style="color:#ff5f87">jiyuu ni souzou suru</code> · <code style="color:#8787ff">jayuroun changjak</code> · <code style="color:#5fafaf">svoboda tvorchestva</code><br>
+  <code style="color:#d7af5f">liberdade de criar</code> · <code style="color:#afaf87">creare liberamente</code> · <code style="color:#afff87">ozgurce yarat</code> · <code style="color:#87d7d7">skapa fritt</code><br>
+  <code style="color:#afd787">vrij creeren</code> · <code style="color:#d7d7af">tworz swobodnie</code> · <code style="color:#5fafff">dimiourgia elefthera</code> · <code style="color:#ff5f87">khuli soch</code><br>
+  <code style="color:#ffd787">hurriyat al-ibdaa</code> · <code style="color:#87ffaf">code is poetry</code> · <code style="color:#ff87d7">democratize AI</code> · <code style="color:#d7afff">imagine freely</code>
+</p>
+---
 # Open Agents
-**AI coding agent framework powered by open-weight models via Ollama.**
+**AI coding agent powered entirely by open-weight models via Ollama and OpenAI-compatible APIs.**
+An autonomous multi-turn tool-calling agent that reads your code, makes changes, runs tests, and fixes failures iteratively until the task is complete — running 100% locally on your hardware with open-weight models. No API keys required. No cloud dependencies. Your code never leaves your machine.
+## Features
-A multi-turn agentic tool-calling loop that iteratively reads code, makes changes, runs tests, and fixes failures until the task is complete — modeled after how Claude Code operates, but running entirely on local open-weight models.
+- **26 autonomous tools** — file I/O, shell, grep, web search/fetch, memory, sub-agents, background tasks, image/OCR, git, diagnostics
+- **Parallel tool execution** — read-only tools run concurrently via `Promise.allSettled` for faster feedback loops
+- **Sub-agent delegation** — spawn independent agents for parallel workstreams with `background=true`
+- **Auto-expanding context window** — detects your RAM/VRAM and creates an optimized model variant on first run
+- **Neural TTS voice feedback** — hear what the agent is doing via GLaDOS or Overwatch ONNX voices
+- **Mid-task steering** — type while the agent works to add context without interrupting
+- **Smart context compaction** — long conversations compressed preserving files, commands, errors, and decisions
+- **Persistent memory** — learned patterns stored in `.oa/memory/` across sessions
+- **Self-learning** — auto-fetches docs from the web when encountering unfamiliar APIs
+- **Multilingual TUI** — inspirational messages in 15+ languages on startup
+- **Seamless `/update`** — in-place update and reload without losing context
+- **Dynamic code rendering** — syntax-aware tool output with terminal-width cropping
 ## How It Works
@@ -16,7 +50,7 @@ Agent: [Turn 1] file_read(src/auth.ts)
        [Turn 5] task_complete(summary="Fixed null check — all tests pass")
 ```
-The agent has **18 tools** (including 3 AIWG SDLC tools and 4 advanced analysis tools) and uses them autonomously in a loop, reading errors, fixing code, and re-running validation until the task succeeds or the turn limit is reached.
+The agent uses tools autonomously in a loop — reading errors, fixing code, and re-running validation until the task succeeds or the turn limit is reached.
 ## Quick Start
@@ -26,11 +60,11 @@ The agent has **18 tools** (including 3 AIWG SDLC tools and 4 advanced analysis
 # Install globally — provides `open-agents` and `oa` commands
 npm i -g open-agents-ai
-# Run it — first launch auto-detects your system and pulls the best model
+# Run it — first launch auto-detects your system and configures the optimal model
 oa "fix the null check in auth.ts"
 ```
-On first run, the setup wizard detects your RAM/VRAM and recommends the optimal qwen3.5 variant.
+On first run, the setup wizard detects your RAM/VRAM and creates an expanded-context model variant automatically.
 ### Install from source
@@ -47,88 +81,88 @@ git clone https://github.com/robit-man/open-agents.git && cd open-agents
 # 4. Use it
 oa "add pagination to the users endpoint"
-open-agents "refactor the auth module into separate files"
 ```
-## Installation
+## Interactive TUI
-### Prerequisites
-- **Node.js** >= 20
-- **pnpm** (`npm install -g pnpm`)
-- **Ollama** ([ollama.com](https://ollama.com)) with a model that supports tool calling
-### Install System-Wide
+Launch without arguments to enter the interactive REPL with a rich terminal interface:
 ```bash
-# Install to ~/.local/bin (no sudo needed)
-./scripts/install.sh
+oa
+```
-# Install to /usr/local/bin
-sudo ./scripts/install.sh --global
+The TUI features:
+- **Animated multilingual phrase carousel** — creativity and open-source messages scrolling in 15+ languages
+- **Live metrics bar** — token in/out counts, context window usage with pastel-colored labels
+- **Rotating tips** — helpful hints cycling every 10 seconds
+- **Syntax-highlighted output** — tool results rendered with language-aware formatting
+- **Dynamic terminal cropping** — output adapts to terminal width on resize
-# Custom prefix
-./scripts/install.sh --prefix ~/bin
+### Slash Commands
-# Uninstall
-./scripts/install.sh --uninstall
-```
+| Command | Description |
+|---------|-------------|
+| `/help` | Show all available commands |
+| `/model <name>` | Switch to a different Ollama model |
+| `/endpoint <url>` | Connect to a remote vLLM or OpenAI-compatible API |
+| `/voice [model]` | Toggle TTS voice feedback (GLaDOS, Overwatch) |
+| `/stream` | Toggle streaming token display |
+| `/update` | Check for and install updates (seamless reload) |
+| `/config` | Show current configuration |
+| `/clear` | Clear the screen |
+| `/exit` | Quit |
-The installer will:
-1. Check Node.js and pnpm versions
-2. Install workspace dependencies
-3. Build all packages
-4. Create `open-agents` and `oa` symlinks
-5. Configure an optimized Ollama model (auto-detects RAM for context window sizing)
+### Mid-Task Steering
-### Manual Build
+While the agent is working (shown by the `+` prompt), type to add context:
-```bash
-pnpm install
-pnpm -r build
-pnpm -r test   # 911 tests across 77 files
+```
+> fix the auth bug
+  ⎿  Read: src/auth.ts
++ also check the session handling        ← typed while agent works
+  ↪ Context added: also check the session handling
+  ⎿  Search: session
+  ⎿  Edit: src/auth.ts
 ```
-## Tools
+Press `Ctrl+C` to abort the current task.
-The agent has access to 26 tools that it calls autonomously:
+## Tools (26)
 | Tool | Description |
 |------|-------------|
 | `file_read` | Read file contents with line numbers (supports offset/limit) |
 | `file_write` | Create or overwrite files |
-| `file_edit` | Precise string replacement in files (preferred over full rewrites) |
-| `shell` | Execute any shell command (tests, builds, git, etc.) |
-| `grep_search` | Search file contents with regex (uses ripgrep when available) |
+| `file_edit` | Precise string replacement in files |
+| `shell` | Execute any shell command |
+| `grep_search` | Search file contents with regex (ripgrep when available) |
 | `find_files` | Find files by glob pattern |
 | `list_directory` | List directory contents with types and sizes |
 | `web_search` | Search the web via DuckDuckGo |
-| `web_fetch` | Fetch and extract text from web pages (docs, MDN, w3schools) |
+| `web_fetch` | Fetch and extract text from web pages |
 | `memory_read` | Read from persistent memory store |
 | `memory_write` | Store patterns and solutions for future tasks |
-| `aiwg_setup` | Deploy AIWG SDLC framework in the project |
-| `aiwg_health` | Analyze project SDLC health and readiness |
-| `aiwg_workflow` | Execute AIWG commands and workflows |
 | `batch_edit` | Multiple precise edits across files in one call |
 | `codebase_map` | High-level project structure overview |
 | `diagnostic` | Run lint/typecheck/test/build validation pipeline |
 | `git_info` | Structured git status, log, diff, and branch info |
-| `background_run` | Run a shell command in the background (returns task ID) |
+| `background_run` | Run a shell command in the background |
 | `task_status` | Check status of background tasks |
 | `task_output` | Read output from a background task |
 | `task_stop` | Stop a running background task |
 | `sub_agent` | Delegate a sub-task to an independent agent |
-| `image_read` | Read image files (base64 + dimensions + OCR text) |
+| `image_read` | Read image files (base64 + dimensions + OCR) |
 | `screenshot` | Capture screen or window to file |
-| `ocr` | Extract text from images (supports region cropping/zoom) |
+| `ocr` | Extract text from images |
+| `aiwg_setup` | Deploy AIWG SDLC framework |
+| `aiwg_health` | Analyze project SDLC health and readiness |
+| `aiwg_workflow` | Execute AIWG commands and workflows |
 ### Parallel Execution & Sub-Agents
-The agent can run multiple operations in parallel:
+Read-only tools (`file_read`, `grep_search`, `find_files`, `list_directory`, `web_fetch`, `web_search`, `memory_read`) execute concurrently when called in the same turn. Mutating tools run sequentially to ensure safety.
 ```
-You: oa "run the test suite and lint checks in parallel, then fix any issues"
 Agent: [Turn 1] background_run(command="npm test")        → task-1
        [Turn 2] background_run(command="npm run lint")     → task-2
        [Turn 3] task_status()                              → task-1: running, task-2: completed
@@ -138,7 +172,7 @@ Agent: [Turn 1] background_run(command="npm test")        → task-1
        [Turn 7] task_complete(summary="Fixed lint, tests pass")
 ```
-Sub-agents can be delegated independent tasks:
+Sub-agents can run independent tasks in parallel:
 ```
 Agent: [Turn 1] sub_agent(task="refactor auth module", background=true)  → task-3
@@ -148,14 +182,7 @@ Agent: [Turn 1] sub_agent(task="refactor auth module", background=true)  → tas
 ### Image & Visual Context
-Drag-and-drop image files onto the terminal to provide visual context:
-```bash
-# Drop an image file path while agent is working → injected as context
-# Drop an image file path at idle prompt → agent describes and analyzes it
-```
-The agent can also take screenshots and extract text via OCR:
+Drag-and-drop image files onto the terminal to provide visual context. The agent can also take screenshots and extract text via OCR:
 ```
 Agent: [Turn 1] screenshot(region="active")     → captured window
@@ -163,96 +190,119 @@ Agent: [Turn 1] screenshot(region="active")     → captured window
        [Turn 3] image_read(path="mockup.png")    → base64 + OCR text
 ```
-### Mid-Task Steering
+## Auto-Expanding Context Window
+On startup (and when switching models with `/model`), Open Agents automatically:
+1. Detects available system RAM and GPU VRAM
+2. Checks if an expanded-context variant of your model exists
+3. Creates one via Ollama Modelfile if needed, with optimal `num_ctx`:
+| Available Memory | Context Window |
+|-----------------|---------------|
+| 200GB+ | 128K tokens |
+| 100GB+ | 64K tokens |
+| 50GB+ | 32K tokens |
+| 20GB+ | 16K tokens |
+| 8GB+ | 8K tokens |
+| < 8GB | 4K tokens |
+The expanded model is named `open-agents-{model}` and reused across sessions.
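
Annotation (not part of the diff): the memory-to-context table added in this hunk can be read as a simple tier lookup. The sketch below is a minimal illustration under stated assumptions, not the package's code: `pickContextTokens` and `buildModelfile` are hypothetical names, "K" is assumed to mean 1024 tokens, and each threshold is treated as an inclusive lower bound in GB. `FROM` and `PARAMETER num_ctx` are standard Ollama Modelfile directives.

```typescript
// Hypothetical helpers mirroring the README table; not taken from the package source.
// Assumption: "K" means 1024 tokens; thresholds are inclusive lower bounds in GB.
const CONTEXT_TIERS: ReadonlyArray<{ minGb: number; tokens: number }> = [
  { minGb: 200, tokens: 128 * 1024 },
  { minGb: 100, tokens: 64 * 1024 },
  { minGb: 50, tokens: 32 * 1024 },
  { minGb: 20, tokens: 16 * 1024 },
  { minGb: 8, tokens: 8 * 1024 },
];

function pickContextTokens(availableGb: number): number {
  // Tiers are ordered largest first, so the first match is the best fit.
  for (const tier of CONTEXT_TIERS) {
    if (availableGb >= tier.minGb) return tier.tokens;
  }
  return 4 * 1024; // the "< 8GB" row
}

function buildModelfile(baseModel: string, numCtx: number): string {
  // Modelfile text that derives a variant with a larger context window.
  return `FROM ${baseModel}\nPARAMETER num_ctx ${numCtx}\n`;
}

const tokens = pickContextTokens(64); // 64 GB falls in the "50GB+" row
console.log(buildModelfile("qwen3.5:122b", tokens));
```

How the package measures available memory, and how it sanitizes the `open-agents-{model}` name, is not shown in the diff and is left out here.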
-While the agent is working (shown by the `+` prompt), you can type to add context:
+## Voice Feedback (TTS)
+Neural TTS voices speak what the agent is doing in real-time:
+```bash
+/voice              # Toggle voice on/off (default: GLaDOS)
+/voice glados       # GLaDOS voice
+/voice overwatch    # Overwatch voice
 ```
-> fix the auth bug
-  ⎿  📄 Read: src/auth.ts
-+ also check the session handling        ← typed while agent works
-  ↪ Context added: also check the session handling
-  ⎿  🔍 Search: session
-  ⎿  ✏️  Edit: src/auth.ts
+On first enable, auto-downloads the ONNX voice model (~50MB). For best quality:
+```bash
+# Ubuntu/Debian
+sudo apt install espeak-ng
+# macOS
+brew install espeak-ng
 ```
-Press `Ctrl+C` to abort the current task. Slash commands (`/model`, `/help`) work during active tasks.
+## Self-Learning & Error Recovery
-### Self-Learning
+**Self-learning**: When encountering an unfamiliar API, the agent automatically searches the web, fetches documentation, stores the pattern in persistent memory, and applies it.
-When the agent encounters an unfamiliar API or language feature, it automatically:
-1. Searches the web for documentation
-2. Fetches the relevant page (w3schools.com, MDN, official docs)
-3. Stores the learned pattern in persistent memory
-4. Applies the knowledge to the current task
+**Error recovery**: The agent follows an iterative fix loop — run validation, read errors, identify the exact file and line, fix with `file_edit`, re-run until passing.
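
Annotation (not part of the diff): the iterative fix loop described in the added line above could be sketched roughly as follows. This is a synchronous simplification with hypothetical names (`runFixLoop`, `Validation`); in the real agent the validation and the fix are presumably asynchronous tool calls, and the turn limit is whatever the runner is configured with.

```typescript
// Hypothetical types and names for illustration; the package's internals may differ.
type Validation = { ok: boolean; errors: string[] };

function runFixLoop(
  validate: () => Validation,
  applyFix: (errors: string[]) => void,
  maxTurns: number,
): { passed: boolean; validations: number } {
  for (let turn = 1; turn <= maxTurns; turn++) {
    const outcome = validate();
    if (outcome.ok) return { passed: true, validations: turn };
    // Skip the fix on the last turn: there is no validation left to check it.
    if (turn < maxTurns) applyFix(outcome.errors);
  }
  return { passed: false, validations: maxTurns };
}

// Usage with a fake project that has two outstanding errors.
const pending = ["auth.ts:12 null check", "auth.ts:40 unused import"];
const report = runFixLoop(
  () => ({ ok: pending.length === 0, errors: [...pending] }),
  () => { pending.shift(); }, // pretend each fix resolves one error
  5,
);
console.log(report);
```

Bounding the loop by `maxTurns` matches the "Bounded" design point in the README: the loop stops even if validation never passes.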
-### Error Recovery
+## Configuration
-The agent follows an iterative fix loop:
-1. Run validation (tests/build/lint)
-2. Read the full error output
-3. Identify the exact file, line, and failure
-4. Fix with `file_edit`
-5. Re-run validation
-6. Repeat until passing
+Config priority: CLI flags > environment variables > `~/.open-agents/config.json` > defaults.
-### Dynamic System Prompt
+```bash
+# Set defaults
+open-agents config set model qwen3.5:122b
+open-agents config set backendUrl http://localhost:11434
+open-agents config set backendType ollama
-The agent's system prompt is dynamically enriched at task start with:
+# Environment variables
+export OPEN_AGENTS_MODEL=qwen3.5:122b
+export OPEN_AGENTS_BACKEND_URL=http://localhost:11434
+export OPEN_AGENTS_BACKEND_TYPE=ollama
+```
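
Annotation (not part of the diff): the stated priority (CLI flags > environment variables > config file > defaults) amounts to a first-defined-wins lookup per key. A minimal sketch, assuming the three keys shown in the added `config set` lines; `resolveConfig`, `fromEnv`, and the default values are hypothetical, while the `OPEN_AGENTS_*` variable names come from the README.

```typescript
// Hypothetical shape; only these three keys appear in the README diff.
interface AgentConfig {
  model: string;
  backendUrl: string;
  backendType: string;
}

// Assumed placeholder defaults, copied from the README examples.
const DEFAULTS: AgentConfig = {
  model: "qwen3.5:122b",
  backendUrl: "http://localhost:11434",
  backendType: "ollama",
};

// Map the documented environment variables onto config keys.
function fromEnv(env: Record<string, string | undefined>): Partial<AgentConfig> {
  return {
    model: env["OPEN_AGENTS_MODEL"],
    backendUrl: env["OPEN_AGENTS_BACKEND_URL"],
    backendType: env["OPEN_AGENTS_BACKEND_TYPE"],
  };
}

function resolveConfig(
  cliFlags: Partial<AgentConfig>,
  envLayer: Partial<AgentConfig>,
  fileLayer: Partial<AgentConfig>,
): AgentConfig {
  // For each key, the first layer that defines it wins.
  const pick = (key: keyof AgentConfig): string =>
    cliFlags[key] ?? envLayer[key] ?? fileLayer[key] ?? DEFAULTS[key];
  return {
    model: pick("model"),
    backendUrl: pick("backendUrl"),
    backendType: pick("backendType"),
  };
}
```

For example, `resolveConfig({ model: "a" }, fromEnv({ OPEN_AGENTS_MODEL: "b" }), {})` yields `model: "a"`, because the CLI flag outranks the environment variable.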
-| Source | Description |
-|--------|-------------|
-| **Project context files** | `.open-agents.md`, `AGENTS.md`, or `.open-agents/context.md` — loaded from project root and parent directories |
-| **Git state** | Current branch, working tree status, recent commits |
-| **Persistent memory** | Learned patterns from previous sessions (project-local and global) |
-| **Environment** | Working directory, Node version, OS, date |
+### Project Context Files
-Create a `.open-agents.md` file in your project root to give the agent project-specific instructions:
+Create `AGENTS.md`, `OA.md`, or `.open-agents.md` in your project root to give the agent project-specific instructions:
 ```markdown
 # Project Context
 - This is a TypeScript monorepo using pnpm workspaces
 - Run tests with: pnpm -r test
-- Build with: pnpm -r build
 - Always use file_edit over file_write for existing files
-- Database migrations are in src/db/migrations/
 ```
-Context files are merged from parent → child directories, so you can set global defaults at `~/.open-agents.md` and override per-project.
+Context files merge from parent to child directories — set global defaults at `~/.open-agents.md` and override per-project.
 ### `.oa/` Project Directory
-Each project gets a `.oa/` directory (similar to `.claude/` for Claude Code) that persists artifacts across sessions:
+Each project gets a `.oa/` directory that persists state across sessions:
 ```
 .oa/
 ├── config.json              # Per-project configuration overrides
+├── settings.json            # TUI settings (voice, streaming, etc.)
 ├── memory/                  # Persistent memory store
 │   └── {topic}.json         # Topic-based key-value memories
 ├── index/                   # Cached codebase index
 │   ├── repo-profile.json    # Repository metadata
-│   ├── file-summaries.json  # Per-file purpose, exports, domain, risk
+│   ├── file-summaries.json  # Per-file purpose, exports, domain
 │   ├── symbols.json         # Symbol table cache
 │   ├── graph.json           # Import/dependency graph
-│   └── meta.json            # Index metadata (timestamp, hash)
+│   └── meta.json            # Index metadata
 ├── context/                 # Auto-generated project context
 │   └── project-map.md       # Generated overview for system prompt
 └── history/                 # Session history
     └── {session-id}.json    # Per-session task log
 ```
-The agent auto-discovers `AGENTS.md`, `OA.md`, `CLAUDE.md`, and `README.md` from the project root and parent directories, injecting them into the system prompt for project-specific awareness.
+## Model Support
+**Primary target**: Qwen3.5-122B-A10B via Ollama (MoE architecture, runs on 48GB+ VRAM)
-### Smart Context Compaction
+Any model that supports tool calling via Ollama or an OpenAI-compatible API works:
-When conversations exceed the context window, the agent compacts older messages while preserving:
-- Files that were read and modified
-- Shell commands that were run and their outcomes
-- Errors that were encountered
-- Key decisions that were made
+```bash
+# Different Ollama model
+oa --model qwen2.5-coder:32b "fix the bug"
-This structured summary prevents the agent from repeating work or losing track of what's been done.
+# vLLM backend
+oa --backend vllm --backend-url http://localhost:8000/v1 "add tests"
+# Any OpenAI-compatible API
+oa --backend-url http://10.0.0.5:11434 "refactor auth"
+```
 ## Commands
@@ -282,104 +332,22 @@ This structured summary prevents the agent from repeating work or losing track o
 -V, --version              Show version
 ```
-### Voice Feedback (TTS)
-The agent can speak what it's doing using neural TTS voices. Enable it in the interactive REPL:
-```bash
-/voice              # Toggle voice on/off (default: GLaDOS)
-/voice glados       # Switch to GLaDOS voice
-/voice overwatch    # Switch to Overwatch voice
-```
-On first enable, the agent auto-downloads the ONNX voice model (~50MB) and installs `onnxruntime-node` in `~/.open-agents/voice/`. For best quality, install `espeak-ng`:
-```bash
-# Ubuntu/Debian
-sudo apt install espeak-ng
-# macOS
-brew install espeak-ng
-```
-When enabled, the agent speaks brief descriptions of each tool call ("Reading auth.ts", "Running tests", "Editing config.js") through your system speakers.
-### Configuration
-Config priority: CLI flags > environment variables > `~/.open-agents/config.json` > defaults.
-```bash
-# Set defaults
-open-agents config set model qwen3.5:122b
-open-agents config set backendUrl http://localhost:11434
-open-agents config set backendType ollama
-# Environment variables
-export OPEN_AGENTS_MODEL=qwen3.5:122b
-export OPEN_AGENTS_BACKEND_URL=http://localhost:11434
-export OPEN_AGENTS_BACKEND_TYPE=ollama
-```
-## Model Support
-**Primary target**: Qwen3.5-122B-A10B via Ollama (MoE, runs on 48GB+ VRAM)
-The `setup-model.sh` script auto-configures the context window based on available RAM:
-| RAM | Context Window |
-|-----|---------------|
-| 300GB+ | 128K tokens |
-| 128GB+ | 64K tokens |
-| 64GB+ | 32K tokens |
-| < 64GB | 16K tokens |
-### Other Models
-Any model that supports tool calling via Ollama or an OpenAI-compatible API works:
-```bash
-# Use a different Ollama model
-oa --model qwen2.5-coder:32b "fix the bug"
-# Use vLLM backend
-oa --backend vllm --backend-url http://localhost:8000/v1 "add tests"
-# Use any OpenAI-compatible API
-oa --backend-url http://10.0.0.5:11434 "refactor auth"
-```
 ## AIWG Integration
-Open Agents integrates with [AIWG](https://www.npmjs.com/package/aiwg) (AI Writing Guide) — a cognitive architecture for AI-augmented software development. When AIWG is installed, the agent gains SDLC superpowers:
+Open Agents integrates with [AIWG](https://www.npmjs.com/package/aiwg) (AI Writing Guide) — a cognitive architecture for AI-augmented software development:
 ```bash
-# Install AIWG globally
 npm i -g aiwg
-# The agent can now use AIWG tools automatically:
 oa "analyze this project's SDLC health and set up proper documentation"
-oa "create requirements and architecture docs for this codebase"
 ```
-### What AIWG Adds
 | Capability | Description |
 |-----------|-------------|
-| **Structured Memory** | `.aiwg/` directory persists project knowledge across sessions |
+| **Structured Memory** | `.aiwg/` directory persists project knowledge |
 | **SDLC Artifacts** | Requirements, architecture, test strategy, deployment docs |
-| **Health Analysis** | Score your project's SDLC maturity (testing, CI/CD, docs, etc.) |
+| **Health Analysis** | Score your project's SDLC maturity |
 | **85+ Agents** | Specialized AI personas (Test Engineer, Security Auditor, API Designer) |
-| **Traceability** | @-mention system links requirements → code → tests |
-### AIWG Tools
-The 3 AIWG tools are available when `aiwg` is installed globally:
-- **`aiwg_setup`** — Deploy an AIWG framework (`sdlc`, `marketing`, `forensics`, `research`)
-- **`aiwg_health`** — Analyze project SDLC readiness (works even without AIWG installed)
-- **`aiwg_workflow`** — Run any AIWG CLI command (`runtime-info`, `list`, `mcp info`)
-If AIWG is not installed, the tools return helpful install instructions. The `aiwg_health` tool provides native analysis without requiring AIWG.
+| **Traceability** | @-mention system links requirements to code to tests |
 ## Architecture
@@ -392,53 +360,49 @@ User task
     ↓
 System prompt + tools → LLM
     ↓
-LLM returns tool_calls → Execute tools → Feed results back → LLM
+LLM returns tool_calls → Execute tools (parallel/sequential) → Feed results → LLM
     ↓  (repeat until task_complete or max turns)
 Result: completed/incomplete, turns, tool calls, duration
 ```
-Key design decisions:
+Key design:
 - **Tool-first**: The model explores via tools rather than pre-stuffed context
-- **Iterative**: Tests, sees failures, fixes them — no need for perfect one-shot output
-- **Context compaction**: Long conversations are compressed, preserving only recent context
+- **Iterative**: Tests, sees failures, fixes them — no one-shot guessing
+- **Parallel-safe**: Read-only tools execute concurrently; mutating tools run sequentially
+- **Context compaction**: Long conversations compressed, preserving recent context
 - **Bounded**: Maximum turns, timeout, and output limits prevent runaway loops
-- **Observable**: Every tool call and result is emitted as a real-time event
+- **Observable**: Every tool call and result emitted as a real-time event
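
Annotation (not part of the diff): the "Parallel-safe" point can be illustrated with `Promise.allSettled`, which the README names for read-only tools. The sketch below is an assumption-laden simplification, not the package's scheduler: `ToolCall` and `executeTurn` are hypothetical, and running all read-only calls before the mutating ones is a shortcut that ignores the original call order.

```typescript
// Hypothetical call shape for illustration only.
type ToolCall = { tool: string; run: () => Promise<string> };

// Read-only tool names as listed in the README.
const READ_ONLY_TOOLS = new Set<string>([
  "file_read", "grep_search", "find_files", "list_directory",
  "web_fetch", "web_search", "memory_read",
]);

async function executeTurn(calls: ToolCall[]): Promise<string[]> {
  const results: string[] = new Array<string>(calls.length).fill("");

  // Read-only calls start together; allSettled waits for all of them
  // and reports failures per call instead of rejecting on the first error.
  const readOnly = calls
    .map((call, index) => ({ call, index }))
    .filter(({ call }) => READ_ONLY_TOOLS.has(call.tool));
  const settled = await Promise.allSettled(readOnly.map(({ call }) => call.run()));
  settled.forEach((outcome, k) => {
    results[readOnly[k].index] =
      outcome.status === "fulfilled" ? outcome.value : `error: ${String(outcome.reason)}`;
  });

  // Mutating calls run one at a time so writes never overlap.
  for (let i = 0; i < calls.length; i++) {
    if (READ_ONLY_TOOLS.has(calls[i].tool)) continue;
    try {
      results[i] = await calls[i].run();
    } catch (err) {
      results[i] = `error: ${String(err)}`;
    }
  }
  return results;
}
```

Results are returned in the original call order, so a failed read does not shift or drop the output of its neighbours.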
 ### Package Structure
 ```
 packages/
-  orchestrator/   - AgenticRunner, OllamaAgenticBackend, RALPH loop
-  execution/      - 11 tools (file, shell, grep, web, memory), validation pipeline
+  orchestrator/   - AgenticRunner, backend integration, parallel execution
+  execution/      - 26 tools (file, shell, grep, web, memory, image, AIWG)
   schemas/        - Zod schemas and TypeScript types
   backend-vllm/   - Ollama + vLLM backend clients (OpenAI-compatible)
   memory/         - SQLite-backed persistent memory stores
   indexer/        - Codebase scanning and symbol extraction
   retrieval/      - Multi-stage retrieval (lexical + semantic + graph)
   prompts/        - Prompt contracts for each agent role
-  cli/            - CLI entry point, commands, config, UI
+  cli/            - CLI entry point, TUI, status bar, carousel, config
 apps/
   api/            - Express API server
   worker/         - Background task processor
-eval/             - 8 evaluation tasks with agentic runner
-scripts/          - install.sh, setup-model.sh, bootstrap.sh
+eval/             - 17 evaluation tasks with agentic runner
+scripts/          - install.sh, setup-model.sh, build-publish.mjs
 ```
 ## Evaluation
-The framework includes 17 evaluation tasks that test the agent's ability to autonomously resolve coding problems:
+17 evaluation tasks test the agent's autonomous coding ability:
 ```bash
-# Run all 8 tasks with agentic tool-calling loop
-node eval/run-agentic.mjs
-# Single task
-node eval/run-agentic.mjs 04-add-test
-# Different model
-node eval/run-agentic.mjs --model qwen2.5-coder:32b
+node eval/run-agentic.mjs              # Run all tasks
+node eval/run-agentic.mjs 04-add-test  # Single task
+node eval/run-agentic.mjs --model qwen2.5-coder:32b  # Different model
 ```
 ### Results (Qwen3.5-122B)
@@ -455,49 +419,6 @@ TASK                 RESULT   TIME       TURNS    TOOLS
 08-multi-file        PASS     75.5s      8        13
 Pass rate: 100% (8/8)
-Total: 39 turns, 55 tool calls, ~10 minutes
-```
-### Task Descriptions
-| ID | Task | Difficulty |
-|----|------|-----------|
-| 01 | Fix typo in function name | Easy |
-| 02 | Add isPrime function | Easy |
-| 03 | Fix off-by-one bug | Easy |
-| 04 | Write comprehensive tests for untested functions | Medium |
-| 05 | Extract functions from long method (refactor) | Medium |
-| 06 | Fix TypeScript type errors | Medium |
-| 07 | Add REST API endpoint | Medium |
-| 08 | Add pagination across multiple files | Hard |
-| 09 | CSS named color lookup (148 colors, web search) | Medium |
-| 10 | HTTP status code lookup (32+ codes, web search) | Medium |
-| 11 | MIME type lookup (30+ types, web search) | Medium |
-| 12 | SDLC health analyzer (AIWG-style scoring) | Medium |
-| 13 | SDLC artifact generator (requirements, arch, tests) | Hard |
-| 14 | Batch refactor variable names across files | Medium |
-| 15 | Codebase overview generator from structure analysis | Medium |
-| 16 | Diagnostic fix loop (find and fix buggy code) | Medium |
-| 17 | Git repository analyzer | Medium |
-## Test Suite
-```
-Package          Tests
-─────────────────────────
-schemas          216
-backend-vllm     162
-execution        136
-indexer            94
-cli                72
-orchestrator       70
-retrieval          66
-memory             58
-prompts            34
-apps/api            1
-apps/worker         2
-─────────────────────────
-Total             911 passing
 ```
 ## Development
@@ -505,10 +426,16 @@ Total             911 passing
 ```bash
 pnpm install          # Install dependencies
 pnpm -r build         # Build all packages
-pnpm -r test          # Run all 911 tests
+pnpm -r test          # Run all tests
 pnpm -r dev           # Watch mode
 ```
+## Prerequisites
+- **Node.js** >= 20
+- **pnpm** (`npm install -g pnpm`)
+- **Ollama** ([ollama.com](https://ollama.com)) with a model that supports tool calling
 ## License
 MIT