npm - @neotx/agents - Versions diffs - 0.1.0-alpha.14 → 0.1.0-alpha.19 - Mend

@neotx/agents 0.1.0-alpha.14 → 0.1.0-alpha.19

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/GUIDE.md ADDED

@@ -0,0 +1,595 @@
+# Neo — AI Integration Guide
+You are reading the neo integration guide. This document explains how an AI agent can use neo to orchestrate autonomous developer agents across git repositories.
+neo is a framework that wraps the Claude Agent SDK with clone isolation, 3-level recovery, DAG workflows, concurrency control, budget guards, and approval gates. Agents work in isolated git clones — your main branch is never touched.
+---
+## Two ways to use neo
+### Mode A: Supervisor (recommended)
+**This is the recommended way to use neo.** The supervisor is a long-lived autonomous daemon that acts as your CTO. You send it messages in natural language and it handles everything — agent selection, dispatch ordering, review cycles, retries, and memory.
+The supervisor is NOT a chatbot. It's an event-driven heartbeat loop that:
+- Picks up your messages at the next heartbeat
+- Dispatches the right agents in the right order
+- Monitors progress and reacts to completions/failures
+- Persists memory across sessions — it learns your codebase over time
+- Handles the full lifecycle: refine → architect → develop → review → fix → done
+```bash
+# Start the supervisor (background daemon)
+neo supervise --detach
+# Send a task — the supervisor handles the rest
+neo supervise --message "Implement user authentication with JWT. Create login/register endpoints, middleware, and tests."
+# The supervisor autonomously:
+#   1. Analyzes your request
+#   2. Dispatches architect if design is needed
+#   3. Dispatches developer for each task
+#   4. Dispatches reviewer to review PRs
+#   5. Dispatches fixer if issues are found
+#   6. Reports back via activity log
+# Check supervisor status
+neo supervisor status
+# View what the supervisor is doing
+neo supervisor activity --limit 10
+# Send follow-up instructions
+neo supervise --message "Prioritize the auth middleware — we need it before the API routes"
+# Check costs
+neo cost --short
+```
+**Why supervisor mode?** You don't need to know which agent to use, how to chain them, or when to retry. The supervisor makes those decisions based on its experience and memory. It also handles edge cases (review cycles, CI failures, anti-loop guards) that are tedious to manage manually.
+### Mode B: Direct dispatch (advanced)
+For cases where you want full control over the workflow — you decide what to build, which agent to use, and when to follow up. Useful for one-off tasks or when you have a specific agent pipeline in mind.
+```bash
+# Dispatch a developer agent
+neo run developer --prompt "Add input validation to POST /api/users" \
+  --repo /path/to/project --branch feat/input-validation \
+  --meta '{"label":"input-validation","stage":"develop"}'
+# Check progress
+neo runs --short --status running
+# Read the result when done
+neo runs <runId>
+# Check costs
+neo cost --short
+```
+You handle the develop → review → fix cycle yourself. See "Typical Workflows" at the end for examples.
+---
+## Installation & Setup
+```bash
+# Prerequisites: Node.js >= 22, git >= 2.20, Claude Code CLI installed
+# Install neo globally
+npm install -g @neotx/cli
+# Verify installation
+neo doctor
+# Initialize in your project
+cd /path/to/your/project
+neo init
+# (Optional) Add MCP integrations
+neo mcp add github    # requires GITHUB_TOKEN env var
+neo mcp add linear    # requires LINEAR_API_KEY env var
+neo mcp add notion    # requires NOTION_TOKEN env var
+```
+---
+## Available Agents
+| Agent | Model | Mode | Use when |
+|-------|-------|------|----------|
+| `developer` | opus | writable | Implementing code changes, bug fixes, new features |
+| `architect` | opus | readonly | Designing systems, planning features, decomposing work |
+| `reviewer` | sonnet | readonly | Code review — blocks on ≥1 CRITICAL or ≥3 WARNINGs |
+| `fixer` | opus | writable | Fixing issues found by reviewer — targets root causes |
+| `refiner` | opus | readonly | Evaluating ticket quality, splitting vague tickets |
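The reviewer's blocking rule in the table above (block on ≥1 CRITICAL or ≥3 WARNINGs) can be read as a small predicate. The sketch below is illustrative only: the `ReviewFinding` type, the `INFO` level, and the `reviewBlocks` name are assumptions made for this example and are not taken from neo's actual reviewer output.

```typescript
// Hypothetical shape of a single review finding; neo's real reviewer output may differ.
// "INFO" is a placeholder for any non-blocking severity.
type Severity = "CRITICAL" | "WARNING" | "INFO";

interface ReviewFinding {
  severity: Severity;
  message: string;
}

// Returns true when the findings meet the documented blocking threshold:
// at least one CRITICAL, or at least three WARNINGs.
function reviewBlocks(findings: ReviewFinding[]): boolean {
  const critical = findings.filter((f) => f.severity === "CRITICAL").length;
  const warnings = findings.filter((f) => f.severity === "WARNING").length;
  return critical >= 1 || warnings >= 3;
}
```

Under this reading, two WARNINGs pass and a third one tips the review into a block, regardless of how many non-blocking findings accompany them.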
+**Custom agents:** Drop a YAML file in `.neo/agents/` to extend built-in agents:
+```yaml
+# .neo/agents/my-developer.yml
+name: my-developer
+extends: developer
+promptAppend: |
+  Always use our internal logger instead of console.log.
+  Follow the patterns in src/shared/conventions.ts.
+```
+List all agents: `neo agents`
+---
+## Complete Command Reference
+### neo run — Dispatch an agent
+```bash
+neo run <agent> --prompt "..." --repo <path> --branch <name> [flags]
+```
+| Flag | Type | Default | Description |
+|------|------|---------|-------------|
+| `--prompt` | string | required | Task description for the agent |
+| `--repo` | string | `.` | Target repository path |
+| `--branch` | string | required | Branch name for the isolated clone |
+| `--priority` | string | `medium` | `critical`, `high`, `medium`, `low` |
+| `--meta` | JSON string | — | Metadata: `{"label":"...","ticketId":"...","stage":"..."}` |
+| `--detach`, `-d` | boolean | `true` | Run in background, return immediately |
+| `--sync`, `-s` | boolean | `false` | Run in foreground (blocking) |
+| `--git-strategy` | string | `branch` | `branch` (push only) or `pr` (create PR) |
+| `--output` | string | — | `json` for machine-readable output |
+**Detached output:** returns `runId` and PID immediately. Use `neo logs -f <runId>` to follow.
+**Example with full metadata:**
+```bash
+neo run developer \
+  --prompt "Add rate limiting to POST /api/upload: max 10 req/min/IP, return 429 with Retry-After" \
+  --repo /path/to/api \
+  --branch feat/rate-limiting \
+  --priority high \
+  --meta '{"label":"T1-rate-limit","ticketId":"PROJ-42","stage":"develop"}' \
+  --git-strategy pr
+```
+### neo runs — Monitor runs
+```bash
+neo runs                          # List all runs for current repo
+neo runs <runId>                  # Full details + agent output (prefix match on ID)
+neo runs --short                  # Compact output (minimal tokens)
+neo runs --short --status running # Check active runs
+neo runs --last 5                 # Last N runs
+neo runs --status failed          # Filter by status: completed, failed, running
+neo runs --repo my-project        # Filter by repo
+neo runs --output json            # Machine-readable
+```
+**Important:** After an agent completes, ALWAYS read `neo runs <runId>` — it contains the agent's structured output (PR URLs, issues found, plans, milestones).
+### neo supervise — Manage the supervisor daemon
+```bash
+neo supervise                     # Start daemon + open live TUI
+neo supervise --detach            # Start daemon in background (no TUI)
+neo supervise --attach            # Open TUI for running daemon
+neo supervise --status            # Show supervisor status (PID, port, costs, heartbeats)
+neo supervise --kill              # Stop the running supervisor
+neo supervise --message "..."     # Send a message to the supervisor inbox
+neo supervise --name my-supervisor  # Use a named supervisor instance (default: "supervisor")
+```
+**Status output includes:** PID, port, session ID, started timestamp, heartbeat count, last heartbeat, cost today, cost total, status (running/idle/stopped).
+### neo supervisor — Query supervisor state
+```bash
+neo supervisor status             # Current status + recent activity (top 5)
+neo supervisor status --json      # Machine-readable status
+neo supervisor activity           # Activity log (last 50 entries)
+neo supervisor activity --type dispatch  # Filter: decision, action, error, event, message, plan, dispatch
+neo supervisor activity --since "2024-01-15T00:00:00Z" --until "2024-01-16T00:00:00Z"
+neo supervisor activity --limit 20
+neo supervisor activity --json    # Machine-readable
+```
+### neo logs — Event journal
+```bash
+neo logs                          # Last 20 events
+neo logs --last 50                # Last N events
+neo logs --type session:complete  # Filter: session:start, session:complete, session:fail, cost:update, budget:alert
+neo logs --run <runId>            # Events for a specific run
+neo logs --follow --run <runId>   # Live tail of a running agent's log
+neo logs --short                  # Ultra-compact (one line per event)
+neo logs --output json            # Machine-readable
+```
+### neo log — Report to supervisor
+Agents use this to report progress. Reports appear in the supervisor's TUI and activity log.
+```bash
+neo log progress "3/5 endpoints done"
+neo log action "Pushed to branch feat/auth"
+neo log decision "Chose JWT over sessions — simpler for MVP"
+neo log blocker "Tests failing, missing dependency"     # Also wakes the supervisor via inbox
+neo log milestone "All tests passing, PR opened"
+neo log discovery "Repo uses Prisma + PostgreSQL"
+```
+Flags: `--memory` (force to memory store), `--knowledge` (force to knowledge), `--procedure` (write as procedure memory).
+### neo cost — Budget tracking
+```bash
+neo cost                          # Today's total + all-time + breakdown by agent and repo
+neo cost --short                  # One-liner: today=$X.XX sessions=N agent1=$X.XX
+neo cost --repo my-project        # Filter by repo
+neo cost --output json            # Machine-readable
+```
+### neo memory — Persistent memory store
+The supervisor maintains semantic memory using SQLite + FTS5 + optional vector embeddings.
+```bash
+# Write memory
+neo memory write --type fact --scope /path/to/repo "main branch uses protected merges"
+neo memory write --type procedure --scope /path/to/repo "After architect run: parse milestones, create tasks"
+neo memory write --type focus --expires 2h "Working on auth module — 3 tasks remaining"
+neo memory write --type task --scope /path/to/repo --severity high --category "neo runs abc123" "Implement login endpoint"
+neo memory write --type feedback --scope /path/to/repo "User wants PR descriptions in French"
+# Update
+neo memory update <id> "Updated content"
+neo memory update <id> --outcome done          # pending, in_progress, done, blocked, abandoned
+# Search & list
+neo memory search "authentication"              # Semantic search (uses embeddings if available)
+neo memory list                                 # All memories
+neo memory list --type fact                     # Filter by type: fact, procedure, episode, focus, feedback, task
+# Delete
+neo memory forget <id>
+# Statistics
+neo memory stats                                # Count by type and scope
+```
+**Memory types:**
+| Type | Use when | TTL |
+|------|----------|-----|
+| `fact` | Stable truth affecting dispatch decisions | Permanent (decays) |
+| `procedure` | Same failure 3+ times, reusable how-to | Permanent |
+| `focus` | Current working context (scratchpad) | `--expires` required |
+| `task` | Planned work items | Until done/abandoned |
+| `feedback` | Recurring review complaints | Permanent |
+| `episode` | Event log entries | Permanent (decays) |
+Additional flags: `--scope` (default: global), `--source` (developer/reviewer/supervisor/user), `--severity` (critical/high/medium/low), `--category` (context reference), `--tags` (comma-separated).
+### neo decision — Decision gates
+Decision gates allow the supervisor to pause and wait for user input on important choices. Scout agents create decisions when they find issues that require human judgment. Users answer decisions via CLI, and the supervisor routes based on the answer.
+```bash
+# Create a decision with options
+neo decision create "Should we refactor the auth module or patch the existing code?" \
+  --options "refactor:Full refactor:Clean solution but takes 2 days,patch:Quick patch:Fast but adds tech debt" \
+  --default-answer patch \
+  --expires-in 24h \
+  --context "Found 3 security issues in auth module"
+# List all pending decisions
+neo decision pending
+# List all decisions (including answered)
+neo decision list
+# Get details of a specific decision
+neo decision get dec_abc123
+# Answer a decision
+neo decision answer dec_abc123 refactor
+# JSON output for programmatic use
+neo decision list --json
+neo decision get dec_abc123 --json
+```
+| Flag | Type | Default | Description |
+|------|------|---------|-------------|
+| `--options`, `-o` | string | — | Options in format `key:label` or `key:label:description` (comma-separated) |
+| `--default-answer`, `-d` | string | — | Default answer key used if decision expires |
+| `--expires-in`, `-e` | duration | `24h` | Expiration duration (e.g., `30m`, `24h`, `7d`) |
+| `--type`, `-t` | string | `generic` | Decision type for categorization |
+| `--context`, `-c` | string | — | Additional context for the decision |
+| `--name` | string | `supervisor` | Supervisor name |
+| `--json` | boolean | `false` | Output as JSON |
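The `--options` value packs several options into one string, using `key:label` or `key:label:description` entries separated by commas. A minimal sketch of how such a string could be split is shown below; `DecisionOption` and `parseOptions` are hypothetical names for this example, and the handling of extra colons (kept as part of the description) is an assumption, not documented neo behavior.

```typescript
// Hypothetical parsed form of one entry in the --options string.
interface DecisionOption {
  key: string;
  label: string;
  description?: string;
}

// Splits "key:label[:description],key:label[:description],..." into objects.
// Assumption: any colons after the second field belong to the description.
function parseOptions(raw: string): DecisionOption[] {
  return raw.split(",").map((entry) => {
    const [key, label, ...rest] = entry.split(":");
    if (!key || !label) {
      throw new Error(`Invalid option "${entry}": expected key:label[:description]`);
    }
    const description = rest.join(":").trim();
    return {
      key: key.trim(),
      label: label.trim(),
      ...(description ? { description } : {}),
    };
  });
}
```

Because entries are comma-separated, a description that itself contains a comma would be split into two entries by this sketch, so option text is best kept comma-free.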
+**Actions:**
+| Action | Description |
+|--------|-------------|
+| `create` | Create a new decision gate (VALUE = question text) |
+| `list` | List all decisions |
+| `pending` | List only unanswered decisions |
+| `get` | Get details of a decision (VALUE = decision ID) |
+| `answer` | Answer a decision (VALUE = decision ID, followed by answer key) |
+**Typical workflow:**
+1. **Scout finds issue** → Agent discovers something requiring human judgment
+2. **Creates decision** → `neo decision create "..." --options "..."` with clear options
+3. **User is notified** → Decision appears in `neo decision pending`
+4. **User answers** → `neo decision answer <id> <key>`
+5. **Supervisor routes** → Picks up the answer and dispatches appropriate follow-up
+**Example: Security issue triage**
+```bash
+# Scout agent creates a decision after finding vulnerabilities
+neo decision create "Found SQL injection in UserService. How should we proceed?" \
+  --options "block:Block release:Stop deployment until fixed,hotfix:Hotfix now:Emergency patch within 2h,schedule:Schedule fix:Add to next sprint" \
+  --default-answer block \
+  --expires-in 4h \
+  --type security \
+  --context "CVE-2024-1234 affects getUserById(). Risk: HIGH"
+# User checks pending decisions
+neo decision pending
+# Output:
+# ID            TYPE      QUESTION                                    EXPIRES
+# dec_x7k9m2    security  Found SQL injection in UserService...       3h 45m
+# User answers
+neo decision answer dec_x7k9m2 hotfix
+# Supervisor receives the answer and dispatches fixer agent with hotfix priority
+```
+### neo webhooks — Event notifications
+Neo can push events to external URLs when things happen (agent completes, budget alert, etc.).
+```bash
+neo webhooks                      # List all registered webhooks
+neo webhooks add https://example.com/neo-events   # Register a new endpoint
+neo webhooks remove https://example.com/neo-events # Deregister
+neo webhooks test                 # Test all endpoints (shows response codes + latency)
+neo webhooks --output json        # Machine-readable
+```
+**Events emitted:** `supervisor_started`, `heartbeat`, `run_dispatched`, `run_completed`, `supervisor_stopped`, `session:start`, `session:complete`, `session:fail`, `cost:update`, `budget:alert`.
+**Webhook payloads** are JSON. Optional HMAC signature verification via `X-Neo-Signature` header (configure `supervisor.secret` in config).
+**Receiving webhooks in your app:**
+```
+POST /webhook
+Content-Type: application/json
+X-Neo-Signature: sha256=<hmac>
+{
+  "event": "run_completed",
+  "source": "neo-supervisor",
+  "payload": {
+    "runId": "abc-123",
+    "status": "completed",
+    "costUsd": 1.24,
+    "durationMs": 45000
+  }
+}
+```
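A receiver can check the `X-Neo-Signature` header before trusting a payload. The sketch below assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body, keyed with the configured `supervisor.secret`. The guide only shows the `sha256=<hmac>` header shape, so the encoding and the signed content are assumptions, and `verifyNeoSignature` is a hypothetical helper name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies a "sha256=<hex>" signature header against the raw request body.
// Assumptions (not confirmed by the guide): hex encoding, HMAC over the raw body.
function verifyNeoSignature(
  rawBody: string,
  header: string | undefined,
  secret: string,
): boolean {
  const prefix = "sha256=";
  if (!header || !header.startsWith(prefix)) return false;

  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(header.slice(prefix.length), "hex");

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The check should run on the unparsed body bytes, since re-serializing parsed JSON can change whitespace or key order and invalidate the signature.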
+### neo mcp — MCP server integrations
+MCP (Model Context Protocol) servers give agents access to external tools (Linear, GitHub, Notion, etc.).
+```bash
+neo mcp list                      # List configured MCP servers
+# Add a preset (auto-configured)
+neo mcp add linear                # Requires LINEAR_API_KEY env var
+neo mcp add github                # Requires GITHUB_TOKEN env var
+neo mcp add notion                # Requires NOTION_TOKEN env var
+neo mcp add jira                  # Requires JIRA_API_TOKEN + JIRA_URL env vars
+neo mcp add slack                 # Requires SLACK_BOT_TOKEN env var
+# Add a custom MCP server
+neo mcp add my-server --type stdio --command npx --serverArgs "@org/my-mcp-server"
+neo mcp add my-http-server --type http --url http://localhost:8080
+# Remove
+neo mcp remove linear
+```
+Once configured, MCP tools are available to the supervisor and agents during their sessions.
+### neo repos — Repository management
+```bash
+neo repos                         # List registered repositories
+neo repos add /path/to/repo --name my-project --branch main
+neo repos remove my-project       # By name or path
+```
+### neo agents — List agents
+```bash
+neo agents                        # Table: name, model, sandbox, source (builtin/custom)
+neo agents --output json          # Machine-readable
+```
+### neo doctor — Health check
+```bash
+neo doctor                        # Check all prerequisites
+neo doctor --fix                  # Auto-fix missing directories, stale sessions
+neo doctor --output json          # Machine-readable
+```
+---
+## Configuration Reference
+Neo stores global configuration in `~/.neo/config.yml`. Created automatically on `neo init`.
+```yaml
+repos:
+  - path: "/path/to/your/repo"
+    defaultBranch: main
+    branchPrefix: feat
+    pushRemote: origin
+    gitStrategy: branch           # "branch" or "pr"
+concurrency:
+  maxSessions: 5                  # Total concurrent agent sessions
+  maxPerRepo: 4                   # Max sessions per repository
+budget:
+  dailyCapUsd: 500                # Hard daily spending limit
+  alertThresholdPct: 80           # Emit budget:alert at this threshold
+recovery:
+  maxRetries: 3                   # Retry attempts per session
+  backoffBaseMs: 30000            # Base delay between retries
+sessions:
+  initTimeoutMs: 120000           # Timeout waiting for session init
+  maxDurationMs: 3600000          # Max session duration (1 hour)
+supervisor:
+  port: 7777                      # Webhook server port
+  dailyCapUsd: 50                 # Supervisor-specific daily cap
+  secret: ""                      # HMAC secret for webhook signature verification
+memory:
+  embeddings: true                # Enable local vector embeddings for semantic search
+```
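To make the two budget settings concrete: with `dailyCapUsd: 500` and `alertThresholdPct: 80`, the alert threshold corresponds to $400 of spend. The sketch below works through that arithmetic. `budgetState` is a hypothetical helper, and the use of inclusive comparisons (`>=`) is an assumption about how the thresholds are applied.

```typescript
// Mirrors the two budget keys from the config example above.
interface BudgetConfig {
  dailyCapUsd: number;
  alertThresholdPct: number;
}

// Computes utilization and whether the alert or cap thresholds are met.
// Assumption: both thresholds are inclusive.
function budgetState(spentUsd: number, budget: BudgetConfig) {
  const utilizationPct = (spentUsd * 100) / budget.dailyCapUsd;
  return {
    utilizationPct,
    alert: utilizationPct >= budget.alertThresholdPct,
    capReached: spentUsd >= budget.dailyCapUsd,
  };
}
```

For example, `budgetState(400, { dailyCapUsd: 500, alertThresholdPct: 80 })` yields a utilization of 80 with `alert` set and `capReached` unset.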
+### Editing configuration
+The config file is plain YAML — edit directly:
+```bash
+# Open in editor
+nano ~/.neo/config.yml
+# Or use neo init to reset defaults
+neo init
+```
+### Per-project setup
+Each project has a `.neo/` directory (created by `neo init`):
+```
+.neo/
+├── agents/           # Custom agent YAML definitions
+│   └── my-dev.yml    # Extends built-in agents
+└── (created by init)
+```
+---
+## Programmatic API
+For deep integration, use `@neotx/core` directly:
+```typescript
+import { AgentRegistry, loadGlobalConfig, Orchestrator } from "@neotx/core";
+const config = await loadGlobalConfig();
+const orchestrator = new Orchestrator(config);
+// Load agents
+const registry = new AgentRegistry("path/to/agents");
+await registry.load();
+for (const agent of registry.list()) {
+  orchestrator.registerAgent(agent);
+}
+// Listen to events
+orchestrator.on("session:complete", (e) => console.log(`Done: $${e.costUsd}`));
+orchestrator.on("session:fail", (e) => console.log(`Failed: ${e.error}`));
+orchestrator.on("budget:alert", (e) => console.log(`Budget: ${e.utilizationPct}%`));
+// Dispatch
+await orchestrator.start();
+const result = await orchestrator.dispatch({
+  agent: "developer",
+  repo: "/path/to/repo",
+  prompt: "Add rate limiting to the API",
+  priority: "high",
+});
+console.log(result.status);  // "success" | "failure"
+console.log(result.costUsd); // 1.24
+await orchestrator.shutdown();
+```
+---
+## Typical Workflows
+### Feature implementation (supervisor — recommended)
+```bash
+# Just describe what you want — the supervisor orchestrates everything
+neo supervise --message "Implement JWT authentication: login/register endpoints, middleware, refresh tokens, and tests"
+# Monitor progress
+neo supervisor status
+neo supervisor activity --type dispatch
+neo runs --short --status running
+neo cost --short
+# Send follow-up context if needed
+neo supervise --message "The JWT secret should come from env var JWT_SECRET, not hardcoded"
+```
+The supervisor will autonomously: refine the task if vague → dispatch architect for design → dispatch developer for each sub-task → dispatch reviewer → dispatch fixer if issues → report completion.
+### Bug fix (supervisor)
+```bash
+neo supervise --message "Fix: POST /api/users returns 500 when email contains '+'. The Zod schema rejects it. High priority."
+```
+### Code review (supervisor)
+```bash
+neo supervise --message "Review PR #42 on branch feat/caching. Focus on cache invalidation strategy and memory leaks."
+```
+### Feature implementation (direct dispatch — advanced)
+```bash
+# 1. Design
+neo run architect --prompt "Design auth system with JWT" --repo . --branch feat/auth
+# 2. Read architect output, get task list
+neo runs <architectRunId>
+# 3. Implement each task
+neo run developer --prompt "Task 1: Create JWT middleware" --repo . --branch feat/auth \
+  --meta '{"label":"T1-jwt-middleware","stage":"develop"}'
+# 4. Review
+neo run reviewer --prompt "Review PR on branch feat/auth" --repo . --branch feat/auth
+# 5. Fix if needed
+neo run fixer --prompt "Fix issues: missing token expiry check" --repo . --branch feat/auth
+```
+### Bug fix (direct dispatch)
+```bash
+neo run developer --prompt "Fix: POST /api/users returns 500 when email contains '+'. The Zod schema rejects it." \
+  --repo . --branch fix/email-validation --priority high
+```

package/README.md CHANGED

@@ -1,8 +1,8 @@
 # @neotx/agents
-Built-in agent definitions and workflow templates for `@neotx/core`.
+Built-in agent definitions for `@neotx/core`.
-This package contains YAML configuration files and Markdown prompts that define the 9 built-in agents and 5 workflows used by the Neo orchestrator. It's a data package — no TypeScript, no runtime code.
+This package contains YAML configuration files and Markdown prompts that define the 5 built-in agents used by the Neo orchestrator. It's a data package — no TypeScript, no runtime code.
 ## Contents
@@ -13,19 +13,13 @@ packages/agents/
 │   ├── developer.yml
 │   ├── fixer.yml
 │   ├── refiner.yml
-│   ├── reviewer-coverage.yml
-│   ├── reviewer-perf.yml
-│   ├── reviewer-quality.yml
-│   ├── reviewer-security.yml
 │   └── reviewer.yml
-├── prompts/          # Markdown system prompts
-│   └── *.md
-└── workflows/        # Workflow YAML definitions
-    ├── feature.yml
-    ├── hotfix.yml
-    ├── refine.yml
-    ├── review-fast.yml
-    └── review.yml
+└── prompts/          # Markdown system prompts
+    ├── architect.md
+    ├── developer.md
+    ├── fixer.md
+    ├── refiner.md
+    └── reviewer.md
 ```
 ## Built-in Agents
@@ -36,11 +30,7 @@ packages/agents/
 | **developer** | opus | writable | Read, Write, Edit, Bash, Glob, Grep | Implementation worker. Executes atomic tasks from specs in isolated clones. |
 | **fixer** | opus | writable | Read, Write, Edit, Bash, Glob, Grep | Auto-correction agent. Fixes issues found by reviewers. Targets root causes, not symptoms. |
 | **refiner** | opus | readonly | Read, Glob, Grep, WebSearch, WebFetch | Ticket quality evaluator. Assesses clarity and splits vague tickets into precise sub-tickets. |
-| **reviewer-quality** | sonnet | readonly | Read, Glob, Grep, Bash | Code quality reviewer. Catches bugs and DRY violations. Approves by default. |
-| **reviewer-security** | opus | readonly | Read, Glob, Grep, Bash | Security auditor. Flags directly exploitable vulnerabilities. Approves by default. |
-| **reviewer-perf** | sonnet | readonly | Read, Glob, Grep, Bash | Performance reviewer. Flags N+1 queries and O(n²) on unbounded data. Approves by default. |
-| **reviewer-coverage** | sonnet | readonly | Read, Glob, Grep, Bash | Test coverage reviewer. Recommends missing tests. Never blocks merge. |
-| **reviewer** | sonnet | readonly | Read, Glob, Grep, Bash | Single-pass unified reviewer. Covers all 4 lenses in one sweep. Lightweight alternative to parallel review. |
+| **reviewer** | sonnet | readonly | Read, Glob, Grep, Bash | Single-pass unified reviewer. Covers quality, security, performance, and test coverage in one sweep. Challenges by default — blocks on critical issues. |
 ### Sandbox Modes
@@ -52,82 +42,6 @@ packages/agents/
 - **opus**: Used for complex reasoning (architecture, security, implementation)
 - **sonnet**: Used for focused review tasks (quality, performance, coverage)
-## Built-in Workflows
-### feature
-Full development cycle: plan, implement, review, and fix.
-```yaml
-steps:
-  plan:
-    agent: architect
-    sandbox: readonly
-  implement:
-    agent: developer
-    dependsOn: [plan]
-  review:
-    agent: reviewer-quality
-    dependsOn: [implement]
-    sandbox: readonly
-  fix:
-    agent: fixer
-    dependsOn: [review]
-    condition: "output(review).hasIssues == true"
-```
-### review
-Parallel 4-lens code review. All reviewers run concurrently.
-```yaml
-steps:
-  quality:
-    agent: reviewer-quality
-    sandbox: readonly
-  security:
-    agent: reviewer-security
-    sandbox: readonly
-  perf:
-    agent: reviewer-perf
-    sandbox: readonly
-  coverage:
-    agent: reviewer-coverage
-    sandbox: readonly
-```
-### review-fast
-Single-pass lightweight review. One agent covers all 4 lenses — ideal for small PRs or budget-constrained runs.
-```yaml
-steps:
-  review:
-    agent: reviewer
-    sandbox: readonly
-```
-### hotfix
-Fast-track single-agent implementation. Skips planning for urgent fixes.
-```yaml
-steps:
-  implement:
-    agent: developer
-```
-### refine
-Ticket evaluation and decomposition for backlog grooming.
-```yaml
-steps:
-  evaluate:
-    agent: refiner
-    sandbox: readonly
-```
 ## Creating Custom Agents
 Custom agents are defined in `.neo/agents/` in your project. You can create entirely new agents or extend built-in ones.
@@ -210,7 +124,7 @@ promptAppend: |
 Each agent has a corresponding Markdown prompt in `prompts/`. The prompt defines:
 - The agent's role and responsibilities
-- Workflow and execution protocol
+- Execution protocol
 - Output format expectations
 - Hard rules and constraints
 - Escalation conditions
@@ -257,8 +171,6 @@ The `@neotx/core` orchestrator:
 2. Loads all YAML files from `.neo/agents/` as custom agents
 3. Resolves extensions and merges configurations
 4. Reads and injects prompts into agent sessions
-5. Loads workflows from `packages/agents/workflows/` and `.neo/workflows/`
 Custom agents in `.neo/agents/` override or extend the built-ins from this package.
 ## License

package/SUPERVISOR.md CHANGED

@@ -11,6 +11,7 @@ This file contains domain-specific knowledge for the supervisor. Commands, heart
 | `fixer` | opus | writable | Fixing issues found by reviewer — targets root causes |
 | `refiner` | opus | readonly | Evaluating ticket quality, splitting vague tickets |
 | `reviewer` | sonnet | readonly | Thorough single-pass review: quality, standards, security, perf, and coverage. Challenges by default — blocks on ≥1 CRITICAL or ≥3 WARNINGs |
+| `scout` | opus | readonly | Autonomous codebase explorer. Deep-dives into a repo to surface bugs, improvements, security issues, and tech debt. Creates decisions for the user |
 ## Agent Output Contracts
@@ -49,6 +50,17 @@ React to:
 - `action: "decompose"` → create sub-tickets from `sub_tickets[]`, dispatch in order
 - `action: "escalate"` → mark ticket blocked, log questions
+### scout → `findings[]` + `decisions_created`
+React to:
+- Parse `findings[]` — each has `severity`, `category`, `suggestion`, and optional `decision_id`
+- CRITICAL findings with `decision_id` → wait for user decision before acting
+- HIGH findings with `decision_id` → wait for user decision before acting
+- User answers "yes" on a decision → route the finding as a ticket (dispatch `developer` or `architect` based on `effort`)
+- User answers "later" → backlog the finding
+- User answers "no" → discard
+- MEDIUM/LOW findings (no decisions created) → log for reference, no action needed
 ## Dispatch — `--meta` fields
 Use `--meta` for traceability and idempotency:
@@ -104,6 +116,12 @@ neo run architect --prompt "Design decomposition for multi-tenant auth system" \
   --repo /path/to/repo \
   --branch feat/PROJ-99-multi-tenant-auth \
   --meta '{"ticketId":"PROJ-99","stage":"refine"}'
+# scout
+neo run scout --prompt "Explore this repository and surface bugs, improvements, security issues, and tech debt. Create decisions for critical and high-impact findings." \
+  --repo /path/to/repo \
+  --branch main \
+  --meta '{"stage":"scout"}'
 ```
 ## Protocol
@@ -128,6 +146,7 @@ neo run architect --prompt "Design decomposition for multi-tenant auth system" \
 | Clear criteria + small scope (< 5 points) | Dispatch `developer` |
 | Complexity ≥ 5 | Dispatch `architect` first |
 | Unclear criteria or vague scope | Dispatch `refiner` |
+| Proactive exploration / no specific ticket | Dispatch `scout` on target repo |
 ### 3. On Refiner Completion
@@ -161,7 +180,18 @@ Parse fixer's JSON output:
 - `status: "FIXED"` → update tracker → in review, re-dispatch `reviewer`.
 - `status: "ESCALATED"` → update tracker → blocked.
-### 8. On Agent Failure
+### 8. On Scout Completion
+Parse scout's JSON output:
+- For each finding with `decision_id`: wait for user decision at future heartbeat.
+- User answers "yes" on a decision:
+  - `effort: "XS" | "S"` → dispatch `developer` with finding as ticket
+  - `effort: "M" | "L"` → dispatch `architect` for design first
+- User answers "later" → log to backlog, no dispatch
+- User answers "no" → discard finding, no action
+- Log `health_score` and `strengths` for project context.
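The routing rules for answered scout decisions can be summarized as a single function. This is a sketch of the rules as listed above, not the supervisor's actual code: the `Route` type and `routeFinding` name are hypothetical, and the answer keys `yes` / `later` / `no` are taken from the bullets.

```typescript
type Effort = "XS" | "S" | "M" | "L";
type Answer = "yes" | "later" | "no";

// Hypothetical result type describing what the supervisor does next.
type Route =
  | { action: "dispatch"; agent: "developer" | "architect" }
  | { action: "backlog" }
  | { action: "discard" };

// "later" goes to the backlog, "no" is discarded, and "yes" dispatches
// developer for XS/S effort or architect for M/L effort.
function routeFinding(answer: Answer, effort: Effort): Route {
  if (answer === "later") return { action: "backlog" };
  if (answer === "no") return { action: "discard" };
  const small = effort === "XS" || effort === "S";
  return { action: "dispatch", agent: small ? "developer" : "architect" };
}
```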
+### 9. On Agent Failure
 Update tracker → abandoned. Log the failure reason.
@@ -201,6 +231,41 @@ Infer missing fields before routing:
 **Priority** (when unset): `medium`
+## Idle Behavior — Scout Dispatch
+When the supervisor has **no events, no active runs, and no pending tasks**, it enters idle mode.
+Instead of doing nothing, dispatch a `scout` agent to proactively explore a repository:
+1. **Check preconditions:**
+   - Budget remaining > 10% — do not scout if budget is tight
+   - No pending decisions from a previous scout — wait for user to answer before scouting again
+   - No active runs — scout only when truly idle
+2. **Pick a repo:**
+   - Choose the repo least recently scouted (check memory for previous `scout` runs)
+   - If no scout has ever run, pick the first configured repo
+   - Rotate across repos over time — do not scout the same repo twice in a row
+3. **Dispatch:**
+   ```bash
+   neo log decision "Idle — dispatching scout on <repo-name>"
+   neo run scout --prompt "Explore this repository. Surface bugs, improvements, security issues, and tech debt. Create decisions for critical and high-impact findings." \
+     --repo <path> \
+     --branch <default-branch> \
+     --meta '{"stage":"scout","label":"scout-<repo-name>"}'
+   ```
+4. **On scout completion** (see Protocol §8):
+   - Read the output with `neo runs <runId>`
+   - The scout has already created decisions via `neo decision create`
+   - Log the `health_score` and finding count as a fact
+   - Wait for user to answer decisions at future heartbeats
+5. **Frequency guard:**
+   - Max ONE scout per repo per 24h — do not re-scout a repo that was scouted today
+   - Write a fact after each scout: `neo memory write --type fact --scope <repo> "Last scouted: <date>, health: <score>/10, <N> findings"`
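The idle-mode preconditions, repo rotation, and 24h frequency guard described above can be combined into one selection step. The sketch below is one possible reading under stated assumptions: `IdleState`, `RepoScoutInfo`, and `pickRepoToScout` are hypothetical names, the last-scouted timestamp is assumed to be recoverable from memory, and "scouted today" is interpreted as a rolling 24-hour window.

```typescript
// Hypothetical snapshot of supervisor state at an idle heartbeat.
interface IdleState {
  budgetRemainingPct: number;
  pendingScoutDecisions: number;
  activeRuns: number;
}

// Hypothetical per-repo record; lastScoutedAt is absent if never scouted.
interface RepoScoutInfo {
  path: string;
  lastScoutedAt?: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the repo to scout next, or undefined when scouting should be skipped.
function pickRepoToScout(
  state: IdleState,
  repos: RepoScoutInfo[],
  now: Date,
): RepoScoutInfo | undefined {
  // Preconditions: budget remaining > 10%, no pending decisions, no active runs.
  if (state.budgetRemainingPct <= 10) return undefined;
  if (state.pendingScoutDecisions > 0 || state.activeRuns > 0) return undefined;

  let best: RepoScoutInfo | undefined;
  let bestTime = Infinity;
  for (const repo of repos) {
    const last = repo.lastScoutedAt?.getTime() ?? -Infinity;
    // Frequency guard: skip repos scouted within the last 24 hours.
    if (now.getTime() - last < DAY_MS) continue;
    // Least recently scouted wins; ties keep the first configured repo.
    if (best === undefined || last < bestTime) {
      best = repo;
      bestTime = last;
    }
  }
  return best;
}
```

Treating a never-scouted repo as "scouted infinitely long ago" covers both the first-run case (the first configured repo is chosen) and rotation across repos over time.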
 ## Safety Guards
 ### Anti-Loop Guard

package/agents/scout.yml ADDED

@@ -0,0 +1,12 @@
+name: scout
+description: "Autonomous codebase explorer. Deep-dives into a repository to surface bugs, improvements, security issues, tech debt, and optimization opportunities. Produces actionable decisions for the supervisor."
+model: opus
+tools:
+  - Read
+  - Glob
+  - Grep
+  - Bash
+  - WebSearch
+  - WebFetch
+sandbox: readonly
+prompt: ../prompts/scout.md

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@neotx/agents",
-  "version": "0.1.0-alpha.14",
+  "version": "0.1.0-alpha.19",
   "description": "Built-in agent definitions and prompts for @neotx/core",
   "type": "module",
   "license": "MIT",
@@ -12,8 +12,8 @@
   "files": [
     "agents",
     "prompts",
-    "workflows",
-    "SUPERVISOR.md"
+    "SUPERVISOR.md",
+    "GUIDE.md"
   ],
   "keywords": [
     "ai-agents",

package/prompts/architect.md CHANGED

@@ -74,22 +74,6 @@ that depends on all implementation tasks.
 }
 ```
-## Memory & Reporting
-You receive a "Known context" section with facts and procedures from previous runs. These are retrieved via semantic search — the most relevant memories for your task are automatically selected.
-Write stable discoveries to memory so future agents benefit. Memories are embedded locally for semantic retrieval — write clear, descriptive content:
-```bash
-neo memory write --type fact --scope $NEO_REPOSITORY "Monorepo with 3 packages: core engine, CLI wrapper, agent definitions"
-neo memory write --type fact --scope $NEO_REPOSITORY "Event-driven architecture using typed EventEmitter, all modules emit events"
-```
-Report progress to the supervisor (chain with commands, never standalone):
-```bash
-neo log milestone "Architecture design complete with 3 milestones, 8 tasks"
-neo log decision "Chose event-driven over polling for webhook integration"
-```
 ## Escalation
 STOP and report when:

package/prompts/developer.md CHANGED

@@ -110,22 +110,6 @@ Output the PR URL on a dedicated line: `PR_URL: https://...`
 }
 ```
-## Memory & Reporting
-You receive a "Known context" section with facts and procedures from previous runs. These are retrieved via semantic search — the most relevant memories for your task are automatically selected.
-Write stable discoveries to memory so future agents benefit. Memories are embedded locally for semantic retrieval — write clear, descriptive content:
-```bash
-neo memory write --type fact --scope $NEO_REPOSITORY "Uses Prisma ORM with PostgreSQL for all database access"
-neo memory write --type procedure --scope $NEO_REPOSITORY "Run pnpm test:e2e for integration tests, requires DATABASE_URL"
-```
-Report progress to the supervisor (chain with commands, never standalone):
-```bash
-pnpm test && neo log milestone "All tests passing" || neo log blocker "Tests failing"
-git push origin HEAD && neo log action "Pushed to branch"
-```
 ## Escalation
 STOP and report when:

package/prompts/fixer.md CHANGED

@@ -91,22 +91,6 @@ You MUST push — the clone is destroyed after session ends.
 }
 ```
-## Memory & Reporting
-You receive a "Known context" section with facts and procedures from previous runs. These are retrieved via semantic search — the most relevant memories for your task are automatically selected.
-Write stable discoveries to memory so future agents benefit. Memories are embedded locally for semantic retrieval — write clear, descriptive content:
-```bash
-neo memory write --type fact --scope $NEO_REPOSITORY "Error handling uses custom AppError class in src/errors.ts"
-neo memory write --type procedure --scope $NEO_REPOSITORY "Integration tests require DATABASE_URL env var to be set"
-```
-Report progress to the supervisor (chain with commands, never standalone):
-```bash
-git push origin HEAD && neo log action "Pushed fixes to branch"
-pnpm test && neo log milestone "All tests passing" || neo log blocker "Tests still failing"
-```
 ## Limits
 | Limit             | Value | On exceed |

package/prompts/refiner.md CHANGED

@@ -102,22 +102,6 @@ Split into atomic sub-tickets. Each MUST have:
 }
 ```
-## Memory & Reporting
-You receive a "Known context" section with facts and procedures from previous runs. These are retrieved via semantic search — the most relevant memories for your task are automatically selected.
-Write stable discoveries to memory so future agents benefit. Memories are embedded locally for semantic retrieval — write clear, descriptive content:
-```bash
-neo memory write --type fact --scope $NEO_REPOSITORY "Uses Drizzle ORM with PostgreSQL for database access"
-neo memory write --type fact --scope $NEO_REPOSITORY "Feature modules follow src/modules/<name>/ directory pattern"
-```
-Report progress to the supervisor (chain with commands, never standalone):
-```bash
-neo log milestone "Ticket decomposed into 4 sub-tickets"
-neo log decision "Decomposing ticket — score 2, vague scope"
-```
 ## Decomposition Rules
 1. No file overlap between sub-tickets (unless dependency-ordered)

package/prompts/reviewer.md CHANGED

@@ -132,22 +132,6 @@ EOF
 Verdict: any CRITICAL → `CHANGES_REQUESTED`. ≥3 WARNINGs → `CHANGES_REQUESTED`. Otherwise → `APPROVED`.
-## Memory & Reporting
-You receive a "Known context" section with facts and procedures from previous runs. These are retrieved via semantic search — the most relevant memories for your task are automatically selected.
-Write stable discoveries to memory so future agents benefit. Memories are embedded locally for semantic retrieval — write clear, descriptive content:
-```bash
-neo memory write --type fact --scope $NEO_REPOSITORY "CI pipeline takes ~8 min, flaky test in auth.spec.ts"
-neo memory write --type fact --scope $NEO_REPOSITORY "All API endpoints require auth middleware in src/middleware/auth.ts"
-```
-Report progress to the supervisor (chain with commands, never standalone):
-```bash
-gh pr comment 73 --body "..." && neo log action "Posted review on PR #73"
-neo log milestone "Review complete: APPROVED"
-```
 ## Rules
 1. Read-only. Never modify files.
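Reviewer note: the verdict rule kept as context in this hunk (any CRITICAL, or three or more WARNINGs, yields `CHANGES_REQUESTED`) is a plain threshold check. A minimal sketch follows, assuming the reviewer has already counted its findings per severity. The function name `verdict` and its parameters are hypothetical and do not come from the package.

```typescript
type Verdict = "APPROVED" | "CHANGES_REQUESTED";

// Any CRITICAL finding, or at least three WARNINGs, blocks approval.
function verdict(criticalCount: number, warningCount: number): Verdict {
  if (criticalCount > 0 || warningCount >= 3) return "CHANGES_REQUESTED";
  return "APPROVED";
}
```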

package/prompts/scout.md ADDED

@@ -0,0 +1,231 @@
+# Scout
+You are an autonomous codebase explorer. You deep-dive into a repository to
+surface bugs, improvements, security issues, tech debt, and optimization
+opportunities. Read-only — never modify files. You produce actionable findings
+that become decisions for the user.
+## Mindset
+- Think like an experienced engineer joining a new team — curious, thorough, opinionated.
+- Look for what matters, not what's easy to find. Prioritize impact over quantity.
+- Every finding must be actionable — if you can't suggest a fix, don't report it.
+- Be honest about severity. Don't inflate minor issues to seem thorough.
+## Budget
+- No limit on tool calls — explore as deeply as needed.
+- Max **20 findings** total across all categories (prioritize by impact).
+- Spend at least 60% of your effort reading code, not searching.
+## Protocol
+### 1. Orientation
+Get a high-level understanding of the project:
+- Read `package.json`, `tsconfig.json`, `CLAUDE.md`, `README.md` (if they exist)
+- Glob the top-level structure: `*`, `src/**` (2 levels deep max)
+- Identify: language, framework, test runner, build tool, dependencies
+- Read any existing lint/format config (biome.json, .eslintrc, etc.)
+### 2. Deep Exploration
+Systematically explore the codebase through these lenses:
+**Architecture & Structure**
+- Module boundaries — are they clean or tangled?
+- Dependency direction — do low-level modules depend on high-level ones?
+- File organization — does it follow a consistent pattern?
+- Dead code — unused exports, unreachable branches, orphan files
+**Code Quality**
+- Complex functions (>60 lines, deep nesting, high cyclomatic complexity)
+- DRY violations — similar logic repeated across files
+- Error handling — silent catches, missing error paths, inconsistent patterns
+- Type safety — `any` usage, missing types, unsafe assertions
+- Naming — misleading names, inconsistent conventions
+**Bugs & Correctness**
+- Race conditions, unhandled promise rejections
+- Off-by-one errors, null/undefined access without guards
+- Logic errors in conditionals or data transformations
+- Stale closures in React hooks
+- Missing cleanup (event listeners, intervals, subscriptions)
+**Security**
+- Injection vectors (SQL, command, path traversal)
+- Auth/authz gaps
+- Hardcoded secrets or credentials
+- Unsafe deserialization, prototype pollution
+- Missing input validation at system boundaries
+**Performance**
+- N+1 queries, unbounded iterations
+- Memory leaks in long-lived processes
+- Unnecessary re-renders, missing memoization on expensive computations
+- Large bundle imports that could be tree-shaken or lazy-loaded
+**Dependencies**
+- Outdated packages with known vulnerabilities
+- Unused dependencies in package.json
+- Duplicate dependencies serving the same purpose
+- Missing peer dependencies
+**Testing**
+- Untested critical paths (auth, payments, data mutations)
+- Test quality — do tests verify behavior or just call functions?
+- Missing edge case coverage
+- Flaky test patterns (timing, shared state, network calls)
+### 3. Synthesize
+Rank all findings by impact:
+- **CRITICAL**: Production risk — bugs, security holes, data loss potential
+- **HIGH**: Significant improvement — major tech debt, performance bottleneck
+- **MEDIUM**: Worthwhile — code quality, missing tests, minor debt
+- **LOW**: Nice-to-have — style improvements, minor optimizations
+### 4. Create Decisions
+For each CRITICAL or HIGH finding, create a decision gate using `neo decision create`.
+The supervisor and user will see these decisions and act on them.
+**Syntax:**
+```bash
+neo decision create "Short actionable question" \
+  --options "yes:Act on it,no:Skip,later:Backlog" \
+  --type approval \
+  --context "Detailed context: what the issue is, where it is, suggested fix, effort estimate" \
+  --expires-in 72h
+```
+**Rules for decisions:**
+- One decision per CRITICAL finding — these deserve individual attention
+- Group related HIGH findings into a single decision when they share a root cause or fix
+- The question must be actionable: "Fix N+1 query in user-list endpoint?" not "Performance issue found"
+- Include enough context so the user can decide without re-reading the code
+- Use `--context` to embed file paths, line numbers, and the suggested approach
+- Capture the returned decision ID (format: `dec_<uuid>`) for your output
+**Examples:**
+```bash
+# Critical security issue — standalone decision
+neo decision create "Fix SQL injection in search endpoint?" \
+  --options "yes:Fix now,no:Accept risk,later:Backlog" \
+  --type approval \
+  --context "src/api/search.ts:42 — user input interpolated directly into SQL query. Fix: use parameterized query. Effort: XS" \
+  --expires-in 72h
+# Group of related HIGH findings
+neo decision create "Refactor error handling to use consistent pattern?" \
+  --options "yes:Refactor,no:Skip,later:Backlog" \
+  --type approval \
+  --context "3 files use different error patterns: src/auth.ts:18 (silent catch), src/api.ts:55 (throws string), src/db.ts:92 (no catch). Fix: adopt AppError class. Effort: S" \
+  --expires-in 72h
+```
+### 5. Write Memory
+This is one of your most important responsibilities. You are the first agent to deeply explore this repo — everything you learn becomes institutional knowledge for every future agent that works here.
+Write memories **as you explore**, not just at the end. Every stable discovery that would change how an agent approaches work should be a memory.
+The test for a good memory: **would an agent fail, waste time, or produce wrong output without this knowledge?** If yes, write it. If it's just "nice to know", skip it.
+**What to memorize:**
+- Things that would make an agent's build/test/push **fail silently or unexpectedly**
+- Constraints that **aren't in docs or config** but are enforced by CI, hooks, or conventions
+- Patterns that **look wrong but are intentional** — so agents don't "fix" them
+- Workflows where **order matters** and getting it wrong breaks things
+**What NOT to memorize:**
+- Anything visible in `package.json`, `README.md`, or config files
+- General best practices the agent model already knows
+- File paths, directory structure, line counts
+- Things that are obvious from reading the code
+<examples type="good">
+```bash
+# Would cause a failed push without this knowledge
+neo memory write --type procedure --scope $NEO_REPOSITORY "pnpm build MUST pass locally before push — CI does not rebuild, it only runs the compiled output"
+# Would cause an agent to write broken code
+neo memory write --type fact --scope $NEO_REPOSITORY "All service methods throw AppError (src/errors.ts), never raw Error — controllers rely on AppError.statusCode for HTTP mapping"
+# Would cause a 30-minute debugging session
+neo memory write --type procedure --scope $NEO_REPOSITORY "After any Drizzle schema change: run pnpm db:generate then pnpm db:push in that order — generate alone won't update the DB"
+# Would cause an agent to miss required auth and ship a security hole
+neo memory write --type fact --scope $NEO_REPOSITORY "Every new API route MUST use authGuard AND tenantGuard — RLS alone is not sufficient, guards set the tenant context"
+# Would cause flaky test failures
+neo memory write --type fact --scope $NEO_REPOSITORY "E2E tests share a single DB — tests that mutate users must use unique emails or they collide in parallel runs"
+# Would cause an agent to break the deploy pipeline
+neo memory write --type fact --scope $NEO_REPOSITORY "env vars in .env.production are baked at build time (Next.js NEXT_PUBLIC_*) — changing them requires a rebuild, not just a restart"
+```
+</examples>
+<examples type="bad">
+```bash
+# Derivable from package.json — DO NOT WRITE
+# "Uses React 19 with TypeScript"
+# "Test runner is vitest"
+# Obvious from reading the code — DO NOT WRITE
+# "Components are in src/components/"
+# "API routes follow REST conventions"
+# Generic knowledge the model already has — DO NOT WRITE
+# "Use parameterized queries to prevent SQL injection"
+# "Always handle errors in async functions"
+```
+</examples>
+**Volume target:** aim for 3-8 high-impact memories per scout run. Every memory must pass the "would an agent fail without this?" test. Zero memories is fine if the repo is well-documented. 20 memories means you're not filtering hard enough.
+### 6. Report
+Log your exploration summary:
+```bash
+neo log milestone "Scout complete: X findings (Y critical, Z high), N memories written"
+```
+## Output
+```json
+{
+  "summary": "1-2 sentence overall assessment of the codebase",
+  "health_score": 1-10,
+  "findings": [
+    {
+      "id": "F-1",
+      "category": "bug | security | performance | quality | architecture | testing | dependency",
+      "severity": "CRITICAL | HIGH | MEDIUM | LOW",
+      "title": "Short descriptive title",
+      "description": "What the issue is and why it matters",
+      "files": ["src/path.ts:42", "src/other.ts:18"],
+      "suggestion": "Concrete fix or approach",
+      "effort": "XS | S | M | L",
+      "decision_id": "dec_xxx or null"
+    }
+  ],
+  "decisions_created": 3,
+  "memories_written": 8,
+  "strengths": [
+    "Things the codebase does well — acknowledge good patterns"
+  ]
+}
+```
+## Rules
+1. Read-only. Never modify files.
+2. Every finding has exact file paths and line numbers.
+3. Be specific — "code quality could be improved" is not a finding.
+4. Acknowledge strengths. A scout reports the full picture, not just problems.
+5. Create decisions only for CRITICAL and HIGH findings — don't flood the user.
+6. Group related findings into single decisions when they share a root cause.
+7. Max 20 findings. If you find more, keep only the highest-impact ones.
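Reviewer note on the scout prompt above: the Synthesize and Create Decisions steps amount to ranking findings by severity, capping the list at 20, and emitting one `neo decision create` call per CRITICAL or HIGH finding. The sketch below illustrates that flow under stated assumptions. The `Finding` type is a trimmed copy of the Output schema, the helper names and the `question` parameter are hypothetical, and the flags are copied from the Syntax block in the prompt rather than checked against the CLI. Grouping related HIGH findings by root cause is left out.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

// Trimmed from the prompt's Output schema; only the fields used here.
interface Finding {
  id: string;
  severity: Severity;
  title: string;
  files: string[];
  suggestion: string;
  effort: "XS" | "S" | "M" | "L";
}

const RANK: Record<Severity, number> = { CRITICAL: 0, HIGH: 1, MEDIUM: 2, LOW: 3 };
const MAX_FINDINGS = 20;

// Highest impact first, then keep at most 20. The sort runs on a copy and
// keeps the original order within a severity on engines with a stable sort.
function topFindings(findings: Finding[]): Finding[] {
  return [...findings]
    .sort((a, b) => RANK[a.severity] - RANK[b.severity])
    .slice(0, MAX_FINDINGS);
}

// Rule 5: only CRITICAL and HIGH findings become decisions.
function needsDecision(f: Finding): boolean {
  return f.severity === "CRITICAL" || f.severity === "HIGH";
}

// Builds the argument list for `neo decision create` without running it.
// `question` is a hypothetical parameter holding the short actionable question.
function decisionArgs(question: string, f: Finding): string[] {
  const context = `${f.files.join(", ")}: ${f.title}. Fix: ${f.suggestion}. Effort: ${f.effort}`;
  return [
    "decision", "create", question,
    "--options", "yes:Act on it,no:Skip,later:Backlog",
    "--type", "approval",
    "--context", context,
    "--expires-in", "72h",
  ];
}
```

Returning an argument array instead of a shell string keeps file paths and quotes in the context from needing shell escaping if the list is later passed to a process spawner.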

package/workflows/feature.yml DELETED

@@ -1,21 +0,0 @@
-name: feature
-description: "Plan, implement, and review a feature"
-steps:
-  plan:
-    agent: architect
-    sandbox: readonly
-  implement:
-    agent: developer
-    dependsOn: [plan]
-    prompt: |
-      Implement the following based on the architecture plan.
-      Original request: {{prompt}}
-  review:
-    agent: reviewer
-    dependsOn: [implement]
-    sandbox: readonly
-  fix:
-    agent: fixer
-    dependsOn: [review]
-    condition: "output(review).hasIssues == true"
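Reviewer note: the removed `feature.yml` described a four-step DAG through `dependsOn`. For readers comparing the old workflow files with the supervisor-driven flow, the sketch below shows one way such a declaration resolves to an execution order. It is an illustration only: `Workflow` and `executionOrder` are hypothetical names, the `condition` field on `fix` is ignored, and nothing here reflects how neo's engine scheduled steps.

```typescript
// Step name mapped to the names of the steps it depends on.
type Workflow = Record<string, string[]>;

// Repeatedly releases every step whose dependencies are all done.
// Throws if no step can be released (cycle or unknown dependency).
function executionOrder(steps: Workflow): string[] {
  const done: string[] = [];
  let remaining = Object.keys(steps);
  while (remaining.length > 0) {
    const ready = remaining.filter((name) =>
      (steps[name] ?? []).every((dep) => done.indexOf(dep) !== -1),
    );
    if (ready.length === 0) {
      throw new Error("cycle or unknown dependency in workflow");
    }
    for (const name of ready) done.push(name);
    remaining = remaining.filter((name) => ready.indexOf(name) === -1);
  }
  return done;
}

// The removed feature workflow, reduced to its dependency edges.
const featureWorkflow: Workflow = {
  plan: [],
  implement: ["plan"],
  review: ["implement"],
  fix: ["review"],
};
```

For `featureWorkflow` this yields plan, implement, review, fix, which matches the order the deleted YAML spelled out.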

package/workflows/hotfix.yml DELETED

@@ -1,5 +0,0 @@
-name: hotfix
-description: "Fast-track single-agent implementation"
-steps:
-  implement:
-    agent: developer

package/workflows/refine.yml DELETED

@@ -1,6 +0,0 @@
-name: refine
-description: "Evaluate and decompose tickets"
-steps:
-  evaluate:
-    agent: refiner
-    sandbox: readonly

package/workflows/review.yml DELETED

@@ -1,6 +0,0 @@
-name: review
-description: "Single-pass code review covering quality, security, performance, and test coverage"
-steps:
-  review:
-    agent: reviewer
-    sandbox: readonly