npm - opencode-swarm - Versions diffs - 6.1.2 → 6.3.0 - Mend

opencode-swarm 6.1.2 → 6.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/README.md +310 -510
package/dist/config/evidence-schema.d.ts +94 -0
package/dist/config/schema.d.ts +53 -0
package/dist/index.js +1443 -55
package/dist/state.d.ts +2 -0
package/dist/tools/imports.d.ts +5 -0
package/dist/tools/index.d.ts +3 -0
package/dist/tools/lint.d.ts +34 -0
package/dist/tools/secretscan.d.ts +31 -0
package/package.json +1 -1

package/README.md CHANGED

@@ -1,193 +1,199 @@
 <p align="center">
-  <img src="https://img.shields.io/badge/version-6.1.2-blue" alt="Version">
+   <img src="https://img.shields.io/badge/version-6.3.0-blue" alt="Version">
   <img src="https://img.shields.io/badge/license-MIT-green" alt="License">
   <img src="https://img.shields.io/badge/opencode-plugin-purple" alt="OpenCode Plugin">
   <img src="https://img.shields.io/badge/agents-9-orange" alt="Agents">
-  <img src="https://img.shields.io/badge/tests-1280-brightgreen" alt="Tests">
+  <img src="https://img.shields.io/badge/tests-1391-brightgreen" alt="Tests">
 </p>
 <h1 align="center">🐝 OpenCode Swarm</h1>
 <p align="center">
-  <strong>The only multi-agent framework that actually works.</strong><br>
-  Structured phases. Persistent memory. One task at a time. QA on everything.
+  <strong>A structured multi-agent coding framework for OpenCode.</strong><br>
+  Nine specialized agents. Persistent memory. A QA gate on every task. Code that ships.
 </p>
 <p align="center">
-  <a href="#why-swarm">Why Swarm?</a> •
+  <a href="#the-problem">The Problem</a> •
   <a href="#how-it-works">How It Works</a> •
-  <a href="#installation">Installation</a> •
   <a href="#agents">Agents</a> •
-  <a href="#configuration">Configuration</a>
+  <a href="#persistent-memory">Memory</a> •
+  <a href="#guardrails">Guardrails</a> •
+  <a href="#comparison">Comparison</a> •
+  <a href="#installation">Installation</a> •
+  <a href="#roadmap">Roadmap</a>
 </p>
 ---
-## The Problem with Every Other Multi-Agent System
-```
-You: "Build me an authentication system"
-Other Frameworks:
-├── Agent 1 starts auth module...
-├── Agent 2 starts user model... (conflicts with Agent 1)
-├── Agent 3 starts database... (wrong schema)
-├── Agent 4 starts tests... (for code that doesn't exist yet)
-└── Result: Chaos. Conflicts. Context lost. Start over.
-OpenCode Swarm:
-├── Architect analyzes request
-├── Explorer scans codebase (+ gap analysis)
-├── @sme consulted on security domain
-├── Architect creates phased plan with acceptance criteria
-├── @critic reviews plan → APPROVED
-├── Phase 1: User model → Review → Tests (run + PASS) → ✓
-├── Phase 2: Auth logic → Review → Tests (run + PASS) → ✓
-├── Phase 3: Session management → Review → Tests (run + PASS) → ✓
-└── Result: Working code. Documented decisions. Resumable progress.
-```
----
-## Why Swarm?
+## The Problem
-<table>
-<tr>
-<td width="50%">
+Every multi-agent AI coding tool on the market has the same failure mode: they are vibes-driven. You describe a feature. Agents spawn. They race each other to write conflicting code, lose context after 20 messages, hit token limits mid-task, and produce something that sort-of-works until it doesn't. There's no plan. There's no memory. There's no gatekeeper. There's no test that was actually run.
-### ❌ Other Frameworks
+**oh-my-opencode** is a prompt collection. **get-shit-done** is a workflow macro. Neither is a framework with memory, QA enforcement, or the ability to resume a project a week later exactly where you left off.
-- Parallel chaos, hope it converges
-- Single model = correlated failures
-- No planning, just vibes
-- Context lost between sessions
-- QA as afterthought (if at all)
-- Entire codebase in one prompt
-- No way to resume projects
+OpenCode Swarm is built differently.
-</td>
-<td width="50%">
-### ✅ OpenCode Swarm
-- **Serial execution** - predictable, traceable
-- **Heterogeneous models** - different perspectives catch errors
-- **Phased planning** - documented tasks with acceptance criteria
-- **Persistent memory** - `.swarm/` files survive sessions
-- **Review per task** - correctness + security review before anything ships
-- **One task at a time** - focused, quality code
-- **Resumable projects** - pick up exactly where you left off
+```
+Every other framework:
+├── Agent 1 starts the auth module...
+├── Agent 2 starts the user model... (conflicts with Agent 1)
+├── Agent 3 writes tests... (for code that doesn't exist yet)
+├── Context window fills up and the whole thing drifts
+└── Result: chaos. Rework. Start over.
-</td>
-</tr>
-</table>
+OpenCode Swarm:
+├── Architect reads .swarm/plan.md → project already in progress, resumes Phase 2
+├── @explorer scans the codebase for current state
+├── @sme DOMAIN: security → consults on auth patterns, guidance cached
+├── Architect writes .swarm/plan.md: 3 phases, 9 tasks, acceptance criteria per task
+├── @critic reviews the plan → APPROVED
+├── @coder implements Task 2.2 (one task, full context, nothing else)
+├── diff tool → imports tool → lint fix → secretscan → @reviewer → @test_engineer
+├── All gates pass → plan.md updated → Task 2.2: [x]
+└── Result: working code, documented decisions, resumable project, evidence trail
+```
 ---
 ## How It Works
+### The Execution Pipeline
 ```
-┌─────────────────────────────────────────────────────────────────────────┐
-│  USER: "Add user authentication with JWT"                               │
-└─────────────────────────────────────────────────────────────────────────┘
+┌──────────────────────────────────────────────────────────────────────────┐
+│  Phase 0: Resume Check                                                   │
+│  .swarm/plan.md exists? Resume mid-task. New project? Continue.          │
+└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
-┌─────────────────────────────────────────────────────────────────────────┐
-│  PHASE 0: Check for .swarm/plan.md                                      │
-│           Exists? Resume. New? Continue.                                │
-└─────────────────────────────────────────────────────────────────────────┘
+┌──────────────────────────────────────────────────────────────────────────┐
+│  Phase 1: Clarify                                                        │
+│  Ask only what the Architect cannot infer. Then stop.                    │
+└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
-┌─────────────────────────────────────────────────────────────────────────┐
-│  PHASE 1: Clarify (if needed)                                           │
-│           "Do you need refresh tokens? What's the session duration?"    │
-└─────────────────────────────────────────────────────────────────────────┘
+┌──────────────────────────────────────────────────────────────────────────┐
+│  Phase 2: Discover                                                       │
+│  @explorer scans codebase → structure, languages, frameworks, key files  │
+└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
-┌─────────────────────────────────────────────────────────────────────────┐
-│  PHASE 2: Discover                                                      │
-│           @explorer scans codebase → structure, languages, patterns     │
-└─────────────────────────────────────────────────────────────────────────┘
+┌──────────────────────────────────────────────────────────────────────────┐
+│  Phase 3: SME Consult (serial, cached)                                   │
+│  @sme DOMAIN: security, @sme DOMAIN: api, ...                            │
+│  Guidance written to .swarm/context.md — never re-asked in future phases │
+└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
-┌─────────────────────────────────────────────────────────────────────────┐
-│  PHASE 3: Consult SMEs (serial, cached)                                 │
-│           @sme DOMAIN: security → auth best practices                   │
-│           @sme DOMAIN: api → JWT patterns, refresh flow                 │
-│           Guidance saved to .swarm/context.md                           │
-└─────────────────────────────────────────────────────────────────────────┘
+┌──────────────────────────────────────────────────────────────────────────┐
+│  Phase 4: Plan                                                           │
+│  Architect writes .swarm/plan.md                                         │
+│  Structured phases, tasks with SMALL/MEDIUM/LARGE sizing, acceptance     │
+│  criteria per task, explicit dependency graph                            │
+└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
-┌─────────────────────────────────────────────────────────────────────────┐
-│  PHASE 4: Plan                                                          │
-│           Creates .swarm/plan.md with phases, tasks, acceptance criteria│
-│                                                                         │
-│           Phase 1: Foundation [3 tasks]                                 │
-│           Phase 2: Core Auth [4 tasks]                                  │
-│           Phase 3: Session Management [3 tasks]                         │
-└─────────────────────────────────────────────────────────────────────────┘
+┌──────────────────────────────────────────────────────────────────────────┐
+│  Phase 4.5: Critic Gate                                                  │
+│  @critic reviews plan → APPROVED / NEEDS_REVISION / REJECTED             │
+│  Max 2 revision cycles. Escalates to user if unresolved.                 │
+└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
-┌─────────────────────────────────────────────────────────────────────────┐
-│  PHASE 4.5: Critic Gate                                                 │
-│             @critic reviews plan → APPROVED / NEEDS_REVISION / REJECTED│
-│             Max 2 revision cycles before escalating to user             │
-└─────────────────────────────────────────────────────────────────────────┘
+┌──────────────────────────────────────────────────────────────────────────┐
+│  Phase 5: Execute (per task)                                             │
+│                                                                          │
+│  [UI task?] → @designer scaffold first                                   │
+│                                                                          │
+│  @coder (one task, full context)                                         │
+│       ↓                                                                  │
+│  diff tool  →  imports tool  →  lint fix  →  lint check  →  secretscan  │
+│  (contract change detection)   (AST-based)  (auto-fix)   (entropy scan) │
+│       ↓                                                                  │
+│  @reviewer (correctness pass)                                            │
+│       ↓ APPROVED                                                         │
+│  @reviewer (security-only pass, if file matches security globs)          │
+│       ↓ APPROVED                                                         │
+│  @test_engineer (verification tests + coverage gate ≥70%)               │
+│       ↓ PASS                                                             │
+│  @test_engineer (adversarial tests — boundary violations, injections)    │
+│       ↓ PASS                                                             │
+│  plan.md → [x] Task complete                                             │
+│                                                                          │
+│  Any gate fails → back to @coder with structured rejection reason        │
+└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
-┌─────────────────────────────────────────────────────────────────────────┐
-│  PHASE 5: Execute (per task)                                            │
-│                                                                         │
-│   ┌─────────┐    ┌───────┐    ┌────────────┐    ┌──────────────┐       │
-│   │ @coder  │ →  │ diff  │ →  │ @reviewer  │ →  │    @test     │       │
-│   │ 1 task  │    │ tool  │    │ check all  │    │ write + run  │       │
-│   └─────────┘    └───────┘    └────────────┘    └──────────────┘       │
-│        │              │             │                   │               │
-│        │    Contract   │   If REJECTED:        If FAIL: fix            │
-│        │    changes?   │   retry from coder    + retest                │
-│        │       │       │             │                                  │
-│        │       ▼       │             ▼                                  │
-│        │  ┌─────────┐  │   ┌──────────────┐    ┌──────────────┐       │
-│        │  │@explorer│  │   │  @reviewer   │ →  │    @test     │       │
-│        │  │ impact  │  │   │ security-only│    │ adversarial  │       │
-│        │  │analysis │  │   │   (if match) │    │   (attacks)  │       │
-│        │  └─────────┘  │   └──────────────┘    └──────────────┘       │
-│        │               │                                               │
-│        └───────────────┘                                               │
-│                                                                         │
-│   Update plan.md: [x] Task complete (only after ALL gates pass)        │
-│   Next task...                                                          │
-└─────────────────────────────────────────────────────────────────────────┘
-                                    │
-                                    ▼
-┌─────────────────────────────────────────────────────────────────────────┐
-│  PHASE 6: Phase Complete                                                │
-│           Re-scan with @explorer                                        │
-│           Update context.md with learnings                              │
-│           Archive to .swarm/history/                                    │
-│           "Phase 1 complete. Ready for Phase 2?"                        │
-└─────────────────────────────────────────────────────────────────────────┘
+┌──────────────────────────────────────────────────────────────────────────┐
+│  Phase 6: Phase Complete                                                 │
+│  @explorer rescans. @docs updates documentation. Retrospective written.  │
+│  Learnings injected as [SWARM RETROSPECTIVE] into next phase.            │
+│  "Phase 1 complete (4 tasks, 0 rejections). Ready for Phase 2?"          │
+└──────────────────────────────────────────────────────────────────────────┘
 ```
+### Why Serial Execution Matters
+Multi-agent parallelism sounds fast. In practice, it is a race to produce conflicting, unreviewed code that requires a human to untangle. OpenCode Swarm runs one task at a time through a deterministic pipeline. Every task is reviewed. Every test is run. Every failure is documented and fed back to the coder with structured context. The tradeoff in raw speed is paid back in not redoing work.
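
As an illustration of the serial pipeline described above, here is a minimal sketch of running gates in order and sending the first failure back for another attempt. The names `Gate`, `runGates`, `executeTask`, and the `maxAttempts` budget are all hypothetical; the README does not state a per-task retry limit, and nothing here is taken from the package source.

```typescript
// Hypothetical types and names, for illustration only.
type GateResult = { gate: string; passed: boolean; reason?: string };
type Gate = { name: string; run: (attempt: number) => GateResult };

// Run gates strictly in order and stop at the first failure, so the
// rejection reason can be handed back to the coder.
function runGates(gates: Gate[], attempt: number): GateResult {
  for (const gate of gates) {
    const result = gate.run(attempt);
    if (!result.passed) return result;
  }
  return { gate: "all", passed: true };
}

// Retry one task until every gate passes or the (assumed) attempt budget runs out.
function executeTask(
  gates: Gate[],
  maxAttempts: number,
): { done: boolean; rejections: GateResult[] } {
  const rejections: GateResult[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = runGates(gates, attempt);
    if (result.passed) return { done: true, rejections };
    rejections.push(result);
  }
  return { done: false, rejections };
}

// Stub gates: the reviewer rejects the first attempt, then approves.
const gates: Gate[] = [
  { name: "lint", run: () => ({ gate: "lint", passed: true }) },
  {
    name: "reviewer",
    run: (attempt) =>
      attempt === 1
        ? { gate: "reviewer", passed: false, reason: "missing expiration claim" }
        : { gate: "reviewer", passed: true },
  },
  { name: "test_engineer", run: () => ({ gate: "test_engineer", passed: true }) },
];

const outcome = executeTask(gates, 3);
```

In this sketch the test gate never runs on the first attempt because the reviewer gate fails first, which mirrors the "any gate fails, back to @coder" rule in the diagram.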
 ---
-## Persistent Project Memory
+## Agents
+### 🎯 Orchestrator
+**`architect`** — The central coordinator. Owns the plan, delegates all work, enforces every QA gate, maintains project memory, and resumes projects across sessions. Every other agent works for the Architect.
+### 🔍 Discovery
+**`explorer`** — Fast codebase scanner. Identifies structure, languages, frameworks, key files, and import patterns. Runs before planning and after every phase completes.
+### 🧠 Domain Expert
+**`sme`** — Open-domain expert. The Architect specifies any domain per call: `security`, `python`, `rust`, `kubernetes`, `ios`, `ml`, `blockchain` — any domain the underlying model has knowledge of. No hardcoded list. Guidance is cached in `.swarm/context.md` so the same question is never asked twice.
+### 🎨 Design
+**`designer`** — UI/UX specification agent. Opt-in via config. Generates component scaffolds and design tokens before the coder touches UI tasks, eliminating the most common source of front-end rework.
+### 💻 Implementation
+**`coder`** — Implements exactly one task with full context. No multitasking. No context bleed from prior tasks. The coder receives: the task spec, acceptance criteria, SME guidance, and relevant context from `.swarm/context.md`. Nothing else.
+**`test_engineer`** — Generates tests, runs them, and returns structured `PASS/FAIL` verdicts with coverage percentages. Runs twice per task: once for verification, once for adversarial attack scenarios.
+### ✅ Quality Assurance
+**`reviewer`** — Dual-pass review. First pass: correctness, logic, maintainability. Second pass: security-only, scoped to OWASP Top 10 categories, triggered automatically when the modified files match security-sensitive path patterns. Both passes produce structured verdicts with specific rejection reasons.
+**`critic`** — Plan review gate. Reviews the Architect's plan *before implementation begins*. Checks for completeness, feasibility, scope creep, missing dependencies, and AI-slop hallucinations. Plans do not proceed without Critic approval.
+### 📝 Documentation
-Other frameworks lose everything when the session ends. Swarm doesn't.
+**`docs`** — Documentation synthesizer. Runs in Phase 6 with a diff of changed files. Updates READMEs, API documentation, and guides to reflect what was actually built, not what was planned.
+---
+## Persistent Memory
+Other frameworks lose everything when the session ends. Swarm stores project state on disk.
 ```
 .swarm/
-├── plan.md          # Your project roadmap (+ plan.json)
-├── context.md       # Everything a new Architect needs
-├── evidence/        # Per-task execution evidence
-│   ├── 1.1/         # Evidence for task 1.1
-│   └── 2.3/         # Evidence for task 2.3
+├── plan.md          # Living roadmap: phases, tasks, status, rejections, blockers
+├── plan.json        # Machine-readable plan for tooling
+├── context.md       # Institutional knowledge: decisions, SME guidance, patterns
+├── evidence/        # Per-task execution evidence bundles
+│   ├── 1.1/         # review verdict, test results, diff summary for task 1.1
+│   └── 2.3/
 └── history/
-    ├── phase-1.md   # What was done, what was learned
+    ├── phase-1.md   # What was built, what was learned, retrospective metrics
     └── phase-2.md
 ```
-### plan.md - Living Roadmap
+### plan.md — Living Roadmap
 ```markdown
 # Project: Auth System
 Current Phase: 2
@@ -200,260 +206,133 @@ Current Phase: 2
 ## Phase 2: Core Auth [IN PROGRESS]
 - [x] Task 2.1: Login endpoint [MEDIUM]
 - [ ] Task 2.2: JWT generation [MEDIUM] (depends: 2.1) ← CURRENT
-  - Acceptance: Returns valid JWT with user claims
-  - Attempt 1: REJECTED - Missing expiration
+  - Acceptance: Returns valid JWT with user claims, 15-minute expiry
+  - Attempt 1: REJECTED — missing expiration claim
 - [ ] Task 2.3: Token validation middleware [MEDIUM]
-- [BLOCKED] Task 2.4: Refresh tokens
-  - Reason: Waiting for decision on rotation strategy
+- [BLOCKED] Task 2.4: Refresh token rotation
+  - Reason: Awaiting decision on rotation strategy
 ```
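
The task lines in the `plan.md` example follow a regular shape, so tooling could read them with a small parser. The sketch below assumes the line format is exactly as shown in the example; `PlanTask`, `TASK_LINE`, and `parseTaskLine` are hypothetical names, and the package's own `plan.json` handling may work quite differently.

```typescript
// Hypothetical shape for one parsed task line.
type PlanTask = {
  id: string;
  title: string;
  status: "completed" | "pending" | "blocked";
  size?: string;
  dependsOn: string[];
  current: boolean;
};

// Matches lines such as:
//   - [x] Task 2.1: Login endpoint [MEDIUM]
//   - [ ] Task 2.2: JWT generation [MEDIUM] (depends: 2.1) ← CURRENT
//   - [BLOCKED] Task 2.4: Refresh token rotation
const TASK_LINE =
  /^- \[(x| |BLOCKED)\] Task (\d+\.\d+): (.+?)(?: \[(SMALL|MEDIUM|LARGE)\])?(?: \(depends: ([\d., ]+)\))?( ← CURRENT)?$/;

// Returns null for anything that is not a task line (headings, sub-bullets).
function parseTaskLine(line: string): PlanTask | null {
  const m = TASK_LINE.exec(line.trim());
  if (!m) return null;
  const [, mark, id, title, size, deps, currentMarker] = m;
  const status =
    mark === "x" ? "completed" : mark === "BLOCKED" ? "blocked" : "pending";
  return {
    id,
    title,
    status,
    size,
    dependsOn: deps ? deps.split(",").map((s) => s.trim()) : [],
    current: currentMarker !== undefined,
  };
}

const parsedCurrent = parseTaskLine(
  "- [ ] Task 2.2: JWT generation [MEDIUM] (depends: 2.1) ← CURRENT",
);
const parsedBlocked = parseTaskLine("- [BLOCKED] Task 2.4: Refresh token rotation");
const parsedNote = parseTaskLine("  - Acceptance: Returns valid JWT with user claims");
```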
-### context.md - Institutional Knowledge
+### context.md — Institutional Knowledge
 ```markdown
 # Project Context: Auth System
 ## Technical Decisions
-- Using bcrypt (cost 12) for password hashing
-- JWT expires in 15 minutes, refresh in 7 days
-- Storing refresh tokens in Redis
+- bcrypt cost factor: 12
+- JWT TTL: 15 minutes; refresh TTL: 7 days
+- Refresh token store: Redis with key prefix auth:refresh:
 ## SME Guidance Cache
-### Security (Phase 1)
-- Never log tokens or passwords
-- Use constant-time comparison for tokens
-- Implement rate limiting on login
+### security (Phase 1)
+- Never log tokens or passwords in any context
+- Use constant-time comparison for all token equality checks
+- Rate-limit login endpoint: 5 attempts / 15 minutes per IP
-### API (Phase 1)
-- Return 401 for invalid credentials (not 404)
-- Include token expiry in response body
+### api (Phase 1)
+- Return HTTP 401 for invalid credentials (not 404)
+- Include token expiry timestamp in response body
 ## Patterns Established
-- Error handling: Custom ApiError class with status codes
-- Validation: Zod schemas in /validators/
+- Error handling: custom ApiError class with HTTP status and error code
+- Validation: Zod schemas in /validators/, applied at request boundary
 ```
-**Start a new session tomorrow?** The Architect reads these files and picks up exactly where you left off.
+Start a new session tomorrow. The Architect reads these files and picks up exactly where you left off — no re-explaining, no rediscovery, no drift.
----
-## Heterogeneous Models = Better Code
-Most frameworks use one model for everything. Same blindspots everywhere.
+### Evidence Bundles
-Swarm lets you mix models strategically:
-```json
-{
-  "agents": {
-    "architect": { "model": "anthropic/claude-sonnet-4-5" },
-    "explorer": { "model": "google/gemini-2.0-flash" },
-    "coder": { "model": "anthropic/claude-sonnet-4-5" },
-    "sme": { "model": "google/gemini-2.0-flash" },
-    "reviewer": { "model": "openai/gpt-4o" },
-    "critic": { "model": "google/gemini-2.0-flash" },
-    "test_engineer": { "model": "google/gemini-2.0-flash" }
-  }
-}
-```
+Each completed task writes structured evidence to `.swarm/evidence/`:
-| Role | Optimized For | Why Different Models? |
-|------|---------------|----------------------|
-| Architect | Deep reasoning | Needs to plan complex work |
-| Explorer | Fast scanning | Speed over depth |
-| Coder | Implementation | Best coding model you have |
-| SME | Domain knowledge | Fast recall, not deep reasoning |
-| Reviewer | Finding flaws | **Different vendor catches different bugs** |
-| Critic | Plan review | Catches scope issues before any code is written |
-| Test Engineer | Test + run | Writes tests, runs them, reports PASS/FAIL |
+| Type | What It Captures |
+|------|-----------------|
+| `review` | Verdict (APPROVED/REJECTED), risk level, specific issues |
+| `test` | Pass/fail counts, coverage percentage, failure messages |
+| `diff` | Files changed, additions/deletions, contract change flags |
+| `approval` | Stakeholder sign-off with notes |
+| `retrospective` | Phase metrics: total tool calls, coder revisions, reviewer rejections, test failures, security findings, lessons learned |
-**If Claude writes code and GPT reviews it, GPT catches Claude's blindspots.** This is why real teams have code review.
+Retrospectives from completed phases are injected as `[SWARM RETROSPECTIVE]` hints at the start of subsequent phases. The framework learns from its own history within a project.
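
To make the retrospective evidence type concrete, here is a sketch of how such a record might be typed and turned into a hint string. The field names follow the metrics listed in the table, but the `Retrospective` type, the `phase` field, the `toRetrospectiveHint` function, and the hint layout are assumptions; the actual schema lives in `evidence-schema.d.ts`, which this diff does not show.

```typescript
// Hypothetical record shape based on the metrics named in the table above.
type Retrospective = {
  type: "retrospective";
  phase: number;
  totalToolCalls: number;
  coderRevisions: number;
  reviewerRejections: number;
  testFailures: number;
  securityFindings: number;
  lessonsLearned: string[];
};

// Render a retrospective as a tagged hint; the layout after the tag is assumed.
function toRetrospectiveHint(r: Retrospective): string {
  const header =
    `[SWARM RETROSPECTIVE] Phase ${r.phase}: ` +
    `${r.reviewerRejections} reviewer rejections, ${r.testFailures} test failures`;
  const lessons = r.lessonsLearned.map((lesson) => `- ${lesson}`);
  return [header, ...lessons].join("\n");
}

const hint = toRetrospectiveHint({
  type: "retrospective",
  phase: 1,
  totalToolCalls: 42,
  coderRevisions: 1,
  reviewerRejections: 1,
  testFailures: 0,
  securityFindings: 0,
  lessonsLearned: ["Set the JWT expiration claim explicitly"],
});
```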
 ---
-## Multiple Swarms
-Run different model configurations simultaneously. Perfect for:
-- **Cloud vs Local**: Premium cloud models for critical work, local models for quick tasks
-- **Fast vs Quality**: Quick iterations with fast models, careful work with expensive ones
-- **Cost Tiers**: Cheap models for exploration, premium for implementation
+## Heterogeneous Models
-### Configuration
+Single-model frameworks have correlated failure modes. The same model that writes the bug reviews it and misses it. Swarm lets you route each agent to the model it is best suited for:
 ```json
 {
-  "swarms": {
-    "cloud": {
-      "name": "Cloud",
-      "agents": {
-        "architect": { "model": "anthropic/claude-sonnet-4-5" },
-        "coder": { "model": "anthropic/claude-sonnet-4-5" },
-        "sme": { "model": "google/gemini-2.0-flash" },
-        "reviewer": { "model": "openai/gpt-4o" }
-      }
-    },
-    "local": {
-      "name": "Local",
-      "agents": {
-        "architect": { "model": "ollama/qwen2.5:32b" },
-        "coder": { "model": "ollama/qwen2.5:32b" },
-        "sme": { "model": "ollama/qwen2.5:14b" },
-        "reviewer": { "model": "ollama/qwen2.5:14b" }
-      }
-    }
+  "agents": {
+    "architect": { "model": "anthropic/claude-opus-4-6" },
+    "coder": { "model": "minimax-coding-plan/MiniMax-M2.5" },
+    "explorer": { "model": "minimax-coding-plan/MiniMax-M2.1" },
+    "sme": { "model": "kimi-for-coding/k2p5" },
+    "critic": { "model": "zai-coding-plan/glm-5" },
+    "reviewer": { "model": "zai-coding-plan/glm-5" },
+    "test_engineer": { "model": "minimax-coding-plan/MiniMax-M2.5" },
+    "docs": { "model": "zai-coding-plan/glm-4.7-flash" },
+    "designer": { "model": "kimi-for-coding/k2p5" }
   }
 }
 ```
-### What Gets Created
-| Swarm | Agents |
-|-------|--------|
-| `cloud` (default) | `architect`, `explorer`, `coder`, `sme`, `reviewer`, `critic`, `test_engineer` |
-| `local` | `local_architect`, `local_explorer`, `local_coder`, `local_sme`, `local_reviewer`, `local_critic`, `local_test_engineer` |
-The first swarm (or one named "default") creates unprefixed agents. Additional swarms prefix all agent names.
-### Usage
-In OpenCode, you'll see multiple architects to choose from:
-- `architect` - Cloud swarm (default)
-- `local_architect` - Local swarm
-Each architect automatically delegates to its own swarm's agents.
+Reviewer uses a different model than Coder by design. Different training, different priors, different blind spots. This is the cheapest bug-catcher you will ever deploy.
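
One way to check that a configuration actually keeps the reviewer and coder on different models is a small lint over the `agents` map. This is a sketch, not something the package is known to provide; `sharedModelPairs` and the types are hypothetical, and the config values are copied from the example above.

```typescript
// Hypothetical types mirroring the "agents" block in the example config.
type AgentConfig = { model: string };
type SwarmConfig = { agents: Record<string, AgentConfig> };

// Return every requested pair of agents that is configured with the same model.
function sharedModelPairs(
  config: SwarmConfig,
  pairs: [string, string][],
): [string, string][] {
  return pairs.filter(([a, b]) => {
    const modelA = config.agents[a]?.model;
    const modelB = config.agents[b]?.model;
    return modelA !== undefined && modelA === modelB;
  });
}

const exampleConfig: SwarmConfig = {
  agents: {
    coder: { model: "minimax-coding-plan/MiniMax-M2.5" },
    reviewer: { model: "zai-coding-plan/glm-5" },
    test_engineer: { model: "minimax-coding-plan/MiniMax-M2.5" },
  },
};

const reviewerClash = sharedModelPairs(exampleConfig, [["coder", "reviewer"]]);
const testerClash = sharedModelPairs(exampleConfig, [["coder", "test_engineer"]]);
```

With the example values, the coder and reviewer pair is clean, while the coder and test engineer share a model, which the README's example config does on purpose or at least tolerates.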
 ---
-## Installation
+## Guardrails
-```bash
-# Install via CLI (recommended)
-bunx opencode-swarm install
-```
+Every subagent runs inside a circuit breaker that kills runaway behavior before it burns credits on a stuck loop.
-### Uninstall
+| Layer | Trigger | Action |
+|-------|---------|--------|
+| ⚠️ Soft Warning | 50% of any limit reached | Warning injected into agent stream |
+| 🛑 Hard Block | 100% of any limit reached | All further tool calls blocked |
-```bash
-# Remove from opencode.json
-bunx opencode-swarm uninstall
+| Signal | Default | Description |
+|--------|---------|-------------|
+| Tool calls | 200 | Per-invocation, not per-session |
+| Duration | 30 min | Wall-clock time per delegation |
+| Repetition | 10 | Same tool + args consecutively |
+| Consecutive errors | 5 | Sequential null/undefined outputs |
-# Remove from opencode.json + clean up config files
-bunx opencode-swarm uninstall --clean
-```
+Limits are enforced **per-invocation**. Each delegation to a subagent starts a fresh budget. A coder fixing a second task is not penalized for the first task's tool calls. The Architect is exempt from all limits by default.
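
The two-layer behavior and the per-invocation reset can be sketched as a small budget object that is created fresh for each delegation. Only the tool-call signal is modeled here; `InvocationBudget`, `Limits`, and `Verdict` are hypothetical names, and the real guardrail code in `dist/index.js` may be structured differently.

```typescript
// Hypothetical limit settings; 0.5 corresponds to the 50% soft-warning layer.
type Limits = { maxToolCalls: number; warningThreshold: number };
type Verdict = "ok" | "warn" | "block";

// One budget per delegation: constructing a new instance is the "fresh budget".
class InvocationBudget {
  private toolCalls = 0;
  private readonly limits: Limits;

  constructor(limits: Limits) {
    this.limits = limits;
  }

  // Count a tool call and report which layer, if any, has been reached.
  recordToolCall(): Verdict {
    this.toolCalls++;
    const ratio = this.toolCalls / this.limits.maxToolCalls;
    if (ratio >= 1) return "block"; // hard block at 100% of the limit
    if (ratio >= this.limits.warningThreshold) return "warn"; // soft warning
    return "ok";
  }
}

// A tiny limit keeps the example short; the README default is 200 tool calls.
const limits: Limits = { maxToolCalls: 4, warningThreshold: 0.5 };

const firstDelegation = new InvocationBudget(limits);
const verdicts: Verdict[] = [];
for (let i = 0; i < 4; i++) verdicts.push(firstDelegation.recordToolCall());

// A second delegation starts from zero, regardless of the first one's usage.
const secondDelegation = new InvocationBudget(limits);
const firstCallOfSecond = secondDelegation.recordToolCall();
```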
----
+Per-agent profiles allow fine-grained overrides:
-## What's New
-### v6.1.2 — Guardrails Remediation
-- **Fail-safe config validation** — Config validation failures now disable guardrails as a safety precaution (previously Zod defaults could silently re-enable them).
-- **Architect exemption fix** — Architect/orchestrator sessions can no longer inherit 30-minute base limits during delegation race conditions.
-- **Explicit disable always wins** — `guardrails.enabled: false` in config is now always honored, even when the config was loaded from file.
-- **Internal map synchronization** — `startAgentSession()` now keeps `activeAgent` and `agentSessions` maps in sync for consistent state tracking.
-### v6.1.1 — Security Fix & Tech Debt
-- **Security hardening (`_loadedFromFile`)** — Fixed a critical vulnerability where an internal loader flag could be injected via JSON config to bypass guardrails. The flag is now purely internal and no longer part of the public schema.
-- **TOCTOU protection** — Added atomic-style content checks in the config loader to prevent race conditions during file reads.
-- **`retrieve_summary` tool** — Properly registered the retrieval tool, allowing agents to fetch full content from auto-summarized tool outputs.
-- **92 new tests** — 1280 total tests across 57+ files (up from 1188 in v6.0.0).
-### v6.1.0 — Docs & Design Agents
-- **`docs` agent** — Dedicated documentation synthesizer that automatically updates READMEs, API docs, and guides during Phase 6.
-- **`designer` agent** — UI/UX specification agent that generates component scaffolds before coding begins on UI-heavy tasks.
-- **Heterogeneous model defaults** — Updated default models for new agents to use optimized Gemini models for speed and cost.
-### v6.0.0 — Core QA & Security Gates
-- **Dual-pass security reviewer** — After the general reviewer APPROVES, the architect automatically triggers a second security-only review pass when the changed file matches security-sensitive paths (`auth`, `crypto`, `session`, `token`, `middleware`, `api`, `security`) or the coder's output contains security keywords. Configurable via `review_passes` config.
-- **Adversarial testing** — After verification tests PASS, the test engineer is re-delegated with adversarial-only framing: attack vectors, boundary violations, and injection attempts. Pure prompt engineering, no new infrastructure.
-- **Integration impact analysis** — After the coder completes, the `diff` tool detects contract changes (exported functions, interfaces, types). If found, the explorer runs impact analysis across dependents before review begins.
-- **`diff` tool** — New agent-accessible tool providing structured git diff with numstat parsing, contract change detection, configurable base ref (`HEAD`/staged/unstaged), path filtering, and 500-line truncation.
-- **87 new tests** — 1188 total tests across 53+ files (up from 1101 in v5.2.0).
-### v5.2.0 — Per-Invocation Guardrails
-- **Per-invocation budget isolation** — Guardrail limits (tool calls, duration, errors) now reset with each agent delegation. Second invocation of the same agent gets a fresh budget, preventing false circuit breaker trips in long-running projects.
-- **Architect protocol enforcement** — New mandatory QA gate rules: every coder task must go through reviewer approval + test_engineer verification before the next coder task. Protocol violations detected at runtime with warning injection.
-- **Invocation window observability** — Circuit breaker logs now include `invocationId` and `windowKey` for precise debugging of which specific agent invocation hit limits.
-- **67 new tests** — 1101 total tests across 48 files (up from 1034 in v5.1.x).
-### v5.0.0 — Verifiable Execution
-- **Canonical plan schema** — Machine-readable `plan.json` with Zod-validated `PlanSchema`/`TaskSchema`/`PhaseSchema`. Automatic migration from legacy `plan.md` format. Structured status tracking (`pending`, `in_progress`, `completed`, `blocked`).
-- **Evidence bundles** — Per-task execution evidence persisted to `.swarm/evidence/`. Five evidence types: `review`, `test`, `diff`, `approval`, `note`. Sanitized task IDs, atomic writes, configurable size limits. `/swarm evidence` to view, `/swarm archive` to manage retention.
-- **Per-agent guardrail profiles** — Override guardrail limits for individual agents via `guardrails.profiles`. `resolveGuardrailsConfig()` merges base + profile with per-agent specificity.
-- **Context injection budget** — `max_injection_tokens` config controls how much context is injected into system prompts. Priority-ordered: phase → task → decisions → agent context. Lower-priority items dropped when budget exhausted.
-- **Enhanced `/swarm agents`** — Agent count summary, `⚡ custom limits` indicator for profiled agents, guardrail profiles section.
-- **Packaging smoke tests** — CI-safe `dist/` validation (8 tests).
-- **151 new tests** — 1027 total tests across 44 files (up from 876 in v4.6.0).
-### v4.6.0 — Agent Guardrails
-- **Circuit breaker** — Two-layer protection against runaway agents. Soft warning at 50% of limits, hard block at 100%. Prevents infinite loops and runaway API costs.
-- **Detection signals** — Tool call count, wall-clock time, consecutive repetition, and consecutive error tracking per agent session.
-- **Configurable limits** — All thresholds tunable via `guardrails` config: `max_tool_calls`, `max_duration_minutes`, `max_repetitions`, `max_consecutive_errors`, `warning_threshold`.
-- **46 new tests** — 668 total tests across 30 files.
-### v4.5.0 — Tech Debt + New Commands
-- **Lint cleanup** — Replaced string concatenation with template literals, documented `as any` casts with biome-ignore comments.
-- **Code deduplication** — Extracted `stripSwarmPrefix()` utility to eliminate 3 duplicate prefix-stripping blocks.
-- **`/swarm diagnose`** — Health check for `.swarm/` files, plan structure, and plugin configuration.
-- **`/swarm export`** — Export plan.md and context.md as portable JSON.
-- **`/swarm reset --confirm`** — Clear swarm state files with safety confirmation.
-### v4.4.0 — DX & Quality
-- **CLI `uninstall` command** — Remove plugin with optional `--clean` flag.
-- **Custom error classes** — `SwarmError` hierarchy with actionable `guidance` messages.
-- **`/swarm history`** — View completed phases from plan.md.
-- **`/swarm config`** — View current resolved plugin configuration.
-### v4.3.2 — Security Hardening
-- **Path validation** — `validateSwarmPath()` prevents directory traversal in `.swarm/` file operations.
-- **Fetch hardening** — 10s timeout, 5MB limit, retry logic for gitingest tool.
-- **Config limits** — Deep merge depth limit (10), config file size limit (100KB).
-### v4.3.0 — Hooks & Agent Awareness
-- **Hooks pipeline** — `safeHook()` crash-safe wrapper, `composeHandlers()` for multi-handler composition.
-- **Context pruning** — Token budget tracking with 70%/90% threshold warnings.
-- **Slash commands** — `/swarm status`, `/swarm plan`, `/swarm agents`.
-- **Agent awareness** — Activity tracking, delegation tracking, cross-agent context injection.
-All features are opt-in via configuration. See [Installation Guide](docs/installation.md) for config options.
+```jsonc
+{
+  "guardrails": {
+    "max_tool_calls": 200,
+    "profiles": {
+      "coder":    { "max_tool_calls": 500, "max_duration_minutes": 60 },
+      "explorer": { "max_tool_calls": 50 }
+    }
+  }
+}
+```
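The profile behaviour in the config above (base limits, with per-agent overrides for only the fields a profile names) can be sketched as follows. This is a minimal illustration, not the plugin's implementation: `limitsForAgent` and both interfaces are hypothetical names, and the 30-minute base duration is taken from the documented default.

```typescript
// Hypothetical types for illustration; the plugin's real schema may differ.
interface GuardrailLimits {
  max_tool_calls: number;
  max_duration_minutes: number;
}

interface GuardrailsConfig extends GuardrailLimits {
  profiles?: Record<string, Partial<GuardrailLimits>>;
}

// Start from the base limits, then overlay whatever the agent's profile specifies.
// Fields the profile omits keep their base values.
function limitsForAgent(config: GuardrailsConfig, agent: string): GuardrailLimits {
  const { profiles, ...base } = config;
  const profile = profiles ? profiles[agent] : undefined;
  return { ...base, ...(profile || {}) };
}

const guardrails: GuardrailsConfig = {
  max_tool_calls: 200,
  max_duration_minutes: 30, // assumed base value (the documented default)
  profiles: {
    coder: { max_tool_calls: 500, max_duration_minutes: 60 },
    explorer: { max_tool_calls: 50 },
  },
};

// explorer overrides only the tool-call limit and inherits the duration.
console.log(limitsForAgent(guardrails, "explorer"));
// reviewer has no profile, so it gets the base limits unchanged.
console.log(limitsForAgent(guardrails, "reviewer"));
```

A shallow spread is enough here because the limits are flat scalars; nested settings would need a recursive merge.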
 ---
-## Agents
-### 🎯 Orchestrator
-| Agent | Role |
-|-------|------|
-| `architect` | Central coordinator. Plans phases, delegates tasks, manages QA, maintains project memory. |
-### 🔍 Discovery
-| Agent | Role |
-|-------|------|
-| `explorer` | Fast codebase scanner. Identifies structure, languages, frameworks, key files. |
-### 🎨 Design
-| Agent | Role |
-|-------|------|
-| `designer` | UI/UX specification agent. Generates component scaffolds and design tokens before coding begins on UI-heavy tasks. |
-### 🧠 Domain Expert
-| Agent | Role |
-|-------|------|
-| `sme` | Open-domain expert. The architect specifies any domain (security, python, ios, rust, kubernetes, etc.) per call. No hardcoded list — works with any domain the LLM has knowledge of. |
-### 💻 Implementation
-| Agent | Role |
-|-------|------|
-| `coder` | Implements ONE task at a time with full context |
-| `test_engineer` | Generates tests, runs them, and reports structured PASS/FAIL verdicts |
-### ✅ Quality Assurance
-| Agent | Role |
-|-------|------|
-| `reviewer` | Dual-pass review: correctness review first, then automatic security-only pass for security-sensitive files. The architect specifies CHECK dimensions per call. OWASP Top 10 categories built in. |
-| `critic` | Plan review gate. Reviews the architect's plan BEFORE implementation — checks completeness, feasibility, scope, dependencies, and flags AI-slop. |
+## Comparison
-### 📝 Documentation
-| Agent | Role |
-|-------|------|
-| `docs` | Documentation synthesizer. Automatically updates READMEs, API docs, and guides based on implementation changes during Phase 6. |
+| Feature | OpenCode Swarm | oh-my-opencode | get-shit-done | AutoGen | CrewAI |
+|---------|:-:|:-:|:-:|:-:|:-:|
+| Multi-agent orchestration | ✅ 9 specialized agents | ❌ Prompt config only | ❌ Single-agent macros | ✅ | ✅ |
+| Execution model | Serial (deterministic) | N/A | N/A | Parallel (chaotic) | Parallel |
+| Phased planning with acceptance criteria | ✅ | ❌ | ❌ | ❌ | ❌ |
+| Critic gate before implementation | ✅ | ❌ | ❌ | ❌ | ❌ |
+| Per-task dual-pass review (correctness + security) | ✅ | ❌ | ❌ | Optional | Optional |
+| Adversarial test pass per task | ✅ | ❌ | ❌ | ❌ | ❌ |
+| Pre-reviewer pipeline (lint, secretscan, imports) | ✅ v6.3 | ❌ | ❌ | ❌ | ❌ |
+| Persistent session memory | ✅ `.swarm/` files | ❌ | ❌ | Session only | Session only |
+| Resume projects across sessions | ✅ Native | ❌ | ❌ | ❌ | ❌ |
+| Evidence trail per task | ✅ Structured bundles | ❌ | ❌ | ❌ | ❌ |
+| Heterogeneous model routing | ✅ Per-agent | ❌ | ❌ | Limited | Limited |
+| Circuit breaker / guardrails | ✅ Per-invocation | ❌ | ❌ | ❌ | ❌ |
+| Open-domain SME consultation | ✅ Any domain | ❌ | ❌ | ❌ | ❌ |
+| Retrospective learning across phases | ✅ | ❌ | ❌ | ❌ | ❌ |
+| Slash commands + diagnostics | ✅ 12 commands | ❌ | Limited | ❌ | ❌ |
 ---
@@ -461,220 +340,141 @@ All features are opt-in via configuration. See [Installation Guide](docs/install
 | Command | Description |
 |---------|-------------|
-| `/swarm status` | Current phase, task progress, and agent count |
-| `/swarm plan [N]` | View full plan or filter by phase number |
-| `/swarm agents` | List all registered agents with models and permissions |
-| `/swarm history` | View completed phases with status icons |
-| `/swarm config` | View current resolved plugin configuration |
-| `/swarm diagnose` | Health check for .swarm/ files and config |
+| `/swarm status` | Current phase, task progress, agent count |
+| `/swarm plan [N]` | Full plan or filtered by phase |
+| `/swarm agents` | All registered agents with models and permissions |
+| `/swarm history` | Completed phases with status |
+| `/swarm config` | Current resolved configuration |
+| `/swarm diagnose` | Health check for `.swarm/` files and config |
 | `/swarm export` | Export plan and context as portable JSON |
-| `/swarm reset --confirm` | Clear swarm state files (with safety gate) |
-| `/swarm evidence [task]` | View evidence bundles for a task or all tasks |
-| `/swarm archive [--dry-run]` | Archive old evidence bundles with retention policy |
-| `/swarm benchmark` | Run performance benchmarks and display metrics |
-| `/swarm retrieve [id]` | Retrieve auto-summarized tool outputs by ID |
+| `/swarm evidence [task]` | Evidence bundles for a task or all tasks |
+| `/swarm archive [--dry-run]` | Archive old evidence with retention policy |
+| `/swarm benchmark` | Performance benchmarks |
+| `/swarm retrieve [id]` | Retrieve auto-summarized tool outputs |
+| `/swarm reset --confirm` | Clear swarm state files |
 ---
 ## Configuration
-Create `~/.config/opencode/opencode-swarm.json`:
 ```json
 {
   "agents": {
-    "architect": { "model": "anthropic/claude-sonnet-4-5" },
-    "explorer": { "model": "google/gemini-2.0-flash" },
-    "coder": { "model": "anthropic/claude-sonnet-4-5" },
-    "sme": { "model": "google/gemini-2.0-flash" },
-    "reviewer": { "model": "openai/gpt-4o" },
-    "critic": { "model": "google/gemini-2.0-flash" },
-    "test_engineer": { "model": "google/gemini-2.0-flash" },
-    "docs": { "model": "google/gemini-2.0-flash" },
-    "designer": { "model": "google/gemini-2.0-flash" }
+    "architect": { "model": "anthropic/claude-opus-4-6" },
+    "coder": { "model": "minimax-coding-plan/MiniMax-M2.5" },
+    "explorer": { "model": "minimax-coding-plan/MiniMax-M2.1" },
+    "sme": { "model": "kimi-for-coding/k2p5" },
+    "critic": { "model": "zai-coding-plan/glm-5" },
+    "reviewer": { "model": "zai-coding-plan/glm-5" },
+    "test_engineer": { "model": "minimax-coding-plan/MiniMax-M2.5" },
+    "docs": { "model": "zai-coding-plan/glm-4.7-flash" },
+    "designer": { "model": "kimi-for-coding/k2p5" }
+  },
+  "guardrails": {
+    "max_tool_calls": 200,
+    "max_duration_minutes": 30,
+    "profiles": {
+      "coder": { "max_tool_calls": 500 }
+    }
+  },
+  "review_passes": {
+    "always_security_review": false,
+    "security_globs": ["**/*auth*", "**/*crypto*", "**/*session*", "**/*token*"]
   }
 }
 ```
-### Disable Agents
+Save to `~/.config/opencode/opencode-swarm.json` or `.opencode/swarm.json` in your project root. Project config merges over global config via deep merge — partial overrides do not clobber unspecified fields.
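As a rough sketch of what "project config merges over global config via deep merge" could mean in practice, assuming objects merge key by key while arrays and scalars are replaced wholesale (the plugin's actual merge rules are not shown in this diff, and `deepMerge` is a hypothetical name):

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isPlainObject(value: Json): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Merge `override` over `base`. Nested objects are merged recursively;
// anything else (scalars, arrays) in `override` replaces the base value.
function deepMerge(base: Json, override: Json): Json {
  if (!isPlainObject(base) || !isPlainObject(override)) return override;
  const result: { [key: string]: Json } = { ...base };
  for (const key of Object.keys(override)) {
    const incoming = override[key];
    if (incoming === undefined) continue;
    const existing = base[key];
    result[key] = existing === undefined ? incoming : deepMerge(existing, incoming);
  }
  return result;
}

const globalConfig: Json = {
  guardrails: { max_tool_calls: 200, max_duration_minutes: 30 },
};
const projectConfig: Json = {
  guardrails: { max_tool_calls: 500 },
};

// The project overrides one field; the untouched sibling survives.
console.log(JSON.stringify(deepMerge(globalConfig, projectConfig)));
// → {"guardrails":{"max_tool_calls":500,"max_duration_minutes":30}}
```

A plain `{ ...globalConfig, ...projectConfig }` would instead replace the whole `guardrails` object and silently drop `max_duration_minutes`.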
+### Disabling Agents
 ```json
 {
-  "sme": { "disabled": true },
+  "sme":          { "disabled": true },
+  "designer":     { "disabled": true },
   "test_engineer": { "disabled": true }
 }
 ```
 ---
-## Guardrails
-OpenCode Swarm includes a built-in circuit breaker that prevents subagents from running away — burning API credits in infinite loops, repeating the same tool call, or spinning for hours.
-### How It Works
-| Layer | Trigger | Action |
-|-------|---------|--------|
-| ⚠️ **Soft Warning** | 50% of any limit reached | Injects warning message into agent's chat stream |
-| 🛑 **Hard Block** | 100% of any limit reached | Blocks ALL further tool calls + injects stop message |
-### Detection Signals
-| Signal | Default Limit | Description |
-|--------|---------------|-------------|
-| Tool calls | 200 | Total tool invocations per agent session |
-| Duration | 30 min | Wall-clock time since delegation started |
-| Repetition | 10 | Same tool + args called consecutively |
-| Consecutive errors | 5 | Sequential null/undefined tool outputs |
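The two-layer behaviour in the removed tables above (soft warning at a fraction of any limit, hard block at 100%) can be illustrated with a small check. This sketch covers only two of the four signals, and `checkBudget`, `Usage`, and `Limits` are hypothetical names rather than the plugin's API.

```typescript
type Verdict = "ok" | "warn" | "block";

interface Usage {
  toolCalls: number;
  minutes: number;
}

interface Limits {
  max_tool_calls: number;
  max_duration_minutes: number;
  warning_threshold: number; // fraction of a limit that triggers the soft warning
}

// Compare each signal with its limit and act on the one closest to exhaustion.
function checkBudget(usage: Usage, limits: Limits): Verdict {
  const worst = Math.max(
    usage.toolCalls / limits.max_tool_calls,
    usage.minutes / limits.max_duration_minutes,
  );
  if (worst >= 1) return "block"; // hard block: a limit is fully used
  if (worst >= limits.warning_threshold) return "warn"; // soft warning
  return "ok";
}

const defaults: Limits = {
  max_tool_calls: 200,
  max_duration_minutes: 30,
  warning_threshold: 0.5,
};

console.log(checkBudget({ toolCalls: 99, minutes: 5 }, defaults)); // → ok
console.log(checkBudget({ toolCalls: 100, minutes: 5 }, defaults)); // → warn
console.log(checkBudget({ toolCalls: 10, minutes: 30 }, defaults)); // → block
```

Under the per-invocation model, a fresh `Usage` value would be created for each delegation, so counters never accumulate across tasks.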
+## Installation
-### Configuration
+```bash
+# Install globally
+npm install -g opencode-swarm
-Guardrails are **enabled by default**. Customize in your swarm config:
+# Or use npx
+npx opencode-swarm install
-```jsonc
-{
-  "guardrails": {
-    "enabled": true,              // default: true
-    "max_tool_calls": 200,        // range: 10–1000
-    "max_duration_minutes": 30,   // range: 1–120
-    "max_repetitions": 10,        // range: 3–50
-    "max_consecutive_errors": 5,  // range: 2–20
-    "warning_threshold": 0.5      // range: 0.1–0.9 (fraction of limit for soft warning)
-  }
-}
+# Verify
+opencode  # then: /swarm diagnose
 ```
-### Per-Agent Profiles
-Override limits for specific agents that need more (or less) room:
+The installer auto-configures `opencode.json` to include the plugin. Manual configuration:
-```jsonc
+```json
 {
-  "guardrails": {
-    "max_tool_calls": 200,
-    "profiles": {
-      "coder": { "max_tool_calls": 500, "max_duration_minutes": 60 },
-      "explorer": { "max_tool_calls": 50 }
-    }
-  }
+  "plugins": ["opencode-swarm"]
 }
 ```
-Profiles merge with base config — only specified fields are overridden.
+---
-### Review Passes
+## Testing
-Control the dual-pass security review behavior:
+2031 tests across 78 files. Unit, integration, adversarial, and smoke. Covers config schemas, all agent prompts, all hooks, all tools, all commands, guardrail circuit breaker, race conditions, invocation window isolation, multi-invocation state, security category classification, and evidence validation.
-```jsonc
-{
-  "review_passes": {
-    "always_security_review": false,  // default: false (only on security-sensitive files)
-    "security_globs": [               // default patterns:
-      "**/*auth*", "**/*crypto*",
-      "**/*session*", "**/*token*",
-      "**/*middleware*", "**/*api*",
-      "**/*security*"
-    ]
-  }
-}
+```bash
+bun test
 ```
-Set `always_security_review: true` to run the security pass on every task, regardless of file path.
+Zero additional test dependencies. Uses Bun's built-in test runner.
-### Integration Analysis
+---
-Control whether contract change detection triggers impact analysis:
+## Roadmap
-```jsonc
-{
-  "integration_analysis": {
-    "enabled": true  // default: true
-  }
-}
-```
+### v6.3 — Pre-Reviewer Pipeline
-> **Architect is exempt/unlimited by default:** The architect agent has no guardrail limits by default. To override, add a `profiles.architect` entry in your guardrails config.
+Three new tools complete the pre-reviewer gauntlet. Code reaching the Reviewer is already clean.
-### Per-Invocation Budgets
+- **`imports`** — AST-based import graph. For each file changed by the coder, returns every consumer file, which exports each consumer uses, and the line numbers. Replaces fragile grep-based integration analysis with deterministic graph traversal.
+- **`lint`** — Auto-detects project linter (Biome, ESLint, Ruff, Clippy, PSScriptAnalyzer). Runs in fix mode first, then check mode. Structured diagnostic output per file.
+- **`secretscan`** — Entropy-based credential scanner. Detects API keys, tokens, connection strings, and private key headers in the diff before they reach the reviewer. Zero external dependencies.
-Guardrail limits are enforced **per-invocation**, not per-session. Each time the architect delegates to an agent, that agent gets a fresh budget of tool calls, duration, and error tolerance.
+Phase 5 execute loop becomes: `coder → diff → imports → lint fix → lint check → secretscan → reviewer → security reviewer → test_engineer → adversarial test_engineer`.
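To make "entropy-based" concrete, here is a minimal sketch of one common approach: compute the Shannon entropy of each long token and flag those that look random. The token pattern, the 20-character minimum, and the 4.0-bit threshold are illustrative assumptions; the diff does not show how `secretscan` itself is tuned.

```typescript
// Shannon entropy of a string, in bits per character.
function shannonEntropy(text: string): number {
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  for (const ch of Object.keys(counts)) {
    const p = (counts[ch] || 0) / text.length;
    entropy -= p * (Math.log(p) / Math.LN2);
  }
  return entropy;
}

// Return tokens that are long enough and random-looking enough to be credentials.
// minLength and threshold are hypothetical defaults chosen for this example.
function findSuspiciousTokens(line: string, minLength = 20, threshold = 4.0): string[] {
  const tokens: string[] = line.match(/[A-Za-z0-9+/=_-]+/g) || [];
  return tokens.filter((t) => t.length >= minLength && shannonEntropy(t) > threshold);
}

// Repetitive identifiers have low entropy and are ignored.
console.log(findSuspiciousTokens('const label = "hello_world_hello_world";'));
// A long token with many distinct characters is flagged.
console.log(findSuspiciousTokens('api_key = "q9Zx7Lk2Vb8Tn4Rw6Yp1Hs3Md5"'));
```

Entropy alone produces false positives on hashes and minified assets, so a real scanner would likely combine it with known key prefixes and file-type filters.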
-**Example**: If `max_tool_calls: 200`, then:
-- Architect → Coder (task 1) → 200 calls available
-- Coder finishes → Architect → Coder (task 2) → 200 calls available again
+### v6.4 — Execution and Planning Tools
-This prevents long-running projects from accumulating session-wide counters that incorrectly trip the circuit breaker on later tasks.
+- **`test_runner`** — Unified test execution across Bun, Vitest, Jest, Mocha, pytest, cargo test, and Pester. Auto-detects framework, returns normalized JSON with pass/fail/skip counts and coverage. Three scope modes: `all`, `convention` (naming-based), `graph` (import-graph-based). Eliminates the test_engineer's most common failure mode.
+- **`symbols`** — Export inventory for a module: functions, classes, interfaces, types, enums. Gives the Architect instant visibility into a file's public API surface without reading the full source.
+- **`checkpoint`** — Git-backed save points. Before any multi-file refactor (≥3 files), Architect auto-creates a checkpoint commit. On critical integration failure, restores via soft reset instead of iterating into a hole.
-> **Architect is unlimited**: The architect never creates invocation windows and has no guardrail limits by default.
+### v6.5 — Intelligence and Audit Tools
-### Disable Guardrails
+Five tools that improve planning quality and post-phase validation:
-```json
-{
-  "guardrails": {
-    "enabled": false
-  }
-}
-```
----
-## Comparison
-| Feature | OpenCode Swarm | AutoGen | CrewAI | LangGraph |
-|---------|---------------|---------|--------|-----------|
-| Execution | Serial (predictable) | Parallel (chaotic) | Parallel | Configurable |
-| Planning | Phased with acceptance criteria | Ad-hoc | Role-based | Graph-based |
-| Memory | Persistent `.swarm/` files | Session only | Session only | Checkpoints |
-| QA | Dual-pass per-task (review + security + adversarial) | Optional | Optional | Manual |
-| Model mixing | Per-agent configuration | Limited | Limited | Manual |
-| Resume projects | ✅ Native | ❌ | ❌ | Partial |
-| SME domains | Open-domain (any) | Generic | Generic | Generic |
-| Task granularity | One at a time | Batched | Batched | Varies |
+- **`pkg_audit`** — Wraps `npm audit`, `pip-audit`, `cargo audit`. Structured CVE output with severity, patched versions, and advisory URLs. Fed to the security reviewer for concrete vulnerability context.
+- **`complexity_hotspots`** — Git churn × cyclomatic complexity risk map. Run in Phase 0/2 to identify modules that need stricter QA gates before implementation begins.
+- **`schema_drift`** — Compares OpenAPI spec against actual route implementations. Surfaces undocumented routes and phantom spec paths. Run in Phase 6 when API routes were modified.
+- **`todo_extract`** — Structured extraction of `TODO`, `FIXME`, and `HACK` annotations across the codebase. High-priority items fed directly into plan task candidates.
+- **`evidence_check`** — Audits completed tasks against required evidence types. Run in Phase 6 to verify every task has review and test evidence before the phase is marked complete.
 ---
 ## Design Principles
-1. **Plan before code** - Documented phases with acceptance criteria
-2. **One task at a time** - Focused work, quality output
-3. **Review everything immediately** - Dual-pass review (correctness + security) with adversarial testing per task
-4. **Cache SME knowledge** - Don't re-ask answered questions
-5. **Persistent memory** - `.swarm/` files survive sessions
-6. **Serial execution** - Predictable, debuggable, no race conditions
-7. **Heterogeneous models** - Different perspectives catch different bugs
-8. **User checkpoints** - Confirm before proceeding to next phase
-9. **Failure tracking** - Document rejections, escalate after 5 attempts
-10. **Resumable by design** - Any Architect can pick up any project
----
-## Testing
-```bash
-# Run all tests
-bun test
-# Run specific test file
-bun test tests/unit/config/schema.test.ts
-```
-1280 tests across 57+ files covering config, tools, agents, hooks, commands, state, guardrails, evidence, plan schemas, circuit breaker race conditions, invocation windows, multi-invocation isolation, security categories, review/integration schemas, and diff tool. Uses Bun's built-in test runner — zero additional test dependencies.
-## Troubleshooting
-### Plugin not loading
-1. Verify `opencode-swarm` is listed in your `opencode.json` plugins array
-2. Run `bunx opencode-swarm install` to auto-configure
-3. Run `/swarm diagnose` to check health status
-### Commands not working
-- Ensure you're using `/swarm <command>`, not `/swarm/<command>`
-- Run `/swarm` with no arguments to see available commands
-### Resuming a project
-- Swarm automatically detects `.swarm/plan.md` and resumes where you left off
-- If you get unexpected behavior, run `/swarm export` to backup, then `/swarm reset --confirm` to start fresh
+1. **Plan before code** — Documented phases with acceptance criteria. The Critic approves the plan before a single line is written.
+2. **One task at a time** — The Coder gets one task and full context. Nothing else.
+3. **Review everything immediately** — Every task goes through correctness review, security review, verification tests, and adversarial tests. No task ships without passing all four.
+4. **Cache SME knowledge** — Guidance is written to `context.md`. The same domain question is never asked twice in a project.
+5. **Persistent memory** — `.swarm/` files are the ground truth. Any session, any model, any day.
+6. **Serial execution** — Predictable, debuggable, no race conditions, no conflicting writes.
+7. **Heterogeneous models** — Different models, different blind spots. The coder's bug is the reviewer's catch.
+8. **User checkpoints** — Phase transitions require user confirmation. No unsupervised multi-phase runs.
+9. **Document failures** — Rejections and retries are recorded in plan.md. After 5 failed attempts, the task escalates to the user.
+10. **Resumable by design** — A cold-start Architect can read `.swarm/` and continue any project as if it had been there from the beginning.
 ---
@@ -693,5 +493,5 @@ MIT
 ---
 <p align="center">
-  <strong>Stop hoping your agents figure it out. Start shipping code that works.</strong>
+  <strong>Stop hoping your agents figure it out. Start shipping code that actually works.</strong>
 </p>