opencode-swarm 6.11.0 → 6.13.0

package/README.md CHANGED
@@ -1,204 +1,225 @@
1
- <p align="center">
2
- <img src="https://img.shields.io/badge/version-6.11.0-blue" alt="Version">
3
- <img src="https://img.shields.io/badge/license-MIT-green" alt="License">
4
- <img src="https://img.shields.io/badge/opencode-plugin-purple" alt="OpenCode Plugin">
5
- <img src="https://img.shields.io/badge/agents-9-orange" alt="Agents">
6
- <img src="https://img.shields.io/badge/tests-6000+-brightgreen" alt="Tests">
7
- </p>
8
-
9
- <h1 align="center">🐝 OpenCode Swarm</h1>
10
-
11
- <p align="center">
12
- <strong>A structured multi-agent coding framework for OpenCode.</strong><br>
13
- Nine specialized agents. Persistent memory. A QA gate on every task. Code that ships.
14
- </p>
15
-
16
- <p align="center">
17
- <a href="#the-problem">The Problem</a> •
18
- <a href="#how-it-works">How It Works</a> •
19
- <a href="#agents">Agents</a> •
20
- <a href="#persistent-memory">Memory</a> •
21
- <a href="#guardrails">Guardrails</a> •
22
- <a href="#comparison">Comparison</a> •
23
- <a href="#installation">Installation</a> •
24
- <a href="#roadmap">Roadmap</a>
25
- </p>
1
+ # 🐝 OpenCode Swarm
2
+
3
+ **Your AI writes the code. Swarm makes sure it actually works.**
4
+
5
+ OpenCode Swarm is a plugin for [OpenCode](https://opencode.ai) that turns a single AI coding agent into a team of nine. One agent writes the code. A different agent reviews it. Another writes and runs tests. Another catches security issues. Nothing ships until every check passes. Your project state is saved to disk, so you can close your laptop, come back tomorrow, and pick up exactly where you left off.
6
+
7
+ ```bash
8
+ npm install -g opencode-swarm
9
+ ```
10
+
11
+ That's it. Open your project with `opencode` and start building. Swarm activates automatically.
26
12
 
27
13
  ---
28
14
 
29
- ## The Problem
15
+ ## What Actually Happens
16
+
17
+ You say: *"Build me a JWT auth system."*
30
18
 
31
- Every multi-agent AI coding tool on the market has the same failure mode: they are vibes-driven. You describe a feature. Agents spawn. They race each other to write conflicting code, lose context after 20 messages, hit token limits mid-task, and produce something that sort-of-works until it doesn't. There's no plan. There's no memory. There's no gatekeeper. There's no test that was actually run.
19
+ Here's what Swarm does behind the scenes:
32
20
 
33
- **oh-my-opencode** is a prompt collection. **get-shit-done** is a workflow macro. Neither is a framework with memory, QA enforcement, or the ability to resume a project a week later exactly where you left off.
21
+ 1. **Asks you clarifying questions** (only the ones it can't figure out itself)
22
+ 2. **Scans your codebase** to understand what already exists
23
+ 3. **Consults domain experts** (security, API design, whatever your project needs) and caches the guidance so it never re-asks
24
+ 4. **Writes a phased plan** with concrete tasks, acceptance criteria, and dependencies
25
+ 5. **A separate critic agent reviews the plan** before any code is written
26
+ 6. **Implements one task at a time.** For each task:
27
+ - A coder agent writes the code
28
+ - 7 automated checks run (syntax, imports, linting, secrets, security, build, quality)
29
+ - A reviewer agent (running on a *different* AI model) checks for correctness
30
+ - A test engineer agent writes tests, runs them, and checks coverage
31
+ - If anything fails, it goes back to the coder with specific feedback
32
+ - If it passes everything, the task is marked done and the next one starts
33
+ 7. **After each phase completes**, documentation updates automatically, and a retrospective captures what worked and what didn't. Those learnings carry into the next phase.
34
34
 
35
- OpenCode Swarm is built differently.
35
+ All of this state lives in a `.swarm/` folder in your project:
36
36
 
37
37
  ```
38
- Every other framework:
39
- ├── Agent 1 starts the auth module...
40
- ├── Agent 2 starts the user model... (conflicts with Agent 1)
41
- ├── Agent 3 writes tests... (for code that doesn't exist yet)
42
- ├── Context window fills up and the whole thing drifts
43
- └── Result: chaos. Rework. Start over.
44
-
45
- OpenCode Swarm:
46
- ├── Architect reads .swarm/plan.md → project already in progress, resumes Phase 2
47
- ├── @explorer scans the codebase for current state
48
- ├── @sme DOMAIN: security → consults on auth patterns, guidance cached
49
- ├── Architect writes .swarm/plan.md: 3 phases, 9 tasks, acceptance criteria per task
50
- ├── @critic reviews the plan → APPROVED
51
- ├── @coder implements Task 2.2 (one task, full context, nothing else)
52
- ├── diff tool → imports tool → lint fix → lint check → secretscan → @reviewer → @test_engineer
53
- ├── All gates pass → plan.md updated → Task 2.2: [x]
54
- └── Result: working code, documented decisions, resumable project, evidence trail
38
+ .swarm/
39
+ ├── plan.md      # Your project roadmap (tasks, status, what's done, what's next)
40
+ ├── context.md   # Decisions made, expert guidance, established patterns
41
+ ├── evidence/    # Review verdicts, test results, diffs for every completed task
42
+ └── history/     # Phase retrospectives and metrics
55
43
  ```
56
44
 
45
+ Close your terminal. Come back next week. Swarm reads these files and picks up exactly where it stopped.
46
+
57
47
  ---
58
48
 
59
- ## How It Works
49
+ ## Why This Exists
60
50
 
61
- ### The Execution Pipeline
51
+ Most AI coding tools let one model write code and then ask *that same model* if the code is good. That's like asking someone to proofread their own essay. They'll miss the same things they missed while writing it.
62
52
 
63
- ```
64
- ┌─────────────────────────────────────────────────────────────────────────┐
65
- │ Phase 0: Resume Check │
66
- │ .swarm/plan.md exists? Resume mid-task. New project? Continue. │
67
- └─────────────────────────────────────────────────────────────────────────┘
68
- │
69
- ▼
70
- ┌─────────────────────────────────────────────────────────────────────────┐
71
- │ Phase 1: Clarify │
72
- │ Ask only what the Architect cannot infer. Then stop. │
73
- └─────────────────────────────────────────────────────────────────────────┘
74
- │
75
- ▼
76
- ┌─────────────────────────────────────────────────────────────────────────┐
77
- │ Phase 2: Discover │
78
- │ @explorer scans codebase → structure, languages, frameworks, key files │
79
- └─────────────────────────────────────────────────────────────────────────┘
80
- │
81
- ▼
82
- ┌─────────────────────────────────────────────────────────────────────────┐
83
- │ Phase 3: SME Consult (serial, cached) │
84
- │ @sme DOMAIN: security, @sme DOMAIN: api, ... │
85
- │ Guidance written to .swarm/context.md, never re-asked in future phases │
86
- └─────────────────────────────────────────────────────────────────────────┘
87
- │
88
- ▼
89
- ┌─────────────────────────────────────────────────────────────────────────┐
90
- │ Phase 4: Plan │
91
- │ Architect writes .swarm/plan.md │
92
- │ Structured phases, tasks with SMALL/MEDIUM/LARGE sizing, acceptance │
93
- │ criteria per task, explicit dependency graph │
94
- └─────────────────────────────────────────────────────────────────────────┘
95
- │
96
- ▼
97
- ┌─────────────────────────────────────────────────────────────────────────┐
98
- │ Phase 4.5: Critic Gate │
99
- │ @critic reviews plan → APPROVED / NEEDS_REVISION / REJECTED │
100
- │ Max 2 revision cycles. Escalates to user if unresolved. │
101
- └─────────────────────────────────────────────────────────────────────────┘
102
- │
103
- ▼
104
- ┌─────────────────────────────────────────────────────────────────────────┐
105
- │ Phase 5: Execute (per task) │
106
- │ │
107
- │ [UI task?] → @designer scaffold first │
108
- │ │
109
- │ @coder (one task, full context) │
110
- │ ↓ │
111
- │ diff → syntax_check → placeholder_scan → imports → lint fix │
112
- │ (contract detection) (parse validation) (anti-slop) (AST-based) │
113
- │ ↓ │
114
- │ build_check → pre_check_batch (4 parallel: lint:check, secretscan, │
115
- │ (compile verify) sast_scan, quality_budget) │
116
- │ ↓ │
117
- │ @reviewer (correctness pass) │
118
- │ ↓ APPROVED │
119
- │ @reviewer (security-only pass, if file matches security globs) │
120
- │ ↓ APPROVED │
121
- │ @test_engineer (verification tests + coverage gate ≥70%) │
122
- │ ↓ PASS │
123
- │ @test_engineer (adversarial tests: boundary violations, injections) │
124
- │ ↓ PASS │
125
- │ ⛔ HARD STOP: Pre-commit checklist (4 items required, no override) │
126
- │ ↓ COMPLETE │
127
- │ plan.md → [x] Task complete │
128
- │ │
129
- │ Any gate fails → retry with failure count + structured rejection │
130
- │ Max 5 retries → escalate to user │
131
- └─────────────────────────────────────────────────────────────────────────┘
132
- │
133
- ▼
134
- ┌─────────────────────────────────────────────────────────────────────────┐
135
- │ Phase 6: Phase Complete │
136
- │ @explorer rescans. @docs updates documentation. Retrospective written. │
137
- │ Learnings injected as [SWARM RETROSPECTIVE] into next phase. │
138
- │ "Phase 1 complete (4 tasks, 0 rejections). Ready for Phase 2?" │
139
- └─────────────────────────────────────────────────────────────────────────┘
53
+ Swarm fixes this by splitting the work across specialized agents and requiring that different models handle writing vs. reviewing. The coder writes. A different model reviews. Another model tests. Different training data, different blind spots, different failure modes.
54
+
55
+ The other thing most tools get wrong: they try to do everything in parallel. That sounds fast, but in practice you get three agents writing conflicting code at the same time with no coordination. Swarm runs one task at a time through a fixed pipeline. Slower per-task, but you don't redo work.
56
+
57
+ ---
58
+
59
+ ## Quick Start
60
+
61
+ ### Install
62
+
63
+ ```bash
64
+ npm install -g opencode-swarm
140
65
  ```
141
66
 
142
- ### Why Serial Execution Matters
67
+ ### Verify
143
68
 
144
- Multi-agent parallelism sounds fast. In practice, it is a race to produce conflicting, unreviewed code that requires a human to untangle. OpenCode Swarm runs one task at a time through a deterministic pipeline. Every task is reviewed. Every test is run. Every failure is documented and fed back to the coder with structured context. The tradeoff in raw speed is paid back in not redoing work.
69
+ Open a project with `opencode` and run:
145
70
 
146
- ---
71
+ ```
72
+ /swarm diagnose
73
+ ```
147
74
 
148
- ## Agents
75
+ This checks that everything is wired up correctly.
149
76
 
150
- ### 🎯 Orchestrator
77
+ ### Configure Models (Optional)
151
78
 
152
- **`architect`**: The central coordinator. Owns the plan, delegates all work, enforces every QA gate, maintains project memory, and resumes projects across sessions. Every other agent works for the Architect.
79
+ By default, Swarm uses whatever model OpenCode is configured with. To route different agents to different models (recommended), create `.opencode/swarm.json` in your project:
153
80
 
154
- ### 🔍 Discovery
81
+ ```json
82
+ {
83
+ "agents": {
84
+ "architect": { "model": "anthropic/claude-opus-4-6" },
85
+ "coder": { "model": "minimax-coding-plan/MiniMax-M2.5" },
86
+ "reviewer": { "model": "zai-coding-plan/glm-5" }
87
+ },
88
+ "guardrails": {
89
+ "max_tool_calls": 200,
90
+ "max_duration_minutes": 30,
91
+ "profiles": {
92
+ "coder": { "max_tool_calls": 500 }
93
+ }
94
+ },
95
+ "tool_filter": {
96
+ "enabled": true,
97
+ "overrides": {}
98
+ },
99
+ "review_passes": {
100
+ "always_security_review": false,
101
+ "security_globs": ["**/*auth*", "**/*crypto*", "**/*session*"]
102
+ },
103
+ "automation": {
104
+ "mode": "manual",
105
+ "capabilities": {
106
+ "plan_sync": false,
107
+ "phase_preflight": false,
108
+ "config_doctor_on_startup": false,
109
+ "evidence_auto_summaries": false,
110
+ "decision_drift_detection": false
111
+ }
112
+ }
113
+ }
114
+ ```
115
+
116
+ You only need to specify the agents you want to override. The rest use the default.
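The fallback rule can be pictured with a small TypeScript sketch. This is not the plugin's actual code: the `resolveModel` helper, the `SwarmConfig` type, and the `"default/model"` placeholder are assumptions made for illustration. Only the `agents` and `model` keys come from the example config.

```typescript
// Hypothetical types: only the "agents" and "model" keys appear in the README's config.
type AgentConfig = { model?: string };
type SwarmConfig = { agents?: Record<string, AgentConfig> };

// Return the model configured for an agent, or the host default when there is no override.
function resolveModel(config: SwarmConfig, agent: string, defaultModel: string): string {
  return config.agents?.[agent]?.model ?? defaultModel;
}

const exampleModelConfig: SwarmConfig = {
  agents: {
    coder: { model: "minimax-coding-plan/MiniMax-M2.5" },
    reviewer: { model: "zai-coding-plan/glm-5" },
  },
};

// "reviewer" has an override; "docs" does not, so it falls back to the default.
const reviewerModel = resolveModel(exampleModelConfig, "reviewer", "default/model");
const docsModel = resolveModel(exampleModelConfig, "docs", "default/model");
```

Under these assumptions, routing the reviewer to a different model than the coder takes a single override entry, and every agent left unlisted keeps the model OpenCode is already using.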
117
+
118
+ ### Start Building
155
119
 
156
- **`explorer`**: Fast codebase scanner. Identifies structure, languages, frameworks, key files, and import patterns. Runs before planning and after every phase completes.
120
+ Just tell OpenCode what you want to build. Swarm handles the rest.
157
121
 
158
- ### 🧠 Domain Expert
122
+ ```
123
+ > Build a REST API with user registration, login, and JWT auth
124
+ ```
159
125
 
160
- **`sme`**: Open-domain expert. The Architect specifies any domain per call (`security`, `python`, `rust`, `kubernetes`, `ios`, `ml`, `blockchain`, or any other domain the underlying model has knowledge of). No hardcoded list. Guidance is cached in `.swarm/context.md` so the same question is never asked twice.
126
+ Use `/swarm status` at any time to see where things stand.
161
127
 
162
- ### 🎨 Design
128
+ ---
163
129
 
164
- **`designer`**: UI/UX specification agent. Opt-in via config. Generates component scaffolds and design tokens before the coder touches UI tasks, eliminating the most common source of front-end rework.
130
+ ## Useful Commands
165
131
 
166
- ### 💻 Implementation
132
+ | Command | What It Does |
133
+ |---------|-------------|
134
+ | `/swarm status` | Where am I? Current phase, task progress |
135
+ | `/swarm plan` | Show the full project plan |
136
+ | `/swarm diagnose` | Health check, is everything configured right? |
137
+ | `/swarm evidence 2.1` | Show review/test results for a specific task |
138
+ | `/swarm history` | What's been completed so far |
139
+ | `/swarm reset --confirm` | Start over (clears all swarm state) |
167
140
 
168
- **`coder`**: Implements exactly one task with full context. No multitasking. No context bleed from prior tasks. The coder receives: the task spec, acceptance criteria, SME guidance, and relevant context from `.swarm/context.md`. Nothing else.
141
+ ---
169
142
 
170
- **`test_engineer`**: Generates tests, runs them, and returns structured `PASS/FAIL` verdicts with coverage percentages. Runs twice per task: once for verification, once for adversarial attack scenarios.
143
+ ## The Agents
171
144
 
172
- ### ✅ Quality Assurance
145
+ Swarm has nine agents. You don't interact with them directly. The architect orchestrates everything.
173
146
 
174
- **`reviewer`**: Dual-pass review. First pass: correctness, logic, maintainability. Second pass: security-only, scoped to OWASP Top 10 categories, triggered automatically when the modified files match security-sensitive path patterns. Both passes produce structured verdicts with specific rejection reasons.
147
+ | Agent | Role | When It Runs |
148
+ |-------|------|-------------|
149
+ | **architect** | Plans the project, delegates tasks, enforces quality gates | Always (it's the coordinator) |
150
+ | **explorer** | Scans your codebase to understand what exists | Before planning, after each phase |
151
+ | **sme** | Domain expert (security, APIs, databases, whatever is needed) | During planning, guidance is cached |
152
+ | **critic** | Reviews the plan before any code is written | After planning, before execution |
153
+ | **coder** | Writes code, one task at a time | During execution |
154
+ | **reviewer** | Reviews code for correctness and security issues | After every task |
155
+ | **test_engineer** | Writes and runs tests, including adversarial edge cases | After every task |
156
+ | **designer** | Generates UI scaffolds and design tokens (opt-in) | Before UI tasks |
157
+ | **docs** | Updates documentation to match what was actually built | After each phase |
175
158
 
176
- **`critic`**: Plan review gate. Reviews the Architect's plan *before implementation begins*. Checks for completeness, feasibility, scope creep, missing dependencies, and AI-slop hallucinations. Plans do not proceed without Critic approval.
159
+ ---
177
160
 
178
- ### 📝 Documentation
161
+ ## How It Compares
179
162
 
180
- **`docs`**: Documentation synthesizer. Runs in Phase 6 with a diff of changed files. Updates READMEs, API documentation, and guides to reflect what was actually built, not what was planned.
163
+ | | OpenCode Swarm | oh-my-opencode | get-shit-done |
164
+ |---|:-:|:-:|:-:|
165
+ | Multiple specialized agents | ✅ 9 agents | ❌ Prompt config | ❌ Single-agent macros |
166
+ | Plan reviewed before coding starts | ✅ | ❌ | ❌ |
167
+ | Every task reviewed + tested | ✅ | ❌ | ❌ |
168
+ | Different model for review vs. coding | ✅ | ❌ | ❌ |
169
+ | Saves state to disk, resumable | ✅ | ❌ | ❌ |
170
+ | Security scanning built in | ✅ | ❌ | ❌ |
171
+ | Learns from its own mistakes | ✅ (retrospectives) | ❌ | ❌ |
181
172
 
182
173
  ---
183
174
 
184
- ## Persistent Memory
175
+ <details>
176
+ <summary><strong>Full Execution Pipeline (Technical Detail)</strong></summary>
177
+
178
+ ### The Pipeline
185
179
 
186
- Other frameworks lose everything when the session ends. Swarm stores project state on disk.
180
+ Every task goes through this sequence. No exceptions, no overrides.
187
181
 
188
182
  ```
189
- .swarm/
190
- β”œβ”€β”€ plan.md # Living roadmap: phases, tasks, status, rejections, blockers
191
- β”œβ”€β”€ plan.json # Machine-readable plan for tooling
192
- β”œβ”€β”€ context.md # Institutional knowledge: decisions, SME guidance, patterns
193
- β”œβ”€β”€ evidence/ # Per-task execution evidence bundles
194
- β”‚ β”œβ”€β”€ 1.1/ # review verdict, test results, diff summary for task 1.1
195
- β”‚ └── 2.3/
196
- └── history/
197
- β”œβ”€β”€ phase-1.md # What was built, what was learned, retrospective metrics
198
- └── phase-2.md
183
+ MODE: EXECUTE (per task)
184
+ │
185
+ ├── 5a. @coder implements (ONE task only)
186
+ ├── 5b. diff + imports (contract + dependency analysis)
187
+ ├── 5c. syntax_check (parse validation)
188
+ ├── 5d. placeholder_scan (catches TODOs, stubs, incomplete code)
189
+ ├── 5e. lint fix → lint check
190
+ ├── 5f. build_check (does it compile?)
191
+ ├── 5g. pre_check_batch (4 parallel: lint, secretscan, SAST, quality budget)
192
+ ├── 5h. @reviewer (correctness pass)
193
+ ├── 5i. @reviewer (security pass, if security-sensitive files changed)
194
+ ├── 5j. @test_engineer (verification tests + coverage ≥70%)
195
+ ├── 5k. @test_engineer (adversarial tests)
196
+ ├── 5l. ⛔ Pre-commit checklist (all 4 items required, no override)
197
+ └── 5m. Task marked complete, evidence written
199
198
  ```
200
199
 
201
- ### plan.md: Living Roadmap
200
+ If any step fails, the coder gets structured feedback and retries. After 5 failures on the same task, it escalates to you.
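The retry-and-escalate rule described here can be sketched roughly as follows. The names `runTaskWithRetries`, `GateResult`, and `TaskOutcome` are hypothetical, and the `attempt` callback stands in for "the coder implements, then every gate runs". The only detail taken from the README is the limit of 5 failures per task.

```typescript
type GateResult = { passed: boolean; feedback?: string };
type TaskOutcome =
  | { status: "complete"; attempts: number }
  | { status: "escalated"; attempts: number; feedback: string[] };

const MAX_FAILURES = 5; // from the README: escalate after 5 failures on the same task

// `attempt` is a hypothetical stand-in for one coder pass plus the full gate sequence.
// It receives all feedback collected so far so the next pass can address it.
function runTaskWithRetries(attempt: (feedback: string[]) => GateResult): TaskOutcome {
  const feedback: string[] = [];
  for (let attempts = 1; attempts <= MAX_FAILURES; attempts++) {
    const result = attempt(feedback);
    if (result.passed) {
      return { status: "complete", attempts };
    }
    feedback.push(result.feedback ?? "gate failed without details");
  }
  // Every allowed attempt failed: hand the task and its feedback trail to the user.
  return { status: "escalated", attempts: MAX_FAILURES, feedback };
}
```

Passing the accumulated feedback into each attempt is one plausible reading of "structured feedback"; the real plugin may carry richer rejection records.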
201
+
202
+ ### Architect Workflow Modes
203
+
204
+ The architect moves through these modes automatically:
205
+
206
+ | Mode | What Happens |
207
+ |------|-------------|
208
+ | `RESUME` | Checks if `.swarm/plan.md` exists, picks up where it left off |
209
+ | `CLARIFY` | Asks you questions (only what it can't infer) |
210
+ | `DISCOVER` | Explorer scans the codebase |
211
+ | `CONSULT` | SME agents provide domain guidance |
212
+ | `PLAN` | Architect writes the phased plan |
213
+ | `CRITIC-GATE` | Critic reviews the plan (max 2 revision cycles) |
214
+ | `EXECUTE` | Tasks are implemented one at a time through the QA pipeline |
215
+ | `PHASE-WRAP` | Phase completes, docs update, retrospective written |
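One way to picture the mode table is as a small state sequence. The sketch below is an illustration only: the mode names come from the table, but the strictly linear order, the `nextMode` helper, and the rule that an existing plan jumps from `RESUME` straight to `EXECUTE` are assumptions. What follows `PHASE-WRAP` is not specified in the table, so the sketch returns `null` there.

```typescript
// Mode names are taken from the table; the linear ordering is an assumption.
const MODES = [
  "RESUME",
  "CLARIFY",
  "DISCOVER",
  "CONSULT",
  "PLAN",
  "CRITIC-GATE",
  "EXECUTE",
  "PHASE-WRAP",
] as const;
type Mode = (typeof MODES)[number];

// Hypothetical transition rule: a project with an existing plan skips the planning modes.
function nextMode(current: Mode, planExists: boolean): Mode | null {
  if (current === "RESUME") {
    return planExists ? "EXECUTE" : "CLARIFY";
  }
  const next = MODES[MODES.indexOf(current) + 1];
  return next ?? null; // null once the last listed mode is reached
}
```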
216
+
217
+ </details>
218
+
219
+ <details>
220
+ <summary><strong>Persistent Memory (What's in .swarm/)</strong></summary>
221
+
222
+ ### plan.md: Your Project Roadmap
202
223
 
203
224
  ```markdown
204
225
  # Project: Auth System
@@ -215,156 +236,108 @@ Current Phase: 2
215
236
  - Acceptance: Returns valid JWT with user claims, 15-minute expiry
216
237
  - Attempt 1: REJECTED (missing expiration claim)
217
238
  - [ ] Task 2.3: Token validation middleware [MEDIUM]
218
- - [BLOCKED] Task 2.4: Refresh token rotation
219
- - Reason: Awaiting decision on rotation strategy
220
239
  ```
221
240
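Because `plan.md` is plain Markdown, its task lines are easy to read with ordinary tooling. The sketch below is a hypothetical parser, not part of the plugin: `parseTaskLine` and `PlanTask` are invented names, and the regular expression assumes every task line follows the `- [ ] Task 2.3: Title [SIZE]` shape seen in the example above.

```typescript
type PlanTask = { id: string; title: string; size: string | null; done: boolean };

// Matches lines shaped like "- [ ] Task 2.3: Token validation middleware [MEDIUM]".
const TASK_LINE = /^\s*- \[( |x)\] Task (\d+\.\d+): (.+?)(?: \[(SMALL|MEDIUM|LARGE)\])?\s*$/;

// Returns the parsed task, or null when the line is not a task entry.
function parseTaskLine(line: string): PlanTask | null {
  const match = TASK_LINE.exec(line);
  if (!match) {
    return null;
  }
  return {
    id: match[2] ?? "",
    title: match[3] ?? "",
    size: match[4] ?? null, // the size tag is optional
    done: match[1] === "x",
  };
}
```

A status view could map this over every line of the file and count the entries where `done` is true, though the plugin's own commands may rely on a separate machine-readable copy of the plan.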
 
222
- ### context.md: Institutional Knowledge
241
+ ### context.md: What's Been Decided
223
242
 
224
243
  ```markdown
225
- # Project Context: Auth System
226
-
227
244
  ## Technical Decisions
228
245
  - bcrypt cost factor: 12
229
246
  - JWT TTL: 15 minutes; refresh TTL: 7 days
230
- - Refresh token store: Redis with key prefix auth:refresh:
231
247
 
232
- ## SME Guidance Cache
248
+ ## SME Guidance (cached, never re-asked)
233
249
  ### security (Phase 1)
234
- - Never log tokens or passwords in any context
235
- - Use constant-time comparison for all token equality checks
236
- - Rate-limit login endpoint: 5 attempts / 15 minutes per IP
250
+ - Never log tokens or passwords
251
+ - Rate-limit login: 5 attempts / 15 min per IP
237
252
 
238
253
  ### api (Phase 1)
239
- - Return HTTP 401 for invalid credentials (not 404)
240
- - Include token expiry timestamp in response body
241
-
242
- ## Patterns Established
243
- - Error handling: custom ApiError class with HTTP status and error code
244
- - Validation: Zod schemas in /validators/, applied at request boundary
254
+ - Return 401 for invalid credentials (not 404)
245
255
  ```
246
256
 
247
- Start a new session tomorrow. The Architect reads these files and picks up exactly where you left off: no re-explaining, no rediscovery, no drift.
248
-
249
257
  ### Evidence Bundles
250
258
 
251
- Each completed task writes structured evidence to `.swarm/evidence/`:
259
+ Every completed task writes structured evidence to `.swarm/evidence/`:
252
260
 
253
261
  | Type | What It Captures |
254
- |------|-----------------|
255
- | `review` | Verdict (APPROVED/REJECTED), risk level, specific issues |
256
- | `test` | Pass/fail counts, coverage percentage, failure messages |
257
- | `diff` | Files changed, additions/deletions, contract change flags |
258
- | `approval` | Stakeholder sign-off with notes |
259
- | `retrospective` | Phase metrics: total tool calls, coder revisions, reviewer rejections, test failures, security findings, lessons learned |
262
+ |------|--------------------|
263
+ | review | Verdict, risk level, specific issues |
264
+ | test | Pass/fail counts, coverage %, failure messages |
265
+ | diff | Files changed, additions/deletions |
266
+ | retrospective | Phase metrics, lessons learned (injected into next phase) |
260
267
 
261
- Retrospectives from completed phases are injected as `[SWARM RETROSPECTIVE]` hints at the start of subsequent phases. The framework learns from its own history within a project.
268
+ </details>
262
269
 
263
- ---
270
+ <details>
271
+ <summary><strong>Guardrails and Circuit Breakers</strong></summary>
264
272
 
265
- ## Heterogeneous Models
273
+ Every agent runs inside a circuit breaker that kills runaway behavior before it burns your credits.
266
274
 
267
- Single-model frameworks have correlated failure modes. The same model that writes the bug reviews it and misses it. Swarm lets you route each agent to the model it is best suited for:
275
+ | Signal | Default Limit | What Happens |
276
+ |--------|:---:|-------------|
277
+ | Tool calls | 200 | Agent is stopped |
278
+ | Duration | 30 min | Agent is stopped |
279
+ | Same tool repeated | 10x | Agent is warned, then stopped |
280
+ | Consecutive errors | 5 | Agent is stopped |
281
+
282
+ Limits reset per task. A coder working on Task 2.3 is not penalized for tool calls made during Task 2.2.
283
+
284
+ Per-agent overrides:
268
285
 
269
286
  ```json
270
287
  {
271
- "agents": {
272
- "architect": { "model": "anthropic/claude-opus-4-6" },
273
- "coder": { "model": "minimax-coding-plan/MiniMax-M2.5" },
274
- "explorer": { "model": "minimax-coding-plan/MiniMax-M2.1" },
275
- "sme": { "model": "kimi-for-coding/k2p5" },
276
- "critic": { "model": "zai-coding-plan/glm-5" },
277
- "reviewer": { "model": "zai-coding-plan/glm-5" },
278
- "test_engineer": { "model": "minimax-coding-plan/MiniMax-M2.5" },
279
- "docs": { "model": "zai-coding-plan/glm-4.7-flash" },
280
- "designer": { "model": "kimi-for-coding/k2p5" }
288
+ "guardrails": {
289
+ "profiles": {
290
+ "coder": { "max_tool_calls": 500, "max_duration_minutes": 60 },
291
+ "explorer": { "max_tool_calls": 50 }
292
+ }
281
293
  }
282
294
  }
283
295
  ```
284
296
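How a per-agent profile might combine with the global limits can be sketched as below. This is an assumption-laden illustration rather than the plugin's implementation: `limitsFor` and `TaskBudget` are hypothetical names, and only the key names and the numbers (200 calls, 30 minutes, the `coder` and `explorer` overrides) come from the README.

```typescript
type Limits = { max_tool_calls: number; max_duration_minutes: number };
type Guardrails = Limits & { profiles?: Record<string, Partial<Limits>> };

// Global defaults plus the example overrides shown in the README.
const exampleGuardrails: Guardrails = {
  max_tool_calls: 200,
  max_duration_minutes: 30,
  profiles: {
    coder: { max_tool_calls: 500, max_duration_minutes: 60 },
    explorer: { max_tool_calls: 50 },
  },
};

// Profile values win; anything the profile leaves out falls back to the global limit.
function limitsFor(agent: string, guardrails: Guardrails): Limits {
  const profile: Partial<Limits> = guardrails.profiles?.[agent] ?? {};
  return {
    max_tool_calls: profile.max_tool_calls ?? guardrails.max_tool_calls,
    max_duration_minutes: profile.max_duration_minutes ?? guardrails.max_duration_minutes,
  };
}

// Hypothetical per-task budget: creating a new instance for each task models "limits reset per task".
class TaskBudget {
  private toolCalls = 0;
  private readonly limits: Limits;

  constructor(limits: Limits) {
    this.limits = limits;
  }

  // Records one tool call; returns false once the budget is exhausted and the agent should stop.
  recordToolCall(): boolean {
    this.toolCalls += 1;
    return this.toolCalls <= this.limits.max_tool_calls;
  }
}
```

With the example values, `explorer` would get 50 tool calls but keep the global 30-minute duration, since its profile overrides only one field.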
 
285
- Reviewer uses a different model than Coder by design. Different training, different priors, different blind spots. This is the cheapest bug-catcher you will ever deploy.
286
-
287
- ---
297
+ </details>
288
298
 
289
- ## Guardrails
299
+ <details>
300
+ <summary><strong>Quality Gates (Technical Detail)</strong></summary>
290
301
 
291
- Every subagent runs inside a circuit breaker that kills runaway behavior before it burns credits on a stuck loop.
302
+ ### Built-in Tools
292
303
 
293
- | Layer | Trigger | Action |
294
- |-------|---------|--------|
295
- | ⚠️ Soft Warning | 50% of any limit reached | Warning injected into agent stream |
296
- | 🛑 Hard Block | 100% of any limit reached | All further tool calls blocked |
304
+ | Tool | What It Does |
305
+ |------|-------------|
306
+ | syntax_check | Tree-sitter validation across 9+ languages |
307
+ | placeholder_scan | Catches TODOs, FIXMEs, stubs, placeholder text |
308
+ | sast_scan | Offline security analysis, 63+ rules, 9 languages |
309
+ | sbom_generate | CycloneDX dependency tracking, 8 ecosystems |
310
+ | build_check | Runs your project's native build/typecheck |
311
+ | quality_budget | Enforces complexity, duplication, and test ratio limits |
312
+ | pre_check_batch | Runs lint, secretscan, SAST, and quality budget in parallel (~15s vs ~60s sequential) |
297
313
 
298
- | Signal | Default | Description |
299
- |--------|---------|-------------|
300
- | Tool calls | 200 | Per-invocation, not per-session |
301
- | Duration | 30 min | Wall-clock time per delegation |
302
- | Repetition | 10 | Same tool + args consecutively |
303
- | Consecutive errors | 5 | Sequential null/undefined outputs |
314
+ All tools run locally. No Docker, no network calls, no external APIs.
304
315
 
305
- Limits are enforced **per-invocation**. Each delegation to a subagent starts a fresh budget. A coder fixing a second task is not penalized for the first task's tool calls. The Architect is exempt from all limits by default.
316
+ Optional enhancement: Semgrep (if on PATH).
306
317
 
307
- Per-agent profiles allow fine-grained overrides:
318
+ ### Gate Configuration
308
319
 
309
- ```jsonc
320
+ ```json
310
321
  {
311
- "guardrails": {
312
- "max_tool_calls": 200,
313
- "profiles": {
314
- "coder": { "max_tool_calls": 500, "max_duration_minutes": 60 },
315
- "explorer": { "max_tool_calls": 50 }
322
+ "gates": {
323
+ "syntax_check": { "enabled": true },
324
+ "placeholder_scan": { "enabled": true },
325
+ "sast_scan": { "enabled": true },
326
+ "quality_budget": {
327
+ "enabled": true,
328
+ "max_complexity_delta": 5,
329
+ "min_test_to_code_ratio": 0.3
316
330
  }
317
331
  }
318
332
  }
319
333
  ```
320
334
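To make the two `quality_budget` thresholds concrete, here is a minimal sketch of how such a check could work. The metric definitions are assumptions (the README does not say how complexity or the test ratio is measured), and `checkQualityBudget` and `ChangeMetrics` are hypothetical names. Only the two config keys and their example values are taken from the block above.

```typescript
// Key names mirror the "quality_budget" gate config; how the metrics are computed is assumed.
type QualityBudget = { max_complexity_delta: number; min_test_to_code_ratio: number };
type ChangeMetrics = { complexityDelta: number; testLines: number; codeLines: number };

// Returns a list of human-readable violations; an empty list means the change is within budget.
function checkQualityBudget(metrics: ChangeMetrics, budget: QualityBudget): string[] {
  const violations: string[] = [];
  if (metrics.complexityDelta > budget.max_complexity_delta) {
    violations.push(
      `complexity rose by ${metrics.complexityDelta}, limit is ${budget.max_complexity_delta}`
    );
  }
  // A change with no code lines cannot violate the ratio, so treat it as infinitely well tested.
  const ratio = metrics.codeLines === 0 ? Infinity : metrics.testLines / metrics.codeLines;
  if (ratio < budget.min_test_to_code_ratio) {
    violations.push(
      `test-to-code ratio ${ratio.toFixed(2)} is below ${budget.min_test_to_code_ratio}`
    );
  }
  return violations;
}

const exampleBudget: QualityBudget = { max_complexity_delta: 5, min_test_to_code_ratio: 0.3 };
```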
 
321
- ---
322
-
323
- ## Comparison
324
-
325
- | Feature | OpenCode Swarm | oh-my-opencode | get-shit-done | AutoGen | CrewAI |
326
- |---------|:-:|:-:|:-:|:-:|:-:|
327
- | Multi-agent orchestration | ✅ 9 specialized agents | ❌ Prompt config only | ❌ Single-agent macros | ✅ | ✅ |
328
- | Execution model | Serial (deterministic) | N/A | N/A | Parallel (chaotic) | Parallel |
329
- | Phased planning with acceptance criteria | ✅ | ❌ | ❌ | ❌ | ❌ |
330
- | Critic gate before implementation | ✅ | ❌ | ❌ | ❌ | ❌ |
331
- | Per-task dual-pass review (correctness + security) | ✅ | ❌ | ❌ | Optional | Optional |
332
- | Adversarial test pass per task | ✅ | ❌ | ❌ | ❌ | ❌ |
333
- | Pre-reviewer pipeline (lint, secretscan, imports) | ✅ v6.3 | ❌ | ❌ | ❌ | ❌ |
334
- | Persistent session memory | ✅ `.swarm/` files | ❌ | ❌ | Session only | Session only |
335
- | Resume projects across sessions | ✅ Native | ❌ | ❌ | ❌ | ❌ |
336
- | Evidence trail per task | ✅ Structured bundles | ❌ | ❌ | ❌ | ❌ |
337
- | Heterogeneous model routing | ✅ Per-agent | ❌ | ❌ | Limited | Limited |
338
- | Circuit breaker / guardrails | ✅ Per-invocation | ❌ | ❌ | ❌ | ❌ |
339
- | Open-domain SME consultation | ✅ Any domain | ❌ | ❌ | ❌ | ❌ |
340
- | Retrospective learning across phases | ✅ | ❌ | ❌ | ❌ | ❌ |
341
- | Slash commands + diagnostics | ✅ 12 commands | ❌ | Limited | ❌ | ❌ |
342
-
343
- ---
344
-
345
- ## Slash Commands
346
-
347
- | Command | Description |
348
- |---------|-------------|
349
- | `/swarm status` | Current phase, task progress, agent count |
350
- | `/swarm plan [N]` | Full plan or filtered by phase |
351
- | `/swarm agents` | All registered agents with models and permissions |
352
- | `/swarm history` | Completed phases with status |
353
- | `/swarm config` | Current resolved configuration |
354
- | `/swarm diagnose` | Health check for `.swarm/` files and config |
355
- | `/swarm export` | Export plan and context as portable JSON |
356
- | `/swarm evidence [task]` | Evidence bundles for a task or all tasks |
357
- | `/swarm archive [--dry-run]` | Archive old evidence with retention policy |
358
- | `/swarm benchmark` | Performance benchmarks |
359
- | `/swarm retrieve [id]` | Retrieve auto-summarized tool outputs |
360
- | `/swarm reset --confirm` | Clear swarm state files |
361
- | `/swarm preflight` | Run phase preflight checks (v6.7) |
362
- | `/swarm config doctor [--fix] [--restore <id>]` | Config validation with optional auto-fix (v6.7) |
363
- | `/swarm sync-plan` | Force plan.md regeneration from plan.json (v6.7) |
335
+ </details>
364
336
 
365
- ---
337
+ <details>
338
+ <summary><strong>Full Configuration Reference</strong></summary>
366
339
 
367
- ## Configuration
340
+ Config file location: `~/.config/opencode/opencode-swarm.json` (global) or `.opencode/swarm.json` (project). Project config merges over global.
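
The merge of project config over global config can be pictured with a small recursive helper. This is a minimal sketch, not Swarm's actual loader: `deepMerge`, `Json`, and `JsonObject` are illustrative names, and it assumes the deep-merge behavior described in the 6.11.0 README (nested objects merge key by key, so a partial override leaves unspecified fields alone). Replacing arrays wholesale is an assumption as well.

```typescript
// Hypothetical helper types; not part of the opencode-swarm API.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// True only for plain JSON objects (not arrays, not null).
const isObject = (v: Json | undefined): v is JsonObject =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Merge `override` (project config) over `base` (global config).
// Nested objects are merged recursively; scalars and arrays are replaced.
function deepMerge(base: JsonObject, override: JsonObject): JsonObject {
  const out: JsonObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = out[key];
    out[key] =
      isObject(existing) && isObject(value) ? deepMerge(existing, value) : value;
  }
  return out;
}
```

Under these assumptions, a project file that sets only `automation.capabilities.phase_preflight` would keep the global `automation.mode` and every other capability flag.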
368
341
 
369
342
  ```json
370
343
  {
@@ -388,349 +361,286 @@ Per-agent profiles allow fine-grained overrides:
388
361
  },
389
362
  "review_passes": {
390
363
  "always_security_review": false,
391
- "security_globs": ["**/*auth*", "**/*crypto*", "**/*session*", "**/*token*"]
364
+ "security_globs": ["**/*auth*", "**/*crypto*", "**/*session*"]
392
365
  },
393
366
  "automation": {
394
367
  "mode": "manual",
395
368
  "capabilities": {
396
- "plan_sync": false,
369
+ "plan_sync": true,
397
370
  "phase_preflight": false,
398
371
  "config_doctor_on_startup": false,
399
372
  "config_doctor_autofix": false,
400
- "evidence_auto_summaries": false,
401
- "decision_drift_detection": false
373
+ "evidence_auto_summaries": true,
374
+ "decision_drift_detection": true
402
375
  }
403
376
  }
404
377
  }
405
378
  ```
406
379
 
407
- Save to `~/.config/opencode/opencode-swarm.json` or `.opencode/swarm.json` in your project root. Project config merges over global config via deep merge, so partial overrides do not clobber unspecified fields.
380
+ ### Automation
408
381
 
409
- ### Automation (v6.7)
382
+ ## Plan Cursor (v6.13)
410
383
 
411
- **Default mode: `manual`** (no background automation). Enable automation features via `automation` config:
384
+ The `plan_cursor` config compresses the plan that is injected into the LLM context.
412
385
 
413
386
  ```json
414
387
  {
415
- "automation": {
416
- "mode": "hybrid",
417
- "capabilities": {
418
- "plan_sync": true,
419
- "config_doctor_on_startup": true,
420
- "evidence_auto_summaries": true
421
- }
388
+ "plan_cursor": {
389
+ "enabled": true,
390
+ "max_tokens": 1500,
391
+ "lookahead_tasks": 2
422
392
  }
423
393
  }
424
394
  ```
425
395
 
426
- **Automation modes:**
427
- - `manual` - No background automation (default)
428
- - `hybrid` - Background automation for safe ops, manual for sensitive ones
429
- - `auto` - Full background automation (target state)
396
+ - **enabled** – When `true` (default), Swarm injects a compact plan cursor instead of the full `plan.md`.
397
+ - **max_tokens** – Upper bound on the number of tokens emitted for the cursor (default 1500). The cursor contains the current phase summary, the full current task, and up to `lookahead_tasks` upcoming tasks. Earlier phases are reduced to one-line summaries.
398
+ - **lookahead_tasks** – Number of future tasks to include in full detail (default 2). Set to `0` to show only the current task.
430
399
 
431
- **Per-feature flags (all default `false`):**
432
- - `plan_sync` - Auto-regenerate plan.md from plan.json when out of sync
433
- - `phase_preflight` - Phase-boundary validation before agent execution
434
- - `config_doctor_on_startup` - Config validation on plugin initialization
435
- - `config_doctor_autofix` - Auto-fix mode for Config Doctor (requires explicit opt-in)
436
- - `evidence_auto_summaries` - Auto-generate evidence summaries
437
- - `decision_drift_detection` - Detect drift between planned and actual decisions
400
+ Setting `"enabled": false` falls back to the pre-v6.13 behavior of injecting the entire plan text.
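
The cursor described above can be sketched as follows. This is an illustration under stated assumptions, not Swarm's implementation: the `Task` and `Phase` shapes, the `buildPlanCursor` name, the output wording, and the four-characters-per-token estimate are all hypothetical. The `enabled` flag is assumed to be checked by the caller, and lookahead tasks are added only while the token budget allows.

```typescript
// Hypothetical plan shapes; the real plan.json schema may differ.
interface Task { id: string; description: string; done: boolean; }
interface Phase { name: string; tasks: Task[]; }
interface PlanCursorOptions { max_tokens: number; lookahead_tasks: number; }

// Crude estimate (about 4 characters per token). An assumption, not Swarm's tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function buildPlanCursor(phases: Phase[], opts: PlanCursorOptions): string {
  // The current phase is the first one that still has an unfinished task.
  const currentIdx = phases.findIndex((p) => p.tasks.some((t) => !t.done));
  if (currentIdx === -1) return "All phases complete.";

  const lines: string[] = [];
  // Earlier phases collapse to one-line summaries.
  for (const p of phases.slice(0, currentIdx)) {
    lines.push(`${p.name}: complete (${p.tasks.length} tasks)`);
  }

  const phase = phases[currentIdx]!;
  const pending = phase.tasks.filter((t) => !t.done);
  const doneCount = phase.tasks.length - pending.length;
  lines.push(`${phase.name}: ${doneCount}/${phase.tasks.length} tasks done`);
  lines.push(`CURRENT ${pending[0]!.id}: ${pending[0]!.description}`);

  // Add up to `lookahead_tasks` upcoming tasks, stopping at the token budget.
  for (const t of pending.slice(1, 1 + opts.lookahead_tasks)) {
    const candidate = `NEXT ${t.id}: ${t.description}`;
    if (estimateTokens([...lines, candidate].join("\n")) > opts.max_tokens) break;
    lines.push(candidate);
  }
  return lines.join("\n");
}
```

With `lookahead_tasks: 0` the loop body never runs, which matches the "show only the current task" behavior described for that setting.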
438
401
 
439
- ### Disabling Agents
402
+ ## Tool Output Truncation (v6.13)
403
+
404
+ Control the size of tool outputs that are sent back to the LLM.
440
405
 
441
406
  ```json
442
407
  {
443
- "sme": { "disabled": true },
444
- "designer": { "disabled": true },
445
- "test_engineer": { "disabled": true }
408
+ "tool_output": {
409
+ "truncation_enabled": true,
410
+ "max_lines": 150,
411
+ "per_tool": {
412
+ "diff": 200,
413
+ "symbols": 100
414
+ }
415
+ }
446
416
  }
447
417
  ```
448
418
 
449
- ---
419
+ - **truncation_enabled** – Global switch (default `true`).
420
+ - **max_lines** – Default line limit for any tool output.
421
+ - **per_tool** – Overrides `max_lines` for specific tools. The `diff` and `symbols` tools are truncated by default because their outputs can be very large.
450
422
 
451
- ## Installation
423
+ When truncation is active, a footer is appended:
452
424
 
453
- ```bash
454
- # Install globally
455
- npm install -g opencode-swarm
456
-
457
- # Or use npx
458
- npx opencode-swarm install
459
-
460
- # Verify
461
- opencode # then: /swarm diagnose
462
425
  ```
463
-
464
- The installer auto-configures `opencode.json` to include the plugin. Manual configuration:
465
-
466
- ```json
467
- {
468
- "plugins": ["opencode-swarm"]
469
- }
470
- ```
471
-
472
426
  ---
427
+ [output truncated to {maxLines} lines – use `tool_output.per_tool.<tool>` to adjust]
428
+ ```
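
A minimal sketch of how these settings could interact, assuming a per-tool limit takes precedence over `max_lines` and that output within the limit is returned untouched. `truncateToolOutput` is a hypothetical name, and the footer text is modeled on the example above rather than copied from Swarm's source.

```typescript
interface ToolOutputConfig {
  truncation_enabled: boolean;
  max_lines: number;
  // Per-tool line limits; tools not listed fall back to max_lines.
  per_tool: Partial<Record<string, number>>;
}

function truncateToolOutput(
  tool: string,
  output: string,
  cfg: ToolOutputConfig,
): string {
  if (!cfg.truncation_enabled) return output;

  const maxLines = cfg.per_tool[tool] ?? cfg.max_lines;
  const lines = output.split("\n");
  if (lines.length <= maxLines) return output;

  // Keep the first maxLines lines, then append the separator and footer.
  const footer =
    `[output truncated to ${maxLines} lines - use ` +
    "`tool_output.per_tool.<tool>` to adjust]";
  return [...lines.slice(0, maxLines), "---", footer].join("\n");
}
```

With the config shown earlier, a 300-line `diff` result would be cut to 200 lines, while a 300-line result from a tool without an override would be cut to 150.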
473
429
 
474
- ## Testing
430
+ ## Mode Detection (v6.13)
475
431
 
476
- 4008 tests across 136 files. Unit, integration, adversarial, and smoke. Covers config schemas, all agent prompts, all hooks, all tools, all commands, guardrail circuit breaker, race conditions, invocation window isolation, multi-invocation state, security category classification, evidence validation, background workers, phase-monitor hooks, and evidence-summary automation.
432
+ Swarm now explicitly distinguishes five architect modes:
477
433
 
478
- ```bash
479
- bun test
480
- ```
434
+ | Mode | When Injected |
435
+ |------|----------------|
436
+ | `DISCOVER` | After the explorer finishes scanning the codebase. |
437
+ | `PLAN` | When the architect writes or updates the plan. |
438
+ | `EXECUTE` | During task implementation (the normal pipeline). |
439
+ | `PHASE-WRAP` | After all tasks in a phase are completed, before docs are updated. |
440
+ | `UNKNOWN` | Fallback when the current state does not match any known mode. |
481
441
 
482
- Zero additional test dependencies. Uses Bun's built-in test runner.
442
+ Each mode determines which injection blocks are added to the LLM prompt (e.g., plan cursor is injected in `PLAN`, tool output truncation in `EXECUTE`, etc.).
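
One way to picture mode-conditional injection is a lookup from mode to injection blocks. The sketch below is hypothetical throughout: `SessionSnapshot`, `detectMode`, the detection order, and the block names are illustrative, and only the two pairings named above (plan cursor in `PLAN`, tool output truncation in `EXECUTE`) come from this README. The other modes are left empty because their blocks are not specified here.

```typescript
type ArchitectMode = "DISCOVER" | "PLAN" | "EXECUTE" | "PHASE-WRAP" | "UNKNOWN";

// Hypothetical snapshot of session state; Swarm's real signals may differ.
interface SessionSnapshot {
  explorerFinished: boolean;
  planExists: boolean;
  planBeingEdited: boolean;
  tasksRemainingInPhase: number | null; // null when no phase is active
}

function detectMode(s: SessionSnapshot): ArchitectMode {
  if (s.planBeingEdited) return "PLAN";
  if (s.planExists && s.tasksRemainingInPhase !== null) {
    // Tasks left means implementation is ongoing; none left means wrap-up.
    return s.tasksRemainingInPhase > 0 ? "EXECUTE" : "PHASE-WRAP";
  }
  if (s.explorerFinished && !s.planExists) return "DISCOVER";
  return "UNKNOWN"; // fallback when nothing matches
}

// Only the PLAN and EXECUTE entries are taken from the README text.
const INJECTION_BLOCKS: Record<ArchitectMode, string[]> = {
  DISCOVER: [],
  PLAN: ["plan_cursor"],
  EXECUTE: ["tool_output_truncation"],
  "PHASE-WRAP": [],
  UNKNOWN: [],
};

function injectionBlocksFor(s: SessionSnapshot): string[] {
  return INJECTION_BLOCKS[detectMode(s)];
}
```

Keeping the mapping in one table makes it easy to see which blocks a mode adds, and the `UNKNOWN` fallback keeps the function total.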
483
443
 
484
- ---
444
+ Default automation mode: `manual`. No background automation; all actions require explicit slash commands.
485
445
 
486
- ## Quality Gates (v6.9.0)
446
+ Automation modes:
487
447
 
488
- ### syntax_check - Tree-sitter Parse Validation
489
- Validates syntax across 9+ languages using Tree-sitter parsers. Catches syntax errors before review.
448
+ - `manual` - No background automation. All actions via slash commands (default).
449
+ - `hybrid` - Background automation for safe operations, manual for sensitive ones.
450
+ - `auto` - Full background automation.
490
451
 
491
- ### placeholder_scan - Anti-Slop Detection
492
- Detects TODO/FIXME comments, placeholder text, and stub implementations. Prevents shipping incomplete code.
452
+ Capability defaults:
493
453
 
494
- ### sast_scan - Static Security Analysis
495
- Offline SAST with 63+ security rules across 9 languages. Optional Semgrep Tier B enhancement if available on PATH.
454
+ - `plan_sync`: `true` - Background plan synchronization using `fs.watch` with debounced writes (300 ms) and a 2-second polling fallback
455
+ - `phase_preflight`: `false` - Phase preflight checks before agent execution (opt-in)
456
+ - `config_doctor_on_startup`: `false` - Validate configuration on startup
457
+ - `config_doctor_autofix`: `false` - Auto-fix for config doctor (opt-in, security-sensitive)
458
+ - `evidence_auto_summaries`: `true` - Automatic summaries for evidence bundles
459
+ - `decision_drift_detection`: `true` - Detect drift between planned and actual decisions
496
460
 
497
- ### sbom_generate - Dependency Tracking
498
- Generates CycloneDX SBOMs from manifests/lock files. Tracks dependencies for 8 ecosystems.
461
+ ---
499
462
 
500
- ### build_check - Build Verification
501
- Runs repo-native build/typecheck commands. Ensures code compiles before review.
463
+ ### Disabling Agents
502
464
 
503
- ### quality_budget - Maintainability Enforcement
504
- Enforces complexity, API, duplication, and test-to-code ratio budgets. Configurable thresholds.
465
+ ```json
466
+ {
467
+ "sme": { "disabled": true },
468
+ "designer": { "disabled": true },
469
+ "test_engineer": { "disabled": true }
470
+ }
471
+ ```
505
472
 
506
- ## Parallel Pre-Check Batch (v6.10.0)
473
+ </details>
507
474
 
508
- ### pre_check_batch - Parallel Verification
475
+ <details>
476
+ <summary><strong>All Slash Commands</strong></summary>
509
477
 
510
- Runs four verification tools in parallel for faster QA gate execution:
511
- - **lint:check** - Code quality verification (hard gate)
512
- - **secretscan** - Secret detection (hard gate)
513
- - **sast_scan** - Static security analysis (hard gate)
514
- - **quality_budget** - Maintainability metrics
478
+ | Command | Description |
479
+ |---------|-------------|
480
+ | `/swarm status` | Current phase, task progress, agent count |
481
+ | `/swarm plan [N]` | Full plan or filtered by phase |
482
+ | `/swarm agents` | Registered agents with models and permissions |
483
+ | `/swarm history` | Completed phases with status |
484
+ | `/swarm config` | Current resolved configuration |
485
+ | `/swarm diagnose` | Health check for `.swarm/` files and config |
486
+ | `/swarm export` | Export plan and context as portable JSON |
487
+ | `/swarm evidence [task]` | Evidence bundles for a task or all tasks |
488
+ | `/swarm archive [--dry-run]` | Archive old evidence with retention policy |
489
+ | `/swarm benchmark` | Performance benchmarks |
490
+ | `/swarm retrieve [id]` | Retrieve auto-summarized tool outputs |
491
+ | `/swarm reset --confirm` | Clear swarm state files |
492
+ | `/swarm preflight` | Run phase preflight checks |
493
+ | `/swarm config doctor [--fix]` | Config validation with optional auto-fix |
494
+ | `/swarm sync-plan` | Force plan.md regeneration from plan.json |
515
495
 
516
- **Purpose**: Reduces total gate execution time from ~60s (sequential) to ~15s (parallel) by running independent checks concurrently.
496
+ </details>
517
497
 
518
- **When to use**: After `build_check` passes and before `@reviewer`; all 4 gates must pass for `gates_passed: true`.
498
+ ---
519
499
 
520
- **Usage**:
521
- ```typescript
522
- const result = await pre_check_batch({
523
- directory: ".",
524
- files: ["src/auth.ts", "src/session.ts"],
525
- sast_threshold: "medium"
526
- });
500
+ ## Role-Scoped Tool Filtering
527
501
 
528
- // Returns:
529
- // {
530
- // gates_passed: boolean, // All hard gates passed
531
- // lint: { ran, result, error, duration_ms },
532
- // secretscan: { ran, result, error, duration_ms },
533
- // sast_scan: { ran, result, error, duration_ms },
534
- // quality_budget: { ran, result, error, duration_ms },
535
- // total_duration_ms: number
536
- // }
537
- ```
502
+ Swarm limits which tools each agent can access based on their role. This prevents agents from using tools that aren't appropriate for their responsibilities, reducing errors and keeping agents focused.
538
503
 
539
- **Hard Gates** (must pass for gates_passed=true):
540
- - Lint errors → Fix and retry
541
- - Secrets found → Fix and retry
542
- - SAST vulnerabilities at/above threshold → Fix and retry
543
- - Quality budget violations → Refactor or adjust thresholds
504
+ ### Default Tool Allocations
544
505
 
545
- **Parallel Execution Safety**:
546
- - Max 4 concurrent operations via `p-limit`
547
- - 60-second timeout per tool
548
- - 500KB output size limit
549
- - Individual tool failures don't cascade to others
506
+ | Agent | Tools | Count | Rationale |
507
+ |-------|-------|:---:|-----------|
508
+ | **architect** | All 17 tools | 17 | Orchestrator needs full visibility |
509
+ | **reviewer** | diff, imports, lint, pkg_audit, pre_check_batch, secretscan, symbols, complexity_hotspots, retrieve_summary, extract_code_blocks, test_runner | 11 | Security-focused QA |
510
+ | **coder** | diff, imports, lint, symbols, extract_code_blocks, retrieve_summary | 6 | Write-focused, minimal read tools |
511
+ | **test_engineer** | test_runner, diff, symbols, extract_code_blocks, retrieve_summary, imports, complexity_hotspots, pkg_audit | 8 | Testing and verification |
512
+ | **explorer** | complexity_hotspots, detect_domains, extract_code_blocks, gitingest, imports, retrieve_summary, schema_drift, symbols, todo_extract | 9 | Discovery and analysis |
513
+ | **sme** | complexity_hotspots, detect_domains, extract_code_blocks, imports, retrieve_summary, schema_drift, symbols | 7 | Domain expertise research |
514
+ | **critic** | complexity_hotspots, detect_domains, imports, retrieve_summary, symbols | 5 | Plan review, minimal toolset |
515
+ | **docs** | detect_domains, extract_code_blocks, gitingest, imports, retrieve_summary, schema_drift, symbols, todo_extract | 8 | Documentation synthesis |
516
+ | **designer** | extract_code_blocks, retrieve_summary, symbols | 3 | UI-focused, minimal toolset |
550
517
 
551
518
  ### Configuration
552
519
 
553
- Enable/disable parallel pre-check via `.opencode/swarm.json`:
520
+ Tool filtering is enabled by default. Customize it in your config:
554
521
 
555
522
  ```json
556
523
  {
557
- "pipeline": {
558
- "parallel_precheck": true // default: true
524
+ "tool_filter": {
525
+ "enabled": true,
526
+ "overrides": {
527
+ "coder": ["diff", "imports", "lint", "symbols", "test_runner"],
528
+ "reviewer": ["diff", "secretscan", "sast_scan", "symbols"]
529
+ }
559
530
  }
560
531
  }
561
532
  ```
562
533
 
563
- Set to `false` to run gates sequentially (useful for debugging or resource-constrained environments).
564
-
565
- ### Updated Phase 5 QA Sequence (v6.11.0)
566
-
567
- Complete execution pipeline with MODE labels and observable outputs:
534
+ | Option | Type | Default | Description |
535
+ |--------|------|---------|-------------|
536
+ | `enabled` | boolean | `true` | Enable tool filtering globally |
537
+ | `overrides` | Record<string, string[]> | `{}` | Per-agent tool whitelist. Empty array denies all tools. |
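
The options above suggest a simple resolution order, sketched here under stated assumptions: an override replaces the agent's default list rather than extending it (the troubleshooting example in this README restates the defaults when adding a tool), and an agent with no default entry gets no tools. `DEFAULT_AGENT_TOOLS` and `allowedTools` are hypothetical names, and only two rows of the default table are reproduced.

```typescript
interface ToolFilterConfig {
  enabled: boolean;
  // Per-agent whitelist; an empty array denies all tools.
  overrides: Partial<Record<string, string[]>>;
}

// Two rows copied from the default allocation table, for illustration only.
const DEFAULT_AGENT_TOOLS: Partial<Record<string, string[]>> = {
  coder: ["diff", "imports", "lint", "symbols", "extract_code_blocks", "retrieve_summary"],
  designer: ["extract_code_blocks", "retrieve_summary", "symbols"],
};

function allowedTools(
  agent: string,
  allTools: string[],
  cfg: ToolFilterConfig,
): string[] {
  // Filtering disabled: every agent sees every tool.
  if (!cfg.enabled) return allTools;

  // An override, even an empty one, replaces the default allocation.
  const override = cfg.overrides[agent];
  if (override !== undefined) return override;

  // Assumption: agents without a default entry get no tools.
  return DEFAULT_AGENT_TOOLS[agent] ?? [];
}
```

Checking `override !== undefined` instead of truthiness or length is what lets an empty array mean "deny all" rather than "fall back to defaults".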
568
538
 
569
- ```
570
- MODE: EXECUTE (per task)
571
- │
572
- ├── 5a. @coder implements (ONE task only)
573
- │   └── → REQUIRED: Print task start confirmation
574
- │
575
- ├── 5b. diff + imports tools (contract + dependency analysis)
576
- │   └── → REQUIRED: Print change summary
577
- │
578
- ├── 5c. syntax_check (parse validation)
579
- │   └── → REQUIRED: Print syntax status
580
- │
581
- ├── 5d. placeholder_scan (anti-slop detection)
582
- │   └── → REQUIRED: Print placeholder scan results
583
- │
584
- ├── 5e. lint fix → 5f. lint:check (inside pre_check_batch)
585
- │   └── → REQUIRED: Print lint status
586
- │
587
- ├── 5g. build_check (compilation verification)
588
- │   └── → REQUIRED: Print build status
589
- │
590
- ├── 5h. pre_check_batch (4 parallel gates)
591
- │   ├── lint:check (hard gate)
592
- │   ├── secretscan (hard gate)
593
- │   ├── sast_scan (hard gate)
594
- │   └── quality_budget (maintainability metrics)
595
- │   └── → REQUIRED: Print gates_passed status
596
- │
597
- ├── 5i. @reviewer (correctness pass)
598
- │   └── → REQUIRED: Print approval decision
599
- │
600
- ├── 5j. @reviewer security-only pass (if security file)
601
- │   └── → REQUIRED: Print security approval
602
- │
603
- ├── 5k. @test_engineer (verification tests + coverage)
604
- │   └── → REQUIRED: Print test results
605
- │
606
- ├── 5l. @test_engineer (adversarial tests)
607
- │   └── → REQUIRED: Print adversarial test results
608
- │
609
- ├── 5m. ⛔ HARD STOP: Pre-commit checklist
610
- │   ├── [ ] All QA gates passed (no overrides)
611
- │   ├── [ ] Reviewer approval documented
612
- │   ├── [ ] Tests pass with evidence
613
- │   └── [ ] No security findings
614
- │   └── → REQUIRED: Print checklist completion
615
- │
616
- └── 5n. TASK COMPLETION CHECKLIST (emit before marking complete)
617
-     ├── Evidence written to .swarm/evidence/{taskId}/
618
-     ├── plan.md updated with [x] task complete
619
-     └── → REQUIRED: Print completion confirmation
620
- ```
539
+ ### Troubleshooting: Agent Missing a Tool
621
540
 
622
- **MODE Labels** (v6.11): Architect workflow uses MODE labels internally:
623
- - `MODE: RESUME` - Resume detection
624
- - `MODE: CLARIFY` - Requirement clarification
625
- - `MODE: DISCOVER` - Codebase exploration
626
- - `MODE: CONSULT` - SME consultation
627
- - `MODE: PLAN` - Plan creation
628
- - `MODE: CRITIC-GATE` - Plan review checkpoint
629
- - `MODE: EXECUTE` - Task implementation
630
- - `MODE: PHASE-WRAP` - Phase completion
541
+ If an agent reports it doesn't have access to a tool it needs:
631
542
 
632
- **NAMESPACE RULE**: MODE labels refer to architect workflow phases. Project plan phases (in plan.md) remain as "Phase N".
543
+ 1. Check if the tool is in the agent's default allocation (see table above)
544
+ 2. Add a custom override in your config:
633
545
 
634
- **Retry Protocol** (v6.11): On failure, emit structured rejection:
635
- ```
636
- RETRY #{count}/5
637
- FAILED GATE: {gate_name}
638
- REASON: {specific failure}
639
- REQUIRED FIX: {actionable instruction}
640
- RESUME AT: {step_5x}
546
+ ```json
547
+ {
548
+ "tool_filter": {
549
+ "overrides": {
550
+ "coder": ["diff", "imports", "lint", "symbols", "extract_code_blocks", "retrieve_summary", "test_runner"]
551
+ }
552
+ }
553
+ }
641
554
  ```
642
555
 
643
- **Anti-Exemption Rules** (v6.11): The following rationalizations are explicitly blocked:
644
- - "It's a simple change"
645
- - "Just updating docs"
646
- - "Only a config tweak"
647
- - "Hotfix, no time for QA"
648
- - "The tests pass locally"
649
- - "I'll clean it up later"
650
- - "No logic changes"
651
- - "Already reviewed the pattern"
652
-
653
- **Pre-Commit Rule** (v6.11): All 4 checkboxes required before commit. No override. A commit without completed QA gate is a workflow violation.
654
-
655
- ### Rollback
656
-
657
- If parallel execution causes issues, refer to `.swarm/ROLLBACK-pre-check-batch.md` for rollback instructions.
658
-
659
- ### Local-Only Guarantee
660
- All v6.9.0 quality tools run locally without:
661
- - Docker containers
662
- - Network connections
663
- - External APIs
664
- - Cloud services
665
-
666
- Optional enhancement: Semgrep (if already on PATH)
667
-
668
- ### Configuration
669
- Configure gates in `.opencode/swarm.json`:
556
+ 3. To completely disable filtering for all agents:
670
557
 
671
558
  ```json
672
559
  {
673
- "gates": {
674
- "syntax_check": { "enabled": true },
675
- "placeholder_scan": { "enabled": true },
676
- "sast_scan": { "enabled": true },
677
- "sbom_generate": { "enabled": true },
678
- "build_check": { "enabled": true },
679
- "quality_budget": {
680
- "enabled": true,
681
- "max_complexity_delta": 5,
682
- "max_public_api_delta": 10,
683
- "max_duplication_ratio": 0.05,
684
- "min_test_to_code_ratio": 0.3
685
- }
560
+ "tool_filter": {
561
+ "enabled": false
686
562
  }
687
563
  }
688
564
  ```
689
565
 
566
+ ### Available Tools Reference
567
+
568
+ The following tools can be assigned to agents via overrides:
569
+
570
+ | Tool | Purpose |
571
+ |------|---------|
572
+ | `checkpoint` | Save/restore git checkpoints |
573
+ | `complexity_hotspots` | Identify high-risk code areas |
574
+ | `detect_domains` | Detect SME domains from text |
575
+ | `diff` | Analyze git diffs and changes |
576
+ | `evidence_check` | Verify task evidence |
577
+ | `extract_code_blocks` | Extract code from markdown |
578
+ | `gitingest` | Ingest external repositories |
579
+ | `imports` | Analyze import relationships |
580
+ | `lint` | Run project linters |
581
+ | `pkg_audit` | Security audit of dependencies |
582
+ | `pre_check_batch` | Parallel pre-checks (lint, secrets, SAST, quality) |
583
+ | `retrieve_summary` | Retrieve summarized tool outputs |
584
+ | `schema_drift` | Detect OpenAPI/schema drift |
585
+ | `secretscan` | Scan for secrets in code |
586
+ | `symbols` | Extract exported symbols |
587
+ | `test_runner` | Run project tests |
588
+ | `todo_extract` | Extract TODO/FIXME comments |
589
+
690
590
  ---
691
591
 
692
- ## Roadmap
592
+ ## Recent Changes
593
+
594
+ ### v6.13.0 - Context Efficiency
693
595
 
694
- ### v6.3 - Pre-Reviewer Pipeline
596
+ This release focuses on reducing context usage and improving mode-conditional behavior:
695
597
 
696
- Three new tools complete the pre-reviewer gauntlet. Code reaching the Reviewer is already clean.
598
+ - **Role-Scoped Tool Filtering**: Agent tools filtered via AGENT_TOOL_MAP
599
+ - **Plan Cursor**: Compressed plan summary under 1,500 tokens
600
+ - **Mode Detection**: DISCOVER/PLAN/EXECUTE/PHASE-WRAP/UNKNOWN modes
601
+ - **Tool Output Truncation**: diff/symbols outputs truncated with footer
602
+ - **ZodError Fixes**: Optional current_phase, 'completed' status support
697
603
 
698
- - **`imports`** - AST-based import graph. For each file changed by the coder, returns every consumer file, which exports each consumer uses, and the line numbers. Replaces fragile grep-based integration analysis with deterministic graph traversal.
699
- - **`lint`** - Auto-detects project linter (Biome, ESLint, Ruff, Clippy, PSScriptAnalyzer). Runs in fix mode first, then check mode. Structured diagnostic output per file.
700
- - **`secretscan`** - Entropy-based credential scanner. Detects API keys, tokens, connection strings, and private key headers in the diff before they reach the reviewer. Zero external dependencies.
604
+ ### v6.12.0 - Anti-Process-Violation Hardening
701
605
 
702
- Phase 5 execute loop becomes: `coder → diff → imports → lint fix → lint check → secretscan → reviewer → security reviewer → test_engineer → adversarial test_engineer`.
606
+ This release adds runtime detection hooks to catch and warn about architect workflow violations:
703
607
 
704
- ### v6.4 - Execution and Planning Tools
608
+ - **Self-coding detection**: Warns when the architect writes code directly instead of delegating
609
+ - **Partial gate tracking**: Detects when QA gates are skipped
610
+ - **Self-fix detection**: Warns when an agent fixes its own gate failure (should delegate to fresh agent)
611
+ - **Batch detection**: Catches "implement X and add Y" batching in task requests
612
+ - **Zero-delegation detection**: Warns when tasks complete without any coder delegation
705
613
 
706
- - **`test_runner`** - Unified test execution across Bun, Vitest, Jest, Mocha, pytest, cargo test, and Pester. Auto-detects framework, returns normalized JSON with pass/fail/skip counts and coverage. Three scope modes: `all`, `convention` (naming-based), `graph` (import-graph-based). Eliminates the test_engineer's most common failure mode.
707
- - **`symbols`** - Export inventory for a module: functions, classes, interfaces, types, enums. Gives the Architect instant visibility into a file's public API surface without reading the full source.
708
- - **`checkpoint`** - Git-backed save points. Before any multi-file refactor (≥3 files), Architect auto-creates a checkpoint commit. On critical integration failure, restores via soft reset instead of iterating into a hole.
614
+ These hooks are advisory (warnings only) and help maintain workflow discipline during long sessions.
709
615
 
710
- ### v6.5 - Intelligence and Audit Tools
616
+ ---
711
617
 
712
- Five tools that improve planning quality and post-phase validation:
618
+ ## Testing
713
619
 
714
- - **`pkg_audit`** - Wraps `npm audit`, `pip-audit`, `cargo audit`. Structured CVE output with severity, patched versions, and advisory URLs. Fed to the security reviewer for concrete vulnerability context.
715
- - **`complexity_hotspots`** - Git churn × cyclomatic complexity risk map. Run in Phase 0/2 to identify modules that need stricter QA gates before implementation begins.
716
- - **`schema_drift`** - Compares OpenAPI spec against actual route implementations. Surfaces undocumented routes and phantom spec paths. Run in Phase 6 when API routes were modified.
717
- - **`todo_extract`** - Structured extraction of `TODO`, `FIXME`, and `HACK` annotations across the codebase. High-priority items fed directly into plan task candidates.
718
- - **`evidence_check`** - Audits completed tasks against required evidence types. Run in Phase 6 to verify every task has review and test evidence before the phase is marked complete.
620
+ 6,000+ tests. Unit, integration, adversarial, and smoke. Zero additional test dependencies.
621
+
622
+ ```bash
623
+ bun test
624
+ ```
719
625
 
720
626
  ---
721
627
 
722
628
  ## Design Principles
723
629
 
724
- 1. **Plan before code** - Documented phases with acceptance criteria. The Critic approves the plan before a single line is written.
725
- 2. **One task at a time** - The Coder gets one task and full context. Nothing else.
726
- 3. **Review everything immediately** - Every task goes through correctness review, security review, verification tests, and adversarial tests. No task ships without passing all four.
727
- 4. **Cache SME knowledge** - Guidance is written to `context.md`. The same domain question is never asked twice in a project.
728
- 5. **Persistent memory** - `.swarm/` files are the ground truth. Any session, any model, any day.
729
- 6. **Serial execution** - Predictable, debuggable, no race conditions, no conflicting writes.
730
- 7. **Heterogeneous models** - Different models, different blind spots. The coder's bug is the reviewer's catch.
731
- 8. **User checkpoints** - Phase transitions require user confirmation. No unsupervised multi-phase runs.
732
- 9. **Document failures** - Rejections and retries are recorded in plan.md. After 5 failed attempts, the task escalates to the user.
733
- 10. **Resumable by design** - A cold-start Architect can read `.swarm/` and continue any project as if it had been there from the beginning.
630
+ 1. **Plan before code.** The critic approves the plan before a single line is written.
631
+ 2. **One task at a time.** The coder gets one task and full context. Nothing else.
632
+ 3. **Review everything immediately.** Correctness, security, tests, adversarial tests. Every task.
633
+ 4. **Different models catch different bugs.** The coder's blind spot is the reviewer's strength.
634
+ 5. **Save everything to disk.** Any session, any model, any day, pick up where you left off.
635
+ 6. **Document failures.** Rejections and retries are recorded. After 5 failures, it escalates to you.
636
+
637
+ ---
638
+
639
+ ## Roadmap
640
+
641
+ See [CHANGELOG.md](CHANGELOG.md) for shipped features.
642
+
643
+ Upcoming: v6.14 focuses on further context optimization and agent coordination improvements.
734
644
 
735
645
  ---
736
646
 
@@ -739,8 +649,7 @@ Five tools that improve planning quality and post-phase validation:
739
649
  - [Architecture Deep Dive](docs/architecture.md)
740
650
  - [Design Rationale](docs/design-rationale.md)
741
651
  - [Installation Guide](docs/installation.md)
742
- - [Linux + Native Windows + Docker Desktop Install Guide](docs/installation-linux-docker.md)
743
- - [LLM Operator Install Guide](docs/installation-llm-operator.md)
652
+ - [Linux + Docker Desktop Install Guide](docs/installation-linux-docker.md)
744
653
 
745
654
  ---
746
655
 
@@ -750,6 +659,4 @@ MIT
750
659
 
751
660
  ---
752
661
 
753
- <p align="center">
754
- <strong>Stop hoping your agents figure it out. Start shipping code that actually works.</strong>
755
- </p>
662
+ **Stop hoping your agents figure it out. Start shipping code that actually works.**