meow-swarm 0.4.0 → 0.5.0

package/L2 ADDED
File without changes
package/README.md CHANGED
@@ -1,86 +1,161 @@
1
- # Meow-Swarm
2
-
3
- A background coding harness that runs autonomous coding tasks while you sleep. You dispatch a task, it runs in the background, and you check back later via TUI or state files.
4
-
5
- ```
6
- meow -p "fix the auth bug in src/auth.ts"
7
- # → background task dispatched, check back later with `meow --tui`
8
- ```
9
-
10
- ## How it works
11
-
12
- Meow-Swarm is a **background daemon** for coding tasks — think `nohup ./worker.sh &` but for AI coding agents:
13
-
14
- 1. **Dispatch**: Run `meow -p "your task"` and it returns immediately
15
- 2. **Work** — The swarm picks tasks from a queue, runs them, checkpoints progress to SQLite
16
- 3. **Monitor**: Watch progress via `meow --tui` (real-time dashboard) or `meow` (REPL)
17
- 4. **Recover** — Crashes and restarts don't lose work; interrupted tasks resume from the last checkpoint
18
-
19
- ## Quick start
20
-
21
- ```bash
22
- # Install (Node.js 18+ required)
23
- npm install -g meow-swarm
24
-
25
- # Configure API key (Anthropic)
26
- export ANTHROPIC_API_KEY="sk-ant-..."
27
-
28
- # Dispatch a task (runs in background)
29
- meow -p "add user registration to the API"
30
-
31
- # Monitor progress
32
- meow --tui
33
- ```
34
-
35
- ## Architecture
36
-
37
- ```
38
- meow -p "task" # CLI entry: task queued in SQLite
39
-
40
- [L1 Liaison] # Validates and decomposes task
41
-
42
- [L2 Architect] # Breaks into subtasks, resolves dependencies
43
-
44
- [L3 SwarmManager] # Spawns specialist agents (Claude Code subprocesses)
45
-
46
- [Sandbox Gate] # Blocks dangerous shell commands (rm -rf, etc.)
47
-
48
- [Mission Reviewer] # Scores result against 7 quality criteria
49
-
50
- [L4 Auditor] # Final verification, checkpoints to SQLite
51
- ```
52
-
53
- - **Specialist agents** are Claude Code subprocesses that work on subtasks
54
- - **Checkpointing** means crashes are recoverable — tasks resume where they left off
55
- - **Safety sandbox** blocks destructive operations before they run
56
- - **Multi-agent coordination** via SQLite-backed task claims (no two agents work on the same task)
57
-
58
- ## Key features
59
-
60
- - **Crash-safe** — SQLite checkpointing survives power failures and restarts
61
- - **Process cleanup** — Stuck subprocesses are killed (Windows `taskkill /f /t`, POSIX `SIGKILL`)
62
- - **Safety gates** — Dangerous shell commands blocked before execution
63
- - **Recovery mode** — `meow --continue` replays stranded tasks on boot
64
- - **TUI dashboard**: Real-time monitoring of agent status, token costs, and task progress
65
-
66
- ## Commands
67
-
68
- | Command | Description |
69
- |---------|-------------|
70
- | `meow -p "task"` | Dispatch task (headless, returns immediately) |
71
- | `meow` | Interactive REPL |
72
- | `meow --tui` | Terminal dashboard |
73
- | `meow --continue` | Resume stranded tasks after a crash |
74
-
75
- ## Configuration
76
-
77
- | Variable | Default | Description |
78
- |----------|---------|-------------|
79
- | `ANTHROPIC_API_KEY` | required | API key for the LLM |
80
- | `ANTHROPIC_MODEL` | `claude-sonnet-4` | Model to use |
81
- | `MEOW_DB` | `meow.db` | SQLite database path |
82
- | `MEOW_MODE` | `SEQUENTIAL` | `SEQUENTIAL`, `PARALLEL`, or `SHIP` |
83
-
84
- ---
85
-
86
- See `.env.example` for configuration options.
1
+ # meow-swarm
2
+
3
+ > One prompt. Meow plans it, builds it, verifies it, and repairs itself if anything breaks.
4
+
5
+ ```bash
6
+ meow -p "add OAuth2 login to the API with tests"
7
+ # → dispatched. check back with meow --tui
8
+ ```
9
+
10
+ Most coding agents execute your prompt once and hand you back a diff. Meow-swarm is different: it runs a full plan → build → quality-verify → self-repair loop autonomously, and doesn't mark a task done until it can show evidence the work is correct.
11
+
12
+ ---
13
+
14
+ ## What makes meow-swarm different
15
+
16
+ ### 1. Definition of Done before the first line of code
17
+
18
+ When a task arrives, meow derives explicit acceptance criteria from your request before touching anything. It knows what "done" looks like (specific, verifiable outcomes) before it starts. This is what separates a task that meets its criteria from one that merely stops running.
19
+
20
+ ### 2. Quality gates, not just a diff
21
+
22
+ After every execution, meow runs a structured self-review loop against a set of quality gates:
23
+
24
+ | Gate | What it checks |
25
+ |------|---------------|
26
+ | **Placeholder Detection** | No TODOs, FIXMEs, or stub bodies in produced code |
27
+ | **Lint / Type Check** | Zero errors from the project's linter and type checker |
28
+ | **Test Coverage** | Tests pass, coverage meets the project threshold |
29
+ | **Coherence** | The diff actually addresses the stated goal (LLM review pass) |
30
+ | **Human Sign-Off** | Production tasks require explicit approval before shipping |
31
+
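As an illustration of the Placeholder Detection gate in the table above, a line scan along these lines would catch the obvious cases. This is a sketch, not meow-swarm's implementation: the pattern list and the `find_placeholders` name are assumptions, and the stub patterns shown only cover Python-style bodies.

```python
import re

# Assumed marker list: the README names TODOs, FIXMEs, and stub bodies.
PLACEHOLDER = re.compile(
    r"\b(TODO|FIXME)\b|^\s*(pass|\.\.\.)\s*$|\bNotImplementedError\b"
)

def find_placeholders(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line that looks like a placeholder."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if PLACEHOLDER.search(line):
            hits.append((number, line.strip()))
    return hits

sample = (
    "def login(user):\n"
    "    # TODO: validate token\n"
    "    pass\n"
    "\n"
    "def logout(user):\n"
    "    return True\n"
)
hits = find_placeholders(sample)
```

A gate built this way would pass only when `hits` is empty, and would hand the line numbers back to the agent as the specific issues to fix.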
32
+ If gates fail, meow feeds the specific issues back into the agent loop and retries — up to a configurable iteration limit. A `QualityConvergenceChecker` tracks whether quality is genuinely improving each iteration and stops early if it detects diminishing returns, so it doesn't burn tokens grinding on something unfixable.
33
+
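The README does not describe how `QualityConvergenceChecker` decides that returns are diminishing. One plausible reading, sketched below under assumptions, is to track the number of failing gates per iteration and stop when recent iterations have not beaten the earlier best. The class name `ConvergenceSketch`, the `patience` parameter, and both default values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ConvergenceSketch:
    """Stops the retry loop when the failing-gate count stops improving."""
    max_iterations: int = 5   # assumed default; the README only says "configurable"
    patience: int = 2         # assumed: iterations allowed without improvement
    history: list[int] = field(default_factory=list)

    def record(self, failing_gates: int) -> str:
        """Record one iteration and return 'done', 'retry', or 'stop'."""
        self.history.append(failing_gates)
        if failing_gates == 0:
            return "done"
        if len(self.history) >= self.max_iterations:
            return "stop"  # iteration limit reached
        # Best result seen before the most recent `patience` iterations.
        best_before = min(self.history[:-self.patience], default=None)
        recent = self.history[-self.patience:]
        if best_before is not None and min(recent) >= best_before:
            return "stop"  # no progress in the last `patience` iterations
        return "retry"

checker = ConvergenceSketch()
decisions = [checker.record(n) for n in (4, 3, 3, 3)]
```

With failing-gate counts of 4, 3, 3, 3, the sketch retries three times and then stops, because the last two iterations did not improve on the earlier best of 3.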
34
+ ### 3. Evidence-based completion
35
+
36
+ Meow doesn't consider a task done because the code compiled. It runs the thing it built and captures the evidence:
37
+
38
+ - **stdout / stderr** from running the produced code or tests
39
+ - **Screenshots** for UI changes (visual diff against baseline)
40
+ - **File read-back** for generated artifacts — confirms the file has real content, not a stub
41
+
42
+ This evidence is fed to an LLM judge that scores the work against the original task description. Score below threshold → back into the loop with a specific critique. Score above threshold → task is marked complete with the evidence attached.
43
+
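The accept-or-retry decision described above can be sketched as a small function. Everything named here is an assumption for illustration: the `Evidence` fields, the `next_step` function, the 0.8 threshold (the README does not state a value), and the choice to treat a score equal to the threshold as a pass.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    stdout: str
    stderr: str
    artifacts: dict[str, str]  # path -> file content read back after the run

PASS_THRESHOLD = 0.8  # assumed value; not taken from the README

def next_step(score: float, critique: str, evidence: Evidence) -> dict:
    """Decide whether a judged task is complete or goes back into the loop."""
    # File read-back: an empty artifact counts as a stub regardless of the score.
    empty = [path for path, content in evidence.artifacts.items() if not content.strip()]
    if empty:
        return {"status": "retry", "critique": f"empty artifacts: {', '.join(sorted(empty))}"}
    if score < PASS_THRESHOLD:
        return {"status": "retry", "critique": critique}
    # Completion carries the evidence with it.
    return {"status": "complete", "evidence": evidence}

evidence = Evidence(
    stdout="12 passed",
    stderr="",
    artifacts={"report.md": "# Results\nAll checks green."},
)
accepted = next_step(0.9, "", evidence)
rejected = next_step(0.4, "login test never exercises token refresh", evidence)
```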
44
+ ### 4. MEOW-3-RULE: self-repair instead of giving up
45
+
46
+ When meow fails three consecutive attempts on a task, it doesn't ask you for help. It runs a targeted `claude -p` call — not to finish the task, but to diagnose and patch meow's own code, prompts, or tool configuration. After the patch, the task is re-queued for a fresh attempt with the fixed machinery.
47
+
48
+ ```
49
+ Task fails × 3
50
+ claude -p "diagnose why meow failed, fix meow's code"
51
+ → meow is patched
52
+ → task re-queued → succeeds
53
+ ```
54
+
55
+ The task and the mechanic are never conflated. Meow fixes itself; it doesn't sneak in a bad completion to avoid admitting failure.
56
+
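The control flow of the rule above can be sketched as follows. The function and callback names are hypothetical, and `repair` merely stands in for the `claude -p` diagnostic call; the point is that the repair step never returns a task result.

```python
from typing import Callable

MAX_ATTEMPTS = 3  # from the README: three consecutive failures trigger a repair

def run_with_self_repair(
    task: str,
    attempt: Callable[[str], bool],
    repair: Callable[[str], None],
) -> str:
    """Try the task up to three times; on repeated failure, repair the harness and re-queue."""
    for _ in range(MAX_ATTEMPTS):
        if attempt(task):
            return "succeeded"
    # Three failures: patch the machinery, never the task itself.
    repair(task)
    return "requeued"

# Demo with stand-ins: attempts fail until the harness has been repaired once.
state = {"repaired": False, "attempts": 0}

def fake_attempt(task: str) -> bool:
    state["attempts"] += 1
    return state["repaired"]

def fake_repair(task: str) -> None:
    state["repaired"] = True  # stands in for the `claude -p` call

first_run = run_with_self_repair("add OAuth2 login", fake_attempt, fake_repair)
second_run = run_with_self_repair("add OAuth2 login", fake_attempt, fake_repair)
```

In this demo the first run exhausts its three attempts and is re-queued; the second run succeeds on its first attempt because the stand-in repair has been applied.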
57
+ ### 5. Skills-first execution
58
+
59
+ Before writing any code for common task types — code review, frontend design, testing, documentation — meow searches the community skills ecosystem:
60
+
61
+ - [`https://github.com/anthropics/skills`](https://github.com/anthropics/skills)
62
+ - [`https://github.com/vercel-labs/skills`](https://github.com/vercel-labs/skills/blob/main/skills/find-skills/SKILL.md)
63
+
64
+ A battle-tested code review skill has better prompts and output structure than anything meow would derive from scratch on every run. Skills are installed automatically if found (`npx skills add <skill> -g -y`) and invoked before falling back to raw LLM generation or a summon call.
65
+
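The lookup order described above (skill first, raw generation as the fallback) might look like the sketch below. Only the `npx skills add <skill> -g -y` command comes from the README; the `plan_execution` function, the strategy labels, and the skill name in the demo index are hypothetical.

```python
from typing import Optional

def install_command(skill: str) -> list[str]:
    """Argument list for the install command quoted in the README."""
    return ["npx", "skills", "add", skill, "-g", "-y"]

def plan_execution(task_type: str, skill_index: dict[str, str]) -> dict:
    """Prefer a community skill for the task type; otherwise fall back to raw generation."""
    skill: Optional[str] = skill_index.get(task_type)
    if skill is None:
        return {"strategy": "raw-llm", "install": None}
    return {"strategy": "skill", "skill": skill, "install": install_command(skill)}

# Hypothetical index mapping task types to skill names found by a search.
index = {"code-review": "example-code-review"}
with_skill = plan_execution("code-review", index)
without_skill = plan_execution("database-migration", index)
```

Returning the command as an argument list rather than a single string keeps it safe to pass to a process launcher without shell quoting.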
66
+ ### 6. Background daemon — fire and forget
67
+
68
+ Meow is not a chat partner. It runs in the background like a worker process:
69
+
70
+ - Dispatch a task returns immediately, work runs async
71
+ - Checkpoints every state change to SQLite — crashes are recoverable
72
+ - `meow --continue` resumes stranded tasks on reboot
73
+ - `meow --tui` gives a live dashboard of agent status, task queue, and token costs
74
+
75
+ You can dispatch a task, close your laptop, and come back to a completed result with a full audit trail of every decision and tool call.
76
+
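To make the checkpoint-and-resume idea concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table layout, column names, and status values are assumptions; meow-swarm's real schema is not shown in the README. The demo uses an in-memory database, so it shows the queries rather than actual crash survival, which needs a file path such as the `meow.db` default.

```python
import sqlite3

def open_state(path: str) -> sqlite3.Connection:
    """Open the state database and create the (assumed) tasks table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY, prompt TEXT NOT NULL, "
        "status TEXT NOT NULL, step INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn

def checkpoint(conn: sqlite3.Connection, task_id: int, status: str, step: int) -> None:
    """Persist one state change in its own transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE tasks SET status = ?, step = ? WHERE id = ?",
            (status, step, task_id),
        )

def stranded_tasks(conn: sqlite3.Connection) -> list[tuple]:
    """Tasks still marked running, i.e. interrupted before they finished."""
    return conn.execute(
        "SELECT id, prompt, step FROM tasks WHERE status = 'running' ORDER BY id"
    ).fetchall()

conn = open_state(":memory:")
with conn:
    conn.execute("INSERT INTO tasks (prompt, status) VALUES (?, 'queued')", ("add OAuth2 login",))
    conn.execute("INSERT INTO tasks (prompt, status) VALUES (?, 'queued')", ("update docs",))
checkpoint(conn, 1, "running", 2)
checkpoint(conn, 2, "done", 5)
stranded = stranded_tasks(conn)
```

A resume command in this sketch would call `stranded_tasks` on startup and restart each returned task from its recorded `step`.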
77
+ ### 7. Multi-layer architecture with specialist routing
78
+
79
+ ```
80
+ meow -p "task"
81
+        ↓
82
+ [L1 Liaison]        Intent extraction, MissionBrief with acceptance criteria
83
+        ↓
84
+ [L2 Architect]      Task decomposition, dependency resolution, specialist assignment
85
+        ↓
86
+ [L3 SwarmManager]   Parallel or sequential execution across specialist agents
87
+        ↓
88
+ [Self-Review Loop]  Quality gates, convergence check, evidence capture
89
+        ↓
90
+ [LLM Judge]         Scores output against original task: passes or feeds critique back
91
+        ↓
92
+ [L4 Auditor]        Final checkpoint to SQLite, cost tracking, audit ledger
93
+ ```
94
+
95
+ Each layer uses the right model for its job. The Liaison uses a fast model for sub-500ms intent parsing. Deep execution uses your configured model (Claude Sonnet by default). The judge uses a separate call with the full context to avoid self-grading bias.
96
+
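Per-layer model selection could be expressed as a simple lookup. The sketch below is illustrative only: the layer keys and the `FAST_MODEL` placeholder are hypothetical (the README does not name the fast model), and it assumes every layer other than the Liaison uses the configured `ANTHROPIC_MODEL`.

```python
from typing import Mapping

# Layer order as listed in the architecture diagram; the keys are hypothetical.
LAYERS = ["liaison", "architect", "swarm", "self_review", "judge", "auditor"]

FAST_MODEL = "fast-model-id"        # placeholder; the README does not name the fast model
DEFAULT_MODEL = "claude-sonnet-4"   # documented default for ANTHROPIC_MODEL

def model_for(layer: str, env: Mapping[str, str]) -> str:
    """Pick a model for a layer: fast model for intent parsing, configured model otherwise."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    if layer == "liaison":
        return FAST_MODEL
    # Assumption: all remaining layers share the configured execution model.
    return env.get("ANTHROPIC_MODEL", DEFAULT_MODEL)

routing = {layer: model_for(layer, {}) for layer in LAYERS}
```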
97
+ ---
98
+
99
+ ## Quick start
100
+
101
+ ```bash
102
+ # Node.js 18+ required (Bun not supported — native SQLite addons require Node)
103
+ npm install -g meow-swarm
104
+
105
+ # Configure your API key
106
+ export ANTHROPIC_API_KEY="sk-ant-..."
107
+
108
+ # Dispatch a task
109
+ meow -p "refactor the auth module to use JWT and add tests"
110
+
111
+ # Watch it work
112
+ meow --tui
113
+ ```
114
+
115
+ ## Commands
116
+
117
+ | Command | Description |
118
+ |---------|-------------|
119
+ | `meow -p "task"` | Dispatch task headlessly (primary interface) |
120
+ | `meow` | Interactive REPL |
121
+ | `meow --tui` | Live terminal dashboard |
122
+ | `meow --continue` | Resume stranded tasks after a crash |
123
+ | `meow --monitor` | Run the monitoring agent (cluster analysis, patch suggestions) |
124
+
125
+ ## Configuration
126
+
127
+ Copy `.env.example` to `.env` and set:
128
+
129
+ | Variable | Default | Description |
130
+ |----------|---------|-------------|
131
+ | `ANTHROPIC_API_KEY` | required | API key |
132
+ | `ANTHROPIC_MODEL` | `claude-sonnet-4` | Model for execution |
133
+ | `MEOW_MODE` | `SHIP` | `SHIP`, `SEQUENTIAL`, `PARALLEL`, `ECOMODE`, `RALPH` |
134
+ | `MEOW_BUDGET_CENTS` | unset | Hard spend cap per session |
135
+ | `MEOW_DB` | `meow.db` | SQLite state database path |
136
+
137
+ **Execution modes:**
138
+
139
+ - `SHIP` — Full quality pipeline. Self-review loop, all gates, LLM judge. Use for production tasks.
140
+ - `SEQUENTIAL` — Gates enabled, no judge pass. Good for development iteration.
141
+ - `PARALLEL` — Maximum throughput, no quality gates. Use for bulk refactors you'll review yourself.
142
+ - `ECOMODE` — Cheap model, 1 retry, 30s timeout. Fast exploration.
143
+ - `RALPH` — Unlimited retries, relentless quality convergence. For hard problems where cost is secondary.
144
+
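The configuration table and mode list above can be read as the following settings loader. This is a sketch under assumptions: the variable names and defaults come from the table, while the `ModeProfile` and `Settings` types are hypothetical, and the gate and judge flags for `ECOMODE` and `RALPH` are guesses because the README does not spell them out.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class ModeProfile:
    gates: bool   # quality gates enabled
    judge: bool   # LLM judge pass enabled

MODES = {
    "SHIP": ModeProfile(gates=True, judge=True),
    "SEQUENTIAL": ModeProfile(gates=True, judge=False),
    "PARALLEL": ModeProfile(gates=False, judge=False),
    "ECOMODE": ModeProfile(gates=True, judge=False),  # assumed; not stated in the README
    "RALPH": ModeProfile(gates=True, judge=True),     # assumed; not stated in the README
}

@dataclass(frozen=True)
class Settings:
    api_key: str
    model: str
    mode: str
    budget_cents: Optional[int]  # None when MEOW_BUDGET_CENTS is unset
    db_path: str

def load_settings(env: Mapping[str, str]) -> Settings:
    """Apply the documented defaults and reject missing or unknown values."""
    if not env.get("ANTHROPIC_API_KEY"):
        raise ValueError("ANTHROPIC_API_KEY is required")
    mode = env.get("MEOW_MODE", "SHIP")
    if mode not in MODES:
        raise ValueError(f"unknown MEOW_MODE: {mode}")
    budget = env.get("MEOW_BUDGET_CENTS")
    return Settings(
        api_key=env["ANTHROPIC_API_KEY"],
        model=env.get("ANTHROPIC_MODEL", "claude-sonnet-4"),
        mode=mode,
        budget_cents=int(budget) if budget else None,
        db_path=env.get("MEOW_DB", "meow.db"),
    )

defaults = load_settings({"ANTHROPIC_API_KEY": "sk-ant-example"})
```

In real use the mapping passed in would be `os.environ`; taking it as a parameter keeps the function easy to test.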
145
+ ---
146
+
147
+ ## MEOW-3-RULE (never violate this)
148
+
149
+ ```
150
+ Task arrives → meow -p (3 retry attempts)
151
+ ↓ fails × 3
152
+ claude -p (fixes meow's code/prompts — NOT the task)
153
+
154
+ User re-runs same task → meow → succeeds
155
+ ```
156
+
157
+ `claude -p` is a meow-swarm mechanic. It repairs broken machinery. It never completes the original task on meow's behalf.
158
+
159
+ ---
160
+
161
+ See `docs/STATUS.md` for current known issues and `docs/TODO.md` for the prioritized improvement backlog.