meow-swarm 0.4.0 → 0.5.0
- package/L2 +0 -0
- package/README.md +161 -86
- package/dist/bin/meow-mcp.js +376 -119
- package/dist/bin/meow.js +1095 -217
- package/meow.db +0 -0
- package/meow.db-shm +0 -0
- package/meow.db-wal +0 -0
- package/package.json +1 -1
package/L2
ADDED
|
File without changes
|
package/README.md
CHANGED
|
@@ -1,86 +1,161 @@
|
|
|
1
|
-
#
|
|
2
|
-
|
|
3
|
-
|
|
4
|
-
|
|
5
|
-
```
|
|
6
|
-
meow -p "
|
|
7
|
-
# →
|
|
8
|
-
```
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
meow
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
|
|
61
|
-
-
|
|
62
|
-
-
|
|
63
|
-
|
|
64
|
-
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
1
|
+
# meow-swarm
|
|
2
|
+
|
|
3
|
+
> One prompt. Meow plans it, builds it, verifies it, and repairs itself if anything breaks.
|
|
4
|
+
|
|
5
|
+
```bash
|
|
6
|
+
meow -p "add OAuth2 login to the API with tests"
|
|
7
|
+
# → dispatched. check back with meow --tui
|
|
8
|
+
```
|
|
9
|
+
|
|
10
|
+
Most coding agents execute your prompt once and hand you back a diff. Meow-swarm is different: it runs a full plan → build → quality-verify → self-repair loop autonomously, and doesn't mark a task done until it can show evidence the work is correct.
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
## What makes meow-swarm different
|
|
15
|
+
|
|
16
|
+
### 1. Definition of Done before the first line of code
|
|
17
|
+
|
|
18
|
+
When a task arrives, meow derives explicit acceptance criteria from your request before touching anything. It knows what "done" looks like (specific, verifiable outcomes) before it starts. This is what separates a task that is verifiably complete from one that merely stops running.
|
|
19
|
+
|
|
20
|
+
### 2. Quality gates, not just a diff
|
|
21
|
+
|
|
22
|
+
After every execution, meow runs a structured self-review loop against a set of quality gates:
|
|
23
|
+
|
|
24
|
+
| Gate | What it checks |
|
|
25
|
+
|------|---------------|
|
|
26
|
+
| **Placeholder Detection** | No TODOs, FIXMEs, or stub bodies in produced code |
|
|
27
|
+
| **Lint / Type Check** | Zero errors from the project's linter and type checker |
|
|
28
|
+
| **Test Coverage** | Tests pass, coverage meets the project threshold |
|
|
29
|
+
| **Coherence** | The diff actually addresses the stated goal (LLM review pass) |
|
|
30
|
+
| **Human Sign-Off** | Production tasks require explicit approval before shipping |
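To make the first gate in the table concrete, here is a minimal sketch of a placeholder scan. It is an illustration only, not meow-swarm's actual implementation: `findPlaceholders`, `PlaceholderHit`, and the marker patterns are hypothetical names and choices made for this example.

```typescript
// Hypothetical sketch of a placeholder-detection gate (not meow-swarm's real code).
interface PlaceholderHit {
  line: number; // 1-based line number in the scanned source
  text: string; // the offending line, trimmed
}

// Markers treated as unfinished work. The exact list is an assumption.
const PLACEHOLDER_PATTERNS: RegExp[] = [
  /\bTODO\b/,
  /\bFIXME\b/,
  /throw new Error\(["']not implemented["']\)/i,
];

function findPlaceholders(source: string): PlaceholderHit[] {
  const found: PlaceholderHit[] = [];
  source.split("\n").forEach((text, index) => {
    if (PLACEHOLDER_PATTERNS.some((pattern) => pattern.test(text))) {
      found.push({ line: index + 1, text: text.trim() });
    }
  });
  return found;
}

const sampleSource = [
  "export function login() {",
  "  // TODO: validate the token",
  "  return true;",
  "}",
].join("\n");

// Reports one hit, on line 2 of the sample.
console.log(findPlaceholders(sampleSource));
```

A gate built this way fails whenever the returned list is non-empty, and the hits themselves can serve as the specific issues fed back to the agent.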
|
|
31
|
+
|
|
32
|
+
If gates fail, meow feeds the specific issues back into the agent loop and retries — up to a configurable iteration limit. A `QualityConvergenceChecker` tracks whether quality is genuinely improving each iteration and stops early if it detects diminishing returns, so it doesn't burn tokens grinding on something unfixable.
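The retry-and-stop behaviour described above can be sketched roughly as follows. The README names `QualityConvergenceChecker` but does not document its API, so the class below (`ConvergenceSketch`) is a hypothetical stand-in: the numeric quality score, the `minImprovement` threshold, and the decision names are all assumptions.

```typescript
// Hypothetical sketch; the real QualityConvergenceChecker API is not documented here.
type LoopDecision =
  | "continue"
  | "stop_max_iterations"
  | "stop_diminishing_returns";

class ConvergenceSketch {
  private iterations = 0;
  private previousScore: number | undefined = undefined;
  private maxIterations: number;
  private minImprovement: number;

  constructor(maxIterations: number, minImprovement: number) {
    this.maxIterations = maxIterations;
    this.minImprovement = minImprovement;
  }

  // Record the quality score of the latest iteration and decide whether to retry.
  record(score: number): LoopDecision {
    this.iterations += 1;
    const previous = this.previousScore;
    this.previousScore = score;

    // Hard cap: the configurable iteration limit.
    if (this.iterations >= this.maxIterations) {
      return "stop_max_iterations";
    }
    // Early stop: the score barely moved (or got worse) since the last iteration.
    if (previous !== undefined && score - previous < this.minImprovement) {
      return "stop_diminishing_returns";
    }
    return "continue";
  }
}

const convergence = new ConvergenceSketch(5, 0.05);
console.log(convergence.record(0.4));  // "continue"
console.log(convergence.record(0.7));  // "continue"
console.log(convergence.record(0.71)); // "stop_diminishing_returns"
```

Comparing only against the previous iteration keeps the sketch short; a real checker might look at a longer window of scores before giving up.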
|
|
33
|
+
|
|
34
|
+
### 3. Evidence-based completion
|
|
35
|
+
|
|
36
|
+
Meow doesn't consider a task done because the code compiled. It runs the thing it built and captures the evidence:
|
|
37
|
+
|
|
38
|
+
- **stdout / stderr** from running the produced code or tests
|
|
39
|
+
- **Screenshots** for UI changes (visual diff against baseline)
|
|
40
|
+
- **File read-back** for generated artifacts — confirms the file has real content, not a stub
|
|
41
|
+
|
|
42
|
+
This evidence is fed to an LLM judge that scores the work against the original task description. Score below threshold → back into the loop with a specific critique. Score above threshold → task is marked complete with the evidence attached.
|
|
43
|
+
|
|
44
|
+
### 4. MEOW-3-RULE: self-repair instead of giving up
|
|
45
|
+
|
|
46
|
+
When meow fails three consecutive attempts on a task, it doesn't ask you for help. It runs a targeted `claude -p` call — not to finish the task, but to diagnose and patch meow's own code, prompts, or tool configuration. After the patch, the task is re-queued for a fresh attempt with the fixed machinery.
|
|
47
|
+
|
|
48
|
+
```
|
|
49
|
+
Task fails × 3
|
|
50
|
+
→ claude -p "diagnose why meow failed, fix meow's code"
|
|
51
|
+
→ meow is patched
|
|
52
|
+
→ task re-queued → succeeds
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
The task and the mechanic are never conflated. Meow fixes itself; it doesn't sneak in a bad completion to avoid admitting failure.
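One way to picture the rule is a per-task counter of consecutive failures that switches from "retry" to "repair" on the third strike. This is a sketch under assumptions, not the package's code: `StrikeCounter`, the task IDs, and the decision names are hypothetical, and the actual `claude -p` invocation is deliberately left out.

```typescript
// Hypothetical sketch of the three-strikes rule; names are assumptions.
type NextStep = "retry_task" | "repair_meow_then_requeue";

class StrikeCounter {
  private consecutiveFailures = new Map<string, number>();
  private limit: number;

  constructor(limit: number = 3) {
    this.limit = limit;
  }

  // A success clears the streak: only consecutive failures count.
  recordSuccess(taskId: string): void {
    this.consecutiveFailures.delete(taskId);
  }

  recordFailure(taskId: string): NextStep {
    const failures = (this.consecutiveFailures.get(taskId) ?? 0) + 1;
    if (failures >= this.limit) {
      // Reset so the re-queued task starts fresh after the repair.
      this.consecutiveFailures.delete(taskId);
      return "repair_meow_then_requeue";
    }
    this.consecutiveFailures.set(taskId, failures);
    return "retry_task";
  }
}

const strikes = new StrikeCounter();
console.log(strikes.recordFailure("task-1")); // "retry_task"
console.log(strikes.recordFailure("task-1")); // "retry_task"
console.log(strikes.recordFailure("task-1")); // "repair_meow_then_requeue"
```

Keeping the decision separate from the repair call mirrors the point above: the counter only says when the mechanic should be called, and never touches the task's own output.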
|
|
56
|
+
|
|
57
|
+
### 5. Skills-first execution
|
|
58
|
+
|
|
59
|
+
Before writing any code for common task types — code review, frontend design, testing, documentation — meow searches the community skills ecosystem:
|
|
60
|
+
|
|
61
|
+
- [`https://github.com/anthropics/skills`](https://github.com/anthropics/skills)
|
|
62
|
+
- [`https://github.com/vercel-labs/skills`](https://github.com/vercel-labs/skills/blob/main/skills/find-skills/SKILL.md)
|
|
63
|
+
|
|
64
|
+
A battle-tested code review skill usually has better prompts and output structure than what meow would derive from scratch on every run. Skills are installed automatically if found (`npx skills add <skill> -g -y`) and invoked before falling back to raw LLM generation or a summon call.
|
|
65
|
+
|
|
66
|
+
### 6. Background daemon — fire and forget
|
|
67
|
+
|
|
68
|
+
Meow is not a chat partner. It runs in the background like a worker process:
|
|
69
|
+
|
|
70
|
+
- Dispatch a task → returns immediately, work runs async
|
|
71
|
+
- Checkpoints every state change to SQLite — crashes are recoverable
|
|
72
|
+
- `meow --continue` resumes stranded tasks after a crash or reboot
|
|
73
|
+
- `meow --tui` gives a live dashboard of agent status, task queue, and token costs
|
|
74
|
+
|
|
75
|
+
You can dispatch a task, close your laptop, and come back to a completed result with a full audit trail of every decision and tool call.
|
|
76
|
+
|
|
77
|
+
### 7. Multi-layer architecture with specialist routing
|
|
78
|
+
|
|
79
|
+
```
|
|
80
|
+
meow -p "task"
|
|
81
|
+
↓
|
|
82
|
+
[L1 Liaison] Intent extraction, MissionBrief with acceptance criteria
|
|
83
|
+
↓
|
|
84
|
+
[L2 Architect] Task decomposition, dependency resolution, specialist assignment
|
|
85
|
+
↓
|
|
86
|
+
[L3 SwarmManager] Parallel or sequential execution across specialist agents
|
|
87
|
+
↓
|
|
88
|
+
[Self-Review Loop] Quality gates, convergence check, evidence capture
|
|
89
|
+
↓
|
|
90
|
+
[LLM Judge] Scores output against original task — passes or feeds critique back
|
|
91
|
+
↓
|
|
92
|
+
[L4 Auditor] Final checkpoint to SQLite, cost tracking, audit ledger
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
Each layer uses the right model for its job. The Liaison uses a fast model for sub-500ms intent parsing. Deep execution uses your configured model (Claude Sonnet by default). The judge uses a separate call with the full context to avoid self-grading bias.
|
|
96
|
+
|
|
97
|
+
---
|
|
98
|
+
|
|
99
|
+
## Quick start
|
|
100
|
+
|
|
101
|
+
```bash
|
|
102
|
+
# Node.js 18+ required (Bun not supported — native SQLite addons require Node)
|
|
103
|
+
npm install -g meow-swarm
|
|
104
|
+
|
|
105
|
+
# Configure your API key
|
|
106
|
+
export ANTHROPIC_API_KEY="sk-ant-..."
|
|
107
|
+
|
|
108
|
+
# Dispatch a task
|
|
109
|
+
meow -p "refactor the auth module to use JWT and add tests"
|
|
110
|
+
|
|
111
|
+
# Watch it work
|
|
112
|
+
meow --tui
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
## Commands
|
|
116
|
+
|
|
117
|
+
| Command | Description |
|
|
118
|
+
|---------|-------------|
|
|
119
|
+
| `meow -p "task"` | Dispatch task headlessly (primary interface) |
|
|
120
|
+
| `meow` | Interactive REPL |
|
|
121
|
+
| `meow --tui` | Live terminal dashboard |
|
|
122
|
+
| `meow --continue` | Resume stranded tasks after a crash |
|
|
123
|
+
| `meow --monitor` | Run the monitoring agent (cluster analysis, patch suggestions) |
|
|
124
|
+
|
|
125
|
+
## Configuration
|
|
126
|
+
|
|
127
|
+
Copy `.env.example` to `.env` and set:
|
|
128
|
+
|
|
129
|
+
| Variable | Default | Description |
|
|
130
|
+
|----------|---------|-------------|
|
|
131
|
+
| `ANTHROPIC_API_KEY` | required | API key |
|
|
132
|
+
| `ANTHROPIC_MODEL` | `claude-sonnet-4` | Model for execution |
|
|
133
|
+
| `MEOW_MODE` | `SHIP` | `SHIP`, `SEQUENTIAL`, `PARALLEL`, `ECOMODE`, `RALPH` |
|
|
134
|
+
| `MEOW_BUDGET_CENTS` | unset | Hard spend cap per session |
|
|
135
|
+
| `MEOW_DB` | `meow.db` | SQLite state database path |
|
|
136
|
+
|
|
137
|
+
**Execution modes:**
|
|
138
|
+
|
|
139
|
+
- `SHIP` — Full quality pipeline. Self-review loop, all gates, LLM judge. Use for production tasks.
|
|
140
|
+
- `SEQUENTIAL` — Gates enabled, no judge pass. Good for development iteration.
|
|
141
|
+
- `PARALLEL` — Maximum throughput, no quality gates. Use for bulk refactors you'll review yourself.
|
|
142
|
+
- `ECOMODE` — Cheap model, 1 retry, 30s timeout. Fast exploration.
|
|
143
|
+
- `RALPH` — Unlimited retries, relentless quality convergence. For hard problems where cost is secondary.
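The mode list above can be read as a lookup table. The sketch below records only the settings the list states and leaves everything else undefined. The `ModeProfile` shape, `MODE_PROFILES`, and `resolveMode` are hypothetical names, and how meow-swarm actually reads `MEOW_MODE` is not shown in this README.

```typescript
// Hypothetical sketch: the execution modes as a typed lookup table.
type MeowMode = "SHIP" | "SEQUENTIAL" | "PARALLEL" | "ECOMODE" | "RALPH";

// Fields left undefined are simply not stated in the README.
interface ModeProfile {
  qualityGates?: boolean;
  llmJudge?: boolean;
  cheapModel?: boolean;
  maxRetries?: number; // Infinity stands for "unlimited"
  timeoutSeconds?: number;
}

const MODE_PROFILES: Record<MeowMode, ModeProfile> = {
  SHIP: { qualityGates: true, llmJudge: true },
  SEQUENTIAL: { qualityGates: true, llmJudge: false },
  PARALLEL: { qualityGates: false },
  ECOMODE: { cheapModel: true, maxRetries: 1, timeoutSeconds: 30 },
  RALPH: { maxRetries: Infinity },
};

// Falls back to SHIP, the documented default, when the variable is unset.
function resolveMode(raw: string | undefined): MeowMode {
  const value = raw ?? "SHIP";
  if (Object.keys(MODE_PROFILES).includes(value)) {
    return value as MeowMode;
  }
  throw new Error(`Unknown MEOW_MODE: ${value}`);
}

console.log(resolveMode(undefined));               // "SHIP"
console.log(MODE_PROFILES[resolveMode("ECOMODE")]); // cheap model, 1 retry, 30s timeout
```

Rejecting unknown values up front is a design choice of the sketch: a typo in `MEOW_MODE` fails loudly instead of silently running with no quality gates.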
|
|
144
|
+
|
|
145
|
+
---
|
|
146
|
+
|
|
147
|
+
## MEOW-3-RULE (never violate this)
|
|
148
|
+
|
|
149
|
+
```
|
|
150
|
+
Task arrives → meow -p (3 retry attempts)
|
|
151
|
+
↓ fails × 3
|
|
152
|
+
claude -p (fixes meow's code/prompts — NOT the task)
|
|
153
|
+
↓
|
|
154
|
+
User re-runs same task → meow → succeeds
|
|
155
|
+
```
|
|
156
|
+
|
|
157
|
+
`claude -p` is a meow-swarm mechanic. It repairs broken machinery. It never completes the original task on meow's behalf.
|
|
158
|
+
|
|
159
|
+
---
|
|
160
|
+
|
|
161
|
+
See `docs/STATUS.md` for current known issues and `docs/TODO.md` for the prioritized improvement backlog.
|