@zhixuan92/multi-model-agent 4.3.0 → 4.4.0
- package/README.md +19 -22
- package/dist/http/handlers/tools/delegate.d.ts.map +1 -1
- package/dist/http/handlers/tools/delegate.js +8 -0
- package/dist/http/handlers/tools/delegate.js.map +1 -1
- package/dist/http/handlers/tools/execute-plan.d.ts.map +1 -1
- package/dist/http/handlers/tools/execute-plan.js +9 -1
- package/dist/http/handlers/tools/execute-plan.js.map +1 -1
- package/dist/http/handlers/tools/research.js +1 -1
- package/dist/http/handlers/tools/research.js.map +1 -1
- package/dist/http/handlers/tools/retry.js.map +1 -1
- package/dist/http/middleware/caller-identity.d.ts +5 -3
- package/dist/http/middleware/caller-identity.d.ts.map +1 -1
- package/dist/http/middleware/caller-identity.js.map +1 -1
- package/dist/http/request-pipeline.d.ts.map +1 -1
- package/dist/http/request-pipeline.js +8 -14
- package/dist/http/request-pipeline.js.map +1 -1
- package/dist/http/server.d.ts.map +1 -1
- package/dist/http/server.js +7 -11
- package/dist/http/server.js.map +1 -1
- package/dist/http/wire/register-all-handlers.d.ts.map +1 -1
- package/dist/http/wire/register-all-handlers.js +0 -2
- package/dist/http/wire/register-all-handlers.js.map +1 -1
- package/dist/skills/_shared/auth.md +4 -1
- package/dist/skills/mma-audit/SKILL.md +67 -60
- package/dist/skills/mma-context-blocks/SKILL.md +5 -3
- package/dist/skills/mma-debug/SKILL.md +7 -4
- package/dist/skills/mma-delegate/SKILL.md +3 -2
- package/dist/skills/mma-execute-plan/SKILL.md +2 -1
- package/dist/skills/mma-explore/SKILL.md +1 -1
- package/dist/skills/mma-investigate/SKILL.md +4 -1
- package/dist/skills/mma-research/SKILL.md +6 -1
- package/dist/skills/mma-retry/SKILL.md +6 -5
- package/dist/skills/mma-review/SKILL.md +4 -1
- package/dist/skills/multi-model-agent/SKILL.md +6 -11
- package/package.json +2 -2
- package/dist/http/handlers/tools/verify.d.ts +0 -4
- package/dist/http/handlers/tools/verify.d.ts.map +0 -1
- package/dist/http/handlers/tools/verify.js +0 -53
- package/dist/http/handlers/tools/verify.js.map +0 -1
- package/dist/skills/mma-verify/SKILL.md +0 -155
@@ -1,58 +1,45 @@
 ---
 name: mma-audit
 description: >-
-  Use when the user asks to audit a spec / plan / design doc /
-
-
-
-
-
+  Use when the user asks to audit a spec / plan / design doc / skill file. The
+  `subtype` field picks the criteria set. `default` (prose-coherence) is the
+  general doc auditor. `plan` verifies a code-execution plan against the actual
+  codebase — run this before any `mma-execute-plan` dispatch. `spec` audits
+  requirement prose for testability and decision-trace. `skill` audits a
+  SKILL.md against reader-effectiveness criteria.
 when_to_use: >-
-  User asks for a doc/spec/plan/
+  User asks for a doc / spec / plan / skill audit OR a methodology skill
   (superpowers:dispatching-parallel-agents, /security-review) points at one AND
   mmagent is running. Audit on PROSE/SPEC docs — use mma-review for source code.
-  Audit a CODE-EXECUTION PLAN against the codebase — use
-version: 4.
+  Audit a CODE-EXECUTION PLAN against the codebase — use subtype=plan.
+version: 4.4.0
 ---
 
 # mma-audit
 
 ## Overview
 
-`mma-audit` sends a prose artifact to workers for structured auditing —
+`mma-audit` sends a prose artifact to workers for structured auditing. The `subtype` field picks WHICH criteria set the workers apply — every subtype runs through the same sequential-criteria read-only lifecycle, but each one carries its own criteria list, semantics, and prompt scaffolding.
 
-**
+**Four subtypes — picked by the kind of artifact, not by the lens you want:**
 
 | You're auditing… | Use… | What it checks |
 |---|---|---|
-| A
-| A **code-execution PLAN** (`docs/superpowers/plans/*.md` or similar) before running it via `mma-execute-plan` | `
-| A
-| A
+| A general prose artifact (design doc, recommendation, post-mortem, README) | `subtype: 'default'` | Comprehensive prose-coherence — would a literal-following worker produce the right outcome from this prose alone? Catches ambiguity, contradictions, missing branches, drift, scope-creep. **Does NOT verify against any codebase.** |
+| A **code-execution PLAN** (`docs/superpowers/plans/*.md` or similar) before running it via `mma-execute-plan` | `subtype: 'plan'` | Plan-vs-codebase coherence — for every method / type / file path / signature / import / verify command the plan names, the codebase actually contains it as described. Catches the bug class the prose-coherence audit cannot see (e.g. plan says `registerBlock` but actual interface is `register`). |
+| A **requirement spec** (what we want, why; success criteria) | `subtype: 'spec'` | Requirement-prose executability — every requirement testable, scope explicit, acceptance criteria covered, non-functional requirements captured, decision-trace exposed, conflicts surfaced. |
+| A **SKILL.md** for an `mma-*` skill or comparable agent-facing playbook | `subtype: 'skill'` | Skill-file reader-effectiveness — when-to-use specificity, endpoint contract integrity, example correctness, anti-pattern coverage, link integrity. |
 
-
-**Core principle (plan mode):** One worker per verification perspective (8 in parallel) = each dimension grounds independently in the codebase.
+If you want to bias workers toward a narrow lens (security only, performance only, accessibility only), put that in the free-text `background` portion of the prompt — `subtype` is criteria machinery, not a lens selector.
 
 ## When to Use
 
-
--- 
--- 
--- 
+- `subtype: 'default'` — a general prose artifact needs a critical read for internal executability (the artifact will be acted on by a worker reading the prose alone).
+- `subtype: 'plan'` — you have a written code-execution plan on disk and you're about to dispatch tasks from it via `mma-execute-plan`. This is the ONLY subtype that grounds findings against real source files.
+- `subtype: 'spec'` — you have a requirement / brainstorming-output spec and want to verify every requirement is testable, traceable, and unambiguous BEFORE writing the plan. Typical predecessor to `writing-plans`.
+- `subtype: 'skill'` — you're authoring or revising an `mma-*` skill or comparable SKILL.md and want to know whether agents will actually read it the right way.
 
-**
-- You have a written code-execution plan on disk and you're about to dispatch tasks from it via `mma-execute-plan`
-- You want to know: "Will this plan actually dispatch successfully against the codebase as it exists today?"
-- This is the ONLY audit mode that grounds findings against real source files
-
-**Don't use mma-audit when:**
-- The thing being audited is source code → `mma-review` (knows about types, call sites, test coverage)
-- You want a quick look ("does this look right?") → just `Read` and use your judgment
-- You need to verify a plan dispatches but you haven't written it yet → write the plan first, then run plan-audit on it
-
-## Endpoint
-
-`POST /audit?cwd=<abs-path>`
+**Don't use mma-audit when:** the thing being audited is source code (→ `mma-review`); a 30-second `Read` would answer it; or you want to verify a plan that hasn't been written yet (write the plan first).
 
 ## Endpoint
 
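The subtype table in the hunk above routes by the kind of artifact, with source code going to `mma-review` instead. A minimal TypeScript sketch of that routing follows. The `ArtifactKind` labels, the `Route` type, and `routeArtifact` are hypothetical names chosen for illustration; only the subtype values and skill names come from the SKILL.md text.

```typescript
// Hypothetical helper: ArtifactKind, Route and routeArtifact are illustrative
// names, not part of the package. Subtype values are the ones in the table.
type AuditSubtype = "default" | "plan" | "spec" | "skill";

type ArtifactKind =
  | "general-prose"
  | "execution-plan"
  | "requirement-spec"
  | "skill-file"
  | "source-code";

type Route =
  | { skill: "mma-audit"; subtype: AuditSubtype }
  | { skill: "mma-review" };

// One entry per artifact kind; the lens (security, performance, ...) is
// deliberately absent because the text says it belongs in the prompt.
const ROUTES: Record<ArtifactKind, Route> = {
  "general-prose": { skill: "mma-audit", subtype: "default" },
  "execution-plan": { skill: "mma-audit", subtype: "plan" },
  "requirement-spec": { skill: "mma-audit", subtype: "spec" },
  "skill-file": { skill: "mma-audit", subtype: "skill" },
  "source-code": { skill: "mma-review" },
};

function routeArtifact(kind: ArtifactKind): Route {
  return ROUTES[kind];
}
```

A lookup table keeps the mapping total: adding a new artifact kind without a route fails type checking rather than falling through at run time.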
@@ -65,7 +52,7 @@ version: 4.3.0
 ```json
 {
   "document": "inline content to audit (optional if filePaths given)",
-  "
+  "subtype": "default",
   "filePaths": ["/project/docs/spec.md"],
   "contextBlockIds": []
 }
@@ -74,7 +61,7 @@ version: 4.3.0
 | Field | Type | Required | Notes |
 |---|---|---|---|
 | `document` | string | no | Inline document content |
-| `
+| `subtype` | `'default' \| 'plan' \| 'spec' \| 'skill'` | no (defaults to `'default'`) | See "Picking subtype" below. |
 | `filePaths` | string[] | no | Files to audit (one worker per file, parallel) |
 | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
 
@@ -82,60 +69,78 @@ Either `document` or `filePaths` (or both) must be provided.
 
 > Worker tier for `mma-audit` is hardcoded to `complex` and is not caller-configurable. Sending `agentType` is rejected with HTTP 400.
 
-### Picking
+### Picking subtype
 
 | Value | When to use |
 |---|---|
-| `default` (or omit the field) | **
-| `plan`
-| `
-| `
-
-**Plan vs Default — which to pick:** The artifact's NATURE decides:
-- **Spec / requirements** (what we want, why) → `default`. Reviewing the prose alone is the goal.
-- **Plan** (concrete tasks with code blocks, file paths, methods to call) → `plan`. The plan only matters if the codebase agrees with it.
+| `default` (or omit the field) | **General prose — design doc, recommendation, post-mortem, README, brief.** Comprehensive prose-coherence audit. Does NOT verify against any codebase. |
+| `plan` | **Code-execution plans being audited against a real codebase.** Single-file input (the plan markdown). Workers grep / read source files under `cwd` to verify every named symbol / path / signature / import / verify command. Use this BEFORE every `mma-execute-plan` dispatch. |
+| `spec` | **Requirement spec / brainstorming-output / what-we-want prose.** Criteria target testability, scope explicitness, acceptance-criteria coverage, decision-trace, assumption exposure. |
+| `skill` | **`SKILL.md` or comparable agent-facing playbook.** Criteria target when-to-use specificity, endpoint contract integrity, example correctness, anti-pattern coverage, link integrity. |
 
-You can run BOTH on a plan: first `default` (prose quality
+You can run BOTH on a plan: first `spec` or `default` (prose quality), then `plan` (does the plan match the codebase?). They cover orthogonal failure modes.
 
-The legacy
+The legacy `auditType` field and its `correctness` / `style` / `general` / `security` / `performance` values no longer exist. Sending `auditType` returns `400 invalid_request`. Sending unknown `subtype` values returns `400 invalid_request` with the allowed enum.
 
 ### Plan-audit specifics
 
-When `
+When `subtype: 'plan'`:
 
 - `filePaths` MUST contain exactly **one entry** — the plan markdown. Sending zero or 2+ entries → `400 invalid_request` with the message: *"Plan audit takes exactly one filePath (the plan markdown). The worker discovers and verifies source files itself via its tool surface — do not pre-list source files."*
 - `document` (inline content) is not used in plan mode — the plan must be on disk so workers can reference it by `?cwd=`-relative path.
--- 
--- 
-- Only DISPATCH tasks that audit as `EXECUTABLE`. Fix the plan and re-audit if any task is `BLOCKED` or `PARTIAL`.
+- The worker runs the sequential-criteria loop with the plan-audit criteria set: PATH EXISTENCE, SYMBOL EXISTENCE, SIGNATURE MATCH, IMPORT GRAPH, TEST HARNESS AVAILABILITY, STEP SEQUENCE WITHIN TASK, CROSS-TASK DEPENDENCIES, VERIFICATION COMMAND VALIDITY.
+- Read the findings list. Fix the plan and re-audit if any `critical` or `high` plan-audit findings remain.
 
 ## Full example
 
-### Default audit (
+### Default audit (general prose)
 
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
   -H "X-MMA-Client: $MMA_CLIENT" \
+  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: application/json" \
-  -d '{"
+  -d '{"subtype":"default","filePaths":["/project/docs/api-spec.md"]}' \
   "http://localhost:$PORT/audit?cwd=/project")
 BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 ```
 
-###
+### Spec audit (requirement prose)
 
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
   -H "X-MMA-Client: $MMA_CLIENT" \
+  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: application/json" \
-  -d '{"
+  -d '{"subtype":"spec","filePaths":["/project/docs/superpowers/specs/2026-05-12-feature-design.md"]}' \
   "http://localhost:$PORT/audit?cwd=/project")
-BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 ```
 
-
+### Skill audit (SKILL.md)
+
+```bash
+BATCH=$(curl -f --show-error -s -X POST \
+  -H "X-MMA-Client: $MMA_CLIENT" \
+  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"subtype":"skill","filePaths":["/project/packages/server/src/skills/mma-audit/SKILL.md"]}' \
+  "http://localhost:$PORT/audit?cwd=/project")
+```
+
+### Plan audit (verify a code-execution plan against the codebase)
+
+```bash
+BATCH=$(curl -f --show-error -s -X POST \
+  -H "X-MMA-Client: $MMA_CLIENT" \
+  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"subtype":"plan","filePaths":["/project/docs/superpowers/plans/2026-05-10-feature.md"]}' \
+  "http://localhost:$PORT/audit?cwd=/project")
+```
 
 @include _shared/polling.md
 
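The request rules stated in the hunk above (legacy `auditType` rejected, `agentType` rejected, `subtype` limited to four values, plan mode taking exactly one file path) can be collected into one pre-flight check. The sketch below is written from the SKILL.md prose only, not from the package source. `validateAuditRequest`, `AuditValidation`, and the shortened error messages are hypothetical, and returning `invalid_request` for every failure is an assumption: the text confirms that code only for `auditType`, unknown `subtype` values, and the plan `filePaths` count.

```typescript
// Sketch only: validateAuditRequest and its result shape are hypothetical,
// written from the rules in the SKILL.md text, not from the package source.
type Subtype = "default" | "plan" | "spec" | "skill";

type AuditValidation =
  | { ok: true; subtype: Subtype; filePaths: string[] }
  | { ok: false; status: number; error: string; message: string };

const SUBTYPES: readonly string[] = ["default", "plan", "spec", "skill"];

function invalid(message: string): AuditValidation {
  // Assumption: every rejection uses 400 / invalid_request.
  return { ok: false, status: 400, error: "invalid_request", message };
}

function validateAuditRequest(body: Record<string, unknown>): AuditValidation {
  // The legacy field and the caller-chosen worker tier are both rejected.
  if ("auditType" in body) return invalid("auditType no longer exists; use subtype");
  if ("agentType" in body) return invalid("agentType is not caller-configurable");

  // Omitting subtype means 'default'; anything outside the enum is an error.
  const raw = body["subtype"] === undefined ? "default" : body["subtype"];
  if (typeof raw !== "string" || SUBTYPES.indexOf(raw) === -1) {
    return invalid("subtype must be one of: " + SUBTYPES.join(", "));
  }
  const subtype = raw as Subtype;

  const rawPaths = body["filePaths"];
  const filePaths = Array.isArray(rawPaths)
    ? rawPaths.filter((p): p is string => typeof p === "string")
    : [];
  const doc = body["document"];
  const hasDocument = typeof doc === "string" && doc.length > 0;

  if (subtype === "plan") {
    // Plan mode reads the plan from disk: exactly one path, inline document unused.
    if (filePaths.length !== 1) {
      return invalid("Plan audit takes exactly one filePath (the plan markdown).");
    }
  } else if (!hasDocument && filePaths.length === 0) {
    return invalid("Either document or filePaths (or both) must be provided");
  }
  return { ok: true, subtype, filePaths };
}
```

Running a check like this on the caller side before the `curl` call surfaces a malformed body without spending a round trip; the server remains the authority on what it accepts.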
@@ -185,7 +190,9 @@ This skill is one step in the larger flow described in `multi-model-agent` → "
 
 - **Recipe A — Audit-iterate-clean.** `mma-audit` → fix → `mma-audit` again. Sequential rounds. Register the doc via `mma-context-blocks` before round 1 and reuse the same ID across all rounds — avoids re-inlining the same content into every audit call.
 
-- **Recipe E — Plan-validate-execute
+- **Recipe E — Plan-validate-execute.** Before any `mma-execute-plan` batch, run `mma-audit` with `subtype: 'plan'` on the plan file. Read the findings. If any `critical` / `high` finding survives, fix the plan and re-audit. This catches the bug class where the plan's named methods/files don't actually exist in the codebase — symbols a prose-coherence audit cannot see.
+
+- **Recipe F — Spec-then-plan-then-execute.** When working from a brainstorming spec: `mma-audit` (`subtype: 'spec'`) → fix → `writing-plans` → `mma-audit` (`subtype: 'plan'`) → fix → `mma-execute-plan`. Spec and plan audits catch orthogonal problem classes.
 
 Anti-pattern alert: **`parallel-rounds-same-target`** (AP1). Three parallel audits on the same document re-flag the same issues without seeing each other's fixes. Run rounds sequentially with a fix between each.
 
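Recipe E in the hunk above gates `mma-execute-plan` on the absence of `critical` or `high` findings. A minimal sketch of that gate is shown here. The `AuditFinding` shape, the `medium` and `low` severity values, and the `planGate` name are assumptions for illustration; the diff names only `critical` and `high` and does not show the finding schema.

```typescript
// Hypothetical finding shape: only "critical" and "high" are named in the
// SKILL.md text; "medium" / "low" and the summary field are assumptions.
type Severity = "critical" | "high" | "medium" | "low";

interface AuditFinding {
  severity: Severity;
  summary: string;
}

// Returns whether the plan may be dispatched and which findings block it.
function planGate(findings: AuditFinding[]): { dispatch: boolean; blocking: AuditFinding[] } {
  const blocking = findings.filter(
    (f) => f.severity === "critical" || f.severity === "high",
  );
  return { dispatch: blocking.length === 0, blocking };
}
```

When `dispatch` is false, the recipe calls for fixing the plan and running the plan audit again, one round at a time, rather than dispatching the tasks that happened to pass.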
@@ -197,8 +204,8 @@ The auditor lacks codebase context (no type info, no call-site lookup, no test a
 ❌ **Single huge `document` string instead of `filePaths`**
 Inline docs lose the file boundary, so the per-file parallel split degenerates to one worker. **Fix:** save to disk first, pass `filePaths`.
 
-❌ **Sending legacy auditType
-
+❌ **Sending the legacy `auditType` field**
+The field was renamed to `subtype` and the value set was narrowed. **Fix:** use `subtype` with one of `default` / `plan` / `spec` / `skill`. For "security only" / "performance only" lenses, put the bias in the free-text prompt — there is no narrow-lens subtype.
 
 ❌ **Re-auditing the same files round after round without delta context**
 Round 2 worker has no idea what round 1 found. **Fix:** register the round 1 findings as a context block (`mma-context-blocks`) and pass `contextBlockIds` to round 2.
@@ -10,9 +10,9 @@ when_to_use: >-
   A document (spec, plan, codebase summary, prior round's findings, error log)
   larger than ~2 KB will be referenced by two or more mma-* calls in a row.
   Register once here, then pass the ID via `contextBlockIds` on mma-delegate /
-  mma-execute-plan / mma-audit / mma-review / mma-
-
-version: 4.
+  mma-execute-plan / mma-audit / mma-review / mma-debug / mma-investigate.
+  Cheaper and faster than inlining the same content N times.
+version: 4.4.0
 ---
 
 # mma-context-blocks
@@ -78,6 +78,7 @@ Returns `200 { ok: true }` on success. Returns `409 pinned` if the block is held
 # Register spec document once
 ID=$(curl -f --show-error -s -X POST \
   -H "X-MMA-Client: $MMA_CLIENT" \
+  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: application/json" \
   -d "{\"content\":$(jq -Rs . < /project/docs/spec.md)}" \
@@ -86,6 +87,7 @@ ID=$(curl -f --show-error -s -X POST \
 # Reference from N delegate tasks
 curl -f --show-error -s -X POST \
   -H "X-MMA-Client: $MMA_CLIENT" \
+  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: application/json" \
   -d "{\"tasks\":[
@@ -10,7 +10,7 @@ when_to_use: >-
   read files, reproduce, trace — OR a methodology skill
   (superpowers:systematic-debugging) points at the investigation step. Delegate
   the read/reproduce/trace; the main agent stays on the hypothesis and the fix.
-version: 4.
+version: 4.4.0
 ---
 
 # mma-debug
@@ -47,6 +47,7 @@ Submit a problem, context, and hypothesis to a worker for focused debugging. Unl
   "problem": "POST /login returns 500 when password contains special characters",
   "context": "Regression introduced in commit abc123; only affects production config",
   "hypothesis": "The bcrypt binding fails on non-ASCII input in the Docker image",
+  "subtype": "default",
   "filePaths": [
     "/project/src/auth/login.ts",
     "/project/src/auth/password.ts"
@@ -60,6 +61,7 @@ Submit a problem, context, and hypothesis to a worker for focused debugging. Unl
 | `problem` | string | yes | What is broken (one sentence; concrete symptom) |
 | `context` | string | no | Background — what changed recently, what works, what doesn't |
 | `hypothesis` | string | no | Your initial theory; worker tests it first, then explores |
+| `subtype` | `'default'` | no (defaults to `'default'`) | Reserved for future criteria sets; only `default` is wired today. |
 | `filePaths` | string[] | no | All files investigated together (cross-file reasoning) |
 | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` (e.g. error logs, traces) |
 
@@ -70,6 +72,7 @@ Submit a problem, context, and hypothesis to a worker for focused debugging. Unl
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
   -H "X-MMA-Client: $MMA_CLIENT" \
+  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: application/json" \
   -d '{"problem":"Tests fail on CI only","hypothesis":"Missing env var","filePaths":["/project/src/config.ts"]}' \
@@ -123,7 +126,7 @@ Every finding has the same shape:
 
 This skill is one step in the larger flow described in `multi-model-agent` → "Best practices". Recipes that involve `mma-debug`:
 
-- **Recipe B — Debug-fix-
+- **Recipe B — Debug-fix-review.** `mma-debug` → `mma-delegate` (apply fix) → `mma-review` with the acceptance criteria in the brief. Strict order. Register the failing test output / reproduction log as a context block before the debug call; reuse it on the review call.
 
 Anti-pattern alert: **`inline-labor-leakage`** (AP2). If you're about to read 3+ files in main context to "understand the bug," that's the labor we delegate — call `mma-debug` with the hypothesis instead.
 
@@ -152,8 +155,8 @@ Every completed task automatically registers a terminal markdown context block c
 
 **Use cases:**
 - Pass debug findings to a downstream `mma-delegate` fix step
-- Feed the root-cause analysis into `mma-
-- Carry debug context forward through the debug → fix →
+- Feed the root-cause analysis into a follow-up `mma-review` with acceptance criteria in the brief
+- Carry debug context forward through the debug → fix → review chain
 
 The block is registered server-side at task completion; no caller action is needed to create it. Delete it explicitly via `DELETE /context-blocks/:id` when no longer needed, or let it expire on session teardown.
 
@@ -11,7 +11,7 @@ when_to_use: >-
   and keep main context free. If a plan file exists → use mma-execute-plan. If
   the task is audit / review / verify / debug / investigate → use the matching
   specialized skill.
-version: 4.
+version: 4.4.0
 ---
 
 # mma-delegate
@@ -76,6 +76,7 @@ Dispatch one or more ad-hoc tasks to workers concurrently. Each task is an indep
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
   -H "X-MMA-Client: $MMA_CLIENT" \
+  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: application/json" \
   -d '{"tasks":[{"prompt":"Refactor utils.ts to remove dead code","filePaths":["/project/src/utils.ts"]}]}' \
@@ -94,7 +95,7 @@ BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 This skill is one step in the larger flow described in `multi-model-agent` → "Best practices". Recipes that involve `mma-delegate`:
 
 - **Recipe A (the fix step).** Between audit rounds, `mma-delegate` applies the fix when the change is more than 1-2 lines. Register the spec/audit findings as a context block; pass via `contextBlockIds`.
-- **Recipe B (the apply-fix step).** After `mma-debug` returns a hypothesis, `mma-delegate` applies the fix. Same context block carries forward to `mma-
+- **Recipe B (the apply-fix step).** After `mma-debug` returns a hypothesis, `mma-delegate` applies the fix. Same context block carries forward to a follow-up `mma-review` if you want acceptance-criteria checking.
 
 Anti-pattern alert: **`inline-labor-leakage`** (AP2). If you're reading 3+ files or grepping in main context before dispatching, you're paying flagship-model tokens for labor. Pass the file paths to `mma-delegate` and let the worker read.
 
@@ -10,7 +10,7 @@ when_to_use: >-
   superpowers:subagent-driven-development / superpowers:executing-plans —
   workers are cheaper and don't pollute main context. Task descriptors must
   match plan headings verbatim.
-version: 4.
+version: 4.4.0
 ---
 
 # mma-execute-plan
@@ -73,6 +73,7 @@ Dispatch named tasks from a plan file to workers. Each `taskDescriptors` string
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
   -H "X-MMA-Client: $MMA_CLIENT" \
+  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: application/json" \
   -d '{"taskDescriptors":["3. Migrate database schema"],"filePaths":["/project/docs/plan.md"]}' \
@@ -12,7 +12,7 @@ when_to_use: >-
   out mma-investigate (internal) + mma-research (external) in parallel and
   synthesise the results yourself. DO NOT use for convergent single-answer
   questions — those are mma-investigate.
-version: 4.
+version: 4.4.0
 ---
 
 # mma-explore
@@ -12,7 +12,7 @@ when_to_use: >-
   git-history queries. OR you are about to read 3+ files / run any grep in main
   context — that's the inline-labor-leakage anti-pattern (AP2); delegate to this
   skill instead.
-version: 4.
+version: 4.4.0
 ---
 
 # mma-investigate
@@ -67,6 +67,7 @@ digraph when_to_use {
 ```json
 {
   "question": "How does the auth middleware handle token refresh?",
+  "subtype": "default",
   "filePaths": ["/project/src/auth/"],
   "contextBlockIds": []
 }
@@ -75,6 +76,7 @@ digraph when_to_use {
 | Field | Type | Required | Notes |
 |---|---|---|---|
 | `question` | string | yes | Natural-language investigation question |
+| `subtype` | `'default'` | no (defaults to `'default'`) | Reserved for future criteria sets; only `default` is wired today. |
 | `filePaths` | string[] | no | Anchor paths the worker starts from. Worker may grep beyond. |
 | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` — enables follow-up / delta investigation |
 | `tools` | `'none' \| 'readonly'` | no | Default `'readonly'`. `'no-shell'` and `'full'` are rejected — investigation is read-only |
@@ -93,6 +95,7 @@ digraph when_to_use {
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
   -H "X-MMA-Client: $MMA_CLIENT" \
+  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: application/json" \
   -d '{"question":"How does the auth middleware handle token refresh?"}' \
@@ -10,7 +10,7 @@ when_to_use: >-
   others do, what published methods exist) AND mmagent is running. Delegate the
   multi-source web/adapter research to a worker so the main context stays on
   judgment. NOT for codebase questions — those are mma-investigate.
-version: 4.
+version: 4.4.0
 ---
 
 # mma-research
@@ -53,6 +53,7 @@ mean and which directions to pursue.
 {
   "researchQuestion": "What approaches exist for streaming JSON parsing under 100KB?",
   "background": "We currently use a single-pass push parser; we want to evaluate alternatives.",
+  "subtype": "default",
   "contextBlockIds": []
 }
 ```
@@ -61,16 +62,20 @@ mean and which directions to pursue.
 |---|---|---|---|
 | `researchQuestion` | string | yes | 20–8000 chars |
 | `background` | string | yes | 20–8000 chars; what you already know / are trying to do |
+| `subtype` | `'default'` | no (defaults to `'default'`) | Reserved for future criteria sets; only `default` is wired today. |
 | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
 
 > Worker tier is hardcoded `complex`. Sending `agentType` or `tools` is rejected with HTTP 400.
 
+The `default` subtype's criteria target primary-source preference, practitioner consensus, recency, counter-perspectives, and cross-domain analogues — the worker is bibliographic, not opinionated.
+
 ## Full example
 
 ```bash
 BATCH=$(curl -f -sS -X POST \
   -H "Authorization: Bearer $TOKEN" \
   -H "X-MMA-Client: $MMA_CLIENT" \
+  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
   -H "Content-Type: application/json" \
   -d '{
     "researchQuestion": "State-of-the-art SIMD JSON parsers under 100KB?",
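The field table in the hunk above gives `researchQuestion` and `background` a 20 to 8000 character range and says `agentType` and `tools` are rejected. A caller-side check along those lines might look like the sketch below. `researchRequestErrors` is a hypothetical name, and measuring length with `String.length` (UTF-16 code units) is an assumption, since the text does not say how the server counts characters.

```typescript
// Hypothetical pre-flight check for an mma-research body. The 20-8000 range
// and the rejected fields come from the field table; the rest is illustrative.
function researchRequestErrors(body: Record<string, unknown>): string[] {
  const errors: string[] = [];

  // Both text fields are required and share the same length bounds.
  for (const field of ["researchQuestion", "background"]) {
    const value = body[field];
    if (typeof value !== "string" || value.length < 20 || value.length > 8000) {
      errors.push(field + " must be a string of 20 to 8000 chars");
    }
  }

  // The worker tier and tool surface are fixed for this endpoint.
  for (const field of ["agentType", "tools"]) {
    if (field in body) errors.push(field + " is rejected with HTTP 400");
  }
  return errors;
}
```

An empty array means the body passes these local checks; the server may still apply rules that are not visible in this diff.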
@@ -6,11 +6,11 @@ description: >-
   re-dispatching the whole batch
 when_to_use: >-
   A previous mma-delegate / mma-execute-plan / mma-audit / mma-review /
-  mma-
-
-
-
-version: 4.
+  mma-debug / mma-investigate batch returned partial results AND you want to
+  re-try the failed indices only. Prefer this over re-dispatching the whole
+  batch or inline-retrying — it's idempotent and preserves the original batch's
+  diagnostics.
+version: 4.4.0
 ---
 
 # mma-retry
@@ -78,6 +78,7 @@ To re-run all tasks: pass `[0, 1, ..., tasks.length - 1]`. (But consider: if all
 # Original batch had 4 tasks; re-run tasks at index 1 and 3
 BATCH=$(curl -f --show-error -s -X POST \
   -H "X-MMA-Client: $MMA_CLIENT" \
+  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: application/json" \
   -d '{"batchId":"550e8400-e29b-41d4-a716-446655440000","taskIndices":[1,3]}' \
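The retry body above lists only the indices that need to run again. Deriving that list from a batch's per-task outcomes could be sketched as follows. The `TaskOutcome` shape and the `"done"` status value are assumptions made for illustration, since the diff does not show the batch result schema; `batchId` and `taskIndices` are the field names used in the example request.

```typescript
// Hypothetical per-task outcome; the real batch result schema is not shown
// in this diff, so "status" and the "done" value are assumptions.
interface TaskOutcome {
  status: string;
}

// Builds the retry body, or returns null when every task already succeeded.
function retryBody(
  batchId: string,
  outcomes: TaskOutcome[],
): { batchId: string; taskIndices: number[] } | null {
  const taskIndices: number[] = [];
  outcomes.forEach((outcome, index) => {
    // Indices refer to positions in the original batch, so they are kept as-is.
    if (outcome.status !== "done") taskIndices.push(index);
  });
  return taskIndices.length === 0 ? null : { batchId, taskIndices };
}
```

Returning `null` for a fully successful batch avoids sending an empty `taskIndices` array, whose handling by the server is not described here.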
@@ -10,7 +10,7 @@ when_to_use: >-
   AND mmagent is running. Delegate so each file reviews on its own worker; the
   main agent only decides what to merge. Review on SOURCE CODE — use mma-audit
   for prose specs / configs.
-version: 4.
+version: 4.4.0
 ---
 
 # mma-review
@@ -55,6 +55,7 @@ The cross-file ripple pass (changed-symbol → broken caller) only fires when th
 {
   "code": "inline code snippet (optional if filePaths given)",
   "focus": ["correctness", "security"],
+  "subtype": "default",
   "filePaths": ["/project/src/auth/login.ts"],
   "contextBlockIds": []
 }
@@ -64,6 +65,7 @@ The cross-file ripple pass (changed-symbol → broken caller) only fires when th
 |---|---|---|---|
 | `code` | string | no | Inline code snippet to review |
 | `focus` | string[] | no | Any of `security`, `performance`, `correctness`, `style`. Omit for general review. |
+| `subtype` | `'default'` | no (defaults to `'default'`) | Reserved for future criteria sets; only `default` is wired today. |
 | `filePaths` | string[] | no | Files to review (one worker per file, parallel) |
 | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` — useful for design docs the reviewer should validate against |
 
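The field rules in the hunk above (at least one of `code` / `filePaths`, four allowed `focus` values, `subtype` limited to `default`) can be sketched as a client-side pre-check. This is a minimal illustration, not the package's own validation: the function name `validate_review_request` is hypothetical, and treating an empty string or empty list as "not provided", as well as rejecting unknown `focus` values, are assumptions.

```python
from typing import Any, Dict, List

# Values taken from the fields table in the diff above.
ALLOWED_FOCUS = {"security", "performance", "correctness", "style"}
ALLOWED_SUBTYPES = {"default"}  # the table says only `default` is wired today


def validate_review_request(body: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-check: return a list of problems (empty means OK)."""
    problems: List[str] = []

    # Assumption: an empty string / empty list counts as "not provided".
    has_code = bool(body.get("code"))
    has_files = bool(body.get("filePaths"))
    if not (has_code or has_files):
        problems.append("either `code` or `filePaths` (or both) must be provided")

    # Assumption: unknown focus values are treated as errors.
    for item in body.get("focus", []):
        if item not in ALLOWED_FOCUS:
            problems.append(f"unknown focus value: {item!r}")

    # `subtype` is optional and defaults to 'default'.
    subtype = body.get("subtype", "default")
    if subtype not in ALLOWED_SUBTYPES:
        problems.append(f"unsupported subtype: {subtype!r}")

    return problems
```

Running the check before the `curl` call would surface a missing `filePaths` or a mistyped `focus` value locally, assuming the server enforces the same rules.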
@@ -76,6 +78,7 @@ Either `code` or `filePaths` (or both) must be provided.
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
 -H "X-MMA-Client: $MMA_CLIENT" \
+-H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
 -H "Authorization: Bearer $TOKEN" \
 -H "Content-Type: application/json" \
 -d '{"focus":["security","correctness"],"filePaths":["/project/src/auth/login.ts"]}' \
@@ -11,7 +11,7 @@ when_to_use: >-
 tasks — AND mmagent is running. Read this once, pick the matching mma-* skill,
 and delegate there. Applies equally whether the user invoked a superpowers
 methodology skill or asked directly.
-version: 4.
+version: 4.4.0
 ---
 
 # multi-model-agent (router)
@@ -36,7 +36,6 @@ digraph picker {
 "mma-execute-plan" [shape=box];
 "mma-audit" [shape=box];
 "mma-review" [shape=box];
-"mma-verify" [shape=box];
 "mma-debug" [shape=box];
 "mma-investigate" [shape=box];
 "mma-explore" [shape=box];
@@ -47,9 +46,7 @@ digraph picker {
 "Audit a doc?" -> "mma-audit" [label="yes"];
 "Audit a doc?" -> "Review code?" [label="no"];
 "Review code?" -> "mma-review" [label="yes"];
-"Review code?" -> "
-"Verify a checklist?" -> "mma-verify" [label="yes"];
-"Verify a checklist?" -> "Debug a failure?" [label="no"];
+"Review code?" -> "Debug a failure?" [label="no"];
 "Debug a failure?" -> "mma-debug" [label="yes"];
 "Debug a failure?" -> "Codebase question?" [label="no"];
 "Codebase question?" -> "Convergent or divergent?" [label="yes"];
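After this change the visible part of the picker graph is a straight chain of yes/no questions with the "Verify a checklist?" step removed. A rough sketch of that chain follows. The function name `pick_skill` is hypothetical, only the edges shown in this hunk are modelled (earlier questions in the graph are outside the hunk), and the mapping of "convergent" to `mma-investigate` and "divergent" to `mma-explore` is inferred from the skill descriptions rather than from edges shown here.

```python
from typing import Optional


def pick_skill(audit_doc: bool,
               review_code: bool,
               debug_failure: bool,
               codebase_question: bool,
               divergent: bool = False) -> Optional[str]:
    """Walk the visible 4.4.0 picker edges in order; first "yes" wins."""
    if audit_doc:
        return "mma-audit"
    if review_code:
        return "mma-review"
    # 4.4.0: "Review code?" -> no now leads directly to "Debug a failure?".
    if debug_failure:
        return "mma-debug"
    if codebase_question:
        # Assumption: divergent -> mma-explore, convergent -> mma-investigate.
        return "mma-explore" if divergent else "mma-investigate"
    # The rest of the graph is not shown in this hunk.
    return None
```

A checklist-style request that previously routed to `mma-verify` would, under this sketch, be answered "yes" at "Review code?" and carry its checklist in the brief.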
@@ -63,8 +60,7 @@ digraph picker {
 |---|---|
 | `mma-execute-plan` | Implement tasks from a plan or spec file (descriptors match plan headings) |
 | `mma-audit` | Audit a document/spec/config for security, correctness, style, or performance |
-| `mma-review` | Review code for quality, security, performance, correctness |
-| `mma-verify` | Verify work against a checklist (one item per worker, parallel) |
+| `mma-review` | Review code for quality, security, performance, correctness. Pass acceptance checklists in the brief if you need verification-style checks. |
 | `mma-debug` | Debug a failure with a structured hypothesis |
 | `mma-investigate` | Codebase Q&A — structured answer with `file:line` citations + confidence |
 | `mma-explore` | Divergent ideation from codebase + web research — use before `superpowers:brainstorming` |
@@ -107,9 +103,9 @@ Any artifact (spec, plan, prior-round findings, long error log) that crosses 2+
 
 `mma-audit` → read findings → fix (inline if 1-2 lines, else `mma-delegate`) → `mma-audit` again. Sequential rounds, NOT parallel re-audits. The fix produces new edges; round 2 catches what round 1 couldn't see. Register the doc as a context block before round 1; reuse the same ID across all rounds. The same shape applies to `mma-review` for source code (review → fix → re-review).
 
-### Recipe B — Debug-fix-
+### Recipe B — Debug-fix-review
 
-`mma-debug` (read/reproduce/trace) → `mma-delegate` (apply the fix the hypothesis implies) → `mma-
+`mma-debug` (read/reproduce/trace) → `mma-delegate` (apply the fix the hypothesis implies) → `mma-review` with the acceptance criteria included in the brief. Three skills, strict order. Register the failing test output / reproduction log as a context block before the debug call; reuse it on the review call so the reviewer can compare against the same evidence.
 
 ### Recipe C — Investigate-plan-execute
 
@@ -121,7 +117,7 @@ When `mma-execute-plan` returns mixed `done` / `done_with_concerns` / `failed`,
 
 ### Anti-patterns
 
-1. **`parallel-rounds-same-target`** — Caller fans out 3 parallel calls of the same skill on the same target — `mma-audit` on one document, `mma-review` on the same source file
+1. **`parallel-rounds-same-target`** — Caller fans out 3 parallel calls of the same skill on the same target — `mma-audit` on one document, or `mma-review` on the same source file. The reports overlap heavily; later rounds never see the fix from earlier rounds, so they re-flag the same issues. Corrective: sequential rounds with a fix between each (Recipe A).
 
 2. **`inline-labor-leakage`** — Caller does 3+ `Read` calls, or any `grep`, in main context "just to understand the situation." Main tokens get burned on labor; the answer the caller actually needs is one paragraph of synthesis. Corrective: `mma-investigate` for codebase Q&A; if the goal is implementation, jump straight to `mma-delegate` with file paths and let the worker read.
 
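The corrective for `parallel-rounds-same-target` (sequential rounds with a fix between each) can be sketched as a small loop. This is an illustration only: `run_sequential_rounds`, the injected `audit` and `fix` callables, the `max_rounds` cap, and stopping on an empty findings list are all assumptions, not part of the package.

```python
from typing import Callable, List


def run_sequential_rounds(audit: Callable[[str], List[str]],
                          fix: Callable[[str, List[str]], str],
                          target: str,
                          max_rounds: int = 3) -> List[List[str]]:
    """Audit, fix, audit again; each round sees the previous round's fix."""
    history: List[List[str]] = []
    for _ in range(max_rounds):
        findings = audit(target)       # stands in for one mma-audit / mma-review call
        history.append(findings)
        if not findings:               # assumption: stop once a round is clean
            break
        target = fix(target, findings)  # inline edit or an mma-delegate call
    return history


# Toy stand-ins: an "issue" is any upper-case word, reported one per round.
def toy_audit(text: str) -> List[str]:
    return [w for w in text.split() if w.isupper()][:1]


def toy_fix(text: str, findings: List[str]) -> str:
    return text.replace(findings[0], findings[0].lower())


rounds = run_sequential_rounds(toy_audit, toy_fix, "AA BB cc")
```

With the toy stand-ins, the second round reports a different issue than the first because it runs against the already-fixed text, which is the property parallel fan-out loses.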
@@ -168,7 +164,6 @@ Every other route hardcodes its tier and rejects `agentType` with HTTP 400:
 | `mma-audit` | `complex` |
 | `mma-review` | `complex` |
 | `mma-debug` | `complex` |
-| `mma-verify` | `complex` |
 | `mma-investigate` | `complex` |
 | `mma-explore` | `complex` (all three workers — internal, external, synthesizer) |
 
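The hunk context states that these routes hardcode their tier and reject `agentType` with HTTP 400. A minimal sketch of that rule, using only the rows visible after the change, follows. The function `resolve_tier`, the `(status, tier)` return shape, the pass-through behaviour for routes not in the table, and the `"standard"` tier value in the usage lines are all hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

# Rows visible in the 4.4.0 side of the hunk; the full table may have more.
HARDCODED_TIERS: Dict[str, str] = {
    "mma-audit": "complex",
    "mma-review": "complex",
    "mma-debug": "complex",
    "mma-investigate": "complex",
    "mma-explore": "complex",
}


def resolve_tier(skill: str, body: Dict[str, Any]) -> Tuple[int, Optional[str]]:
    """Return (http_status, tier) for a request, under the stated assumptions."""
    if skill in HARDCODED_TIERS:
        if "agentType" in body:
            return 400, None          # hardcoded routes reject agentType
        return 200, HARDCODED_TIERS[skill]
    # Assumption: routes outside the table accept a caller-chosen agentType.
    return 200, body.get("agentType")


rejected = resolve_tier("mma-review", {"agentType": "standard"})
accepted = resolve_tier("mma-review", {})
```

Note that `mma-verify` no longer appears in the mapping, matching the row removed in this hunk.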
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
 {
 "name": "@zhixuan92/multi-model-agent",
-"version": "4.
+"version": "4.4.0",
 "type": "module",
 "license": "MIT",
 "description": "Standalone HTTP server for multi-model-agent. Routes tool-invocation work to Claude, Codex, or OpenAI-compatible sub-agents with async-polling REST dispatch and installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.",
@@ -53,7 +53,7 @@
 },
 "dependencies": {
 "@asteasolutions/zod-to-openapi": "^8.5.0",
-"@zhixuan92/multi-model-agent-core": "^4.
+"@zhixuan92/multi-model-agent-core": "^4.4.0",
 "gray-matter": "^4.0.3",
 "minimist": "^1.2.8",
 "proper-lockfile": "^4.1.2",
@@ -1 +0,0 @@
-{"version":3,"file":"verify.d.ts","sourceRoot":"","sources":["../../../../src/http/handlers/tools/verify.ts"],"names":[],"mappings":"AAQA,OAAO,KAAK,EAAE,WAAW,EAAE,MAAM,uBAAuB,CAAC;AAEzD,OAAO,KAAK,EAAE,UAAU,EAAE,MAAM,gBAAgB,CAAC;AACjD,wBAAgB,kBAAkB,CAAC,IAAI,EAAE,WAAW,GAAG,UAAU,CAkDhE"}