@zhixuan92/multi-model-agent 3.2.0 → 3.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (65)
  1. package/README.md +62 -33
  2. package/dist/http/canonicalize-file-paths.d.ts +8 -0
  3. package/dist/http/canonicalize-file-paths.d.ts.map +1 -0
  4. package/dist/http/canonicalize-file-paths.js +43 -0
  5. package/dist/http/canonicalize-file-paths.js.map +1 -0
  6. package/dist/http/execution-context.d.ts.map +1 -1
  7. package/dist/http/execution-context.js +0 -14
  8. package/dist/http/execution-context.js.map +1 -1
  9. package/dist/http/handlers/control/batch-slice.d.ts +4 -0
  10. package/dist/http/handlers/control/batch-slice.d.ts.map +1 -0
  11. package/dist/http/handlers/control/batch-slice.js +40 -0
  12. package/dist/http/handlers/control/batch-slice.js.map +1 -0
  13. package/dist/http/handlers/control/retry.d.ts +4 -0
  14. package/dist/http/handlers/control/retry.d.ts.map +1 -0
  15. package/dist/http/handlers/control/retry.js +60 -0
  16. package/dist/http/handlers/control/retry.js.map +1 -0
  17. package/dist/http/handlers/tools/audit.d.ts.map +1 -1
  18. package/dist/http/handlers/tools/audit.js +2 -0
  19. package/dist/http/handlers/tools/audit.js.map +1 -1
  20. package/dist/http/handlers/tools/debug.d.ts.map +1 -1
  21. package/dist/http/handlers/tools/debug.js +2 -0
  22. package/dist/http/handlers/tools/debug.js.map +1 -1
  23. package/dist/http/handlers/tools/delegate.d.ts.map +1 -1
  24. package/dist/http/handlers/tools/delegate.js +2 -0
  25. package/dist/http/handlers/tools/delegate.js.map +1 -1
  26. package/dist/http/handlers/tools/execute-plan.d.ts.map +1 -1
  27. package/dist/http/handlers/tools/execute-plan.js +2 -0
  28. package/dist/http/handlers/tools/execute-plan.js.map +1 -1
  29. package/dist/http/handlers/tools/investigate.d.ts +4 -0
  30. package/dist/http/handlers/tools/investigate.d.ts.map +1 -0
  31. package/dist/http/handlers/tools/investigate.js +81 -0
  32. package/dist/http/handlers/tools/investigate.js.map +1 -0
  33. package/dist/http/handlers/tools/review.d.ts.map +1 -1
  34. package/dist/http/handlers/tools/review.js +2 -0
  35. package/dist/http/handlers/tools/review.js.map +1 -1
  36. package/dist/http/handlers/tools/verify.d.ts.map +1 -1
  37. package/dist/http/handlers/tools/verify.js +2 -0
  38. package/dist/http/handlers/tools/verify.js.map +1 -1
  39. package/dist/http/request-observability.d.ts +9 -0
  40. package/dist/http/request-observability.d.ts.map +1 -0
  41. package/dist/http/request-observability.js +36 -0
  42. package/dist/http/request-observability.js.map +1 -0
  43. package/dist/http/server.d.ts.map +1 -1
  44. package/dist/http/server.js +52 -11
  45. package/dist/http/server.js.map +1 -1
  46. package/dist/install/discover.d.ts +1 -1
  47. package/dist/install/discover.d.ts.map +1 -1
  48. package/dist/install/discover.js +1 -0
  49. package/dist/install/discover.js.map +1 -1
  50. package/dist/openapi.d.ts.map +1 -1
  51. package/dist/openapi.js +6 -0
  52. package/dist/openapi.js.map +1 -1
  53. package/dist/skills/_shared/verify-and-review.md +12 -0
  54. package/dist/skills/mma-audit/SKILL.md +45 -18
  55. package/dist/skills/mma-clarifications/SKILL.md +73 -29
  56. package/dist/skills/mma-context-blocks/SKILL.md +56 -24
  57. package/dist/skills/mma-debug/SKILL.md +54 -22
  58. package/dist/skills/mma-delegate/SKILL.md +59 -21
  59. package/dist/skills/mma-execute-plan/SKILL.md +56 -24
  60. package/dist/skills/mma-investigate/SKILL.md +137 -0
  61. package/dist/skills/mma-retry/SKILL.md +65 -22
  62. package/dist/skills/mma-review/SKILL.md +49 -20
  63. package/dist/skills/mma-verify/SKILL.md +49 -18
  64. package/dist/skills/multi-model-agent/SKILL.md +84 -46
  65. package/package.json +2 -2
package/dist/skills/mma-investigate/SKILL.md ADDED
@@ -0,0 +1,137 @@
+ ---
+ name: mma-investigate
+ description: >-
+   Use when you need to answer a question about the codebase ("how does X work",
+   "where is Y called", "what does this directory do") and reading + grepping the
+   codebase yourself would consume main-context tokens
+ when_to_use: >-
+   A question about THIS codebase has surfaced — from the user, from a
+   methodology skill, or from your own next-step planning — AND mmagent is
+   running. Delegate the read/grep/synthesis to a worker so the main context
+   stays on judgment. Codebase only — does not perform web research or
+   git-history queries.
+ version: 3.4.0
+ ---
+
+ # mma-investigate
+
+ ## Overview
+
+ Answer a codebase question via a read-only mmagent worker. The worker greps and reads on its cheap budget; you read its synthesis on yours.
+
+ **Core principle:** Investigation is labor (read, grep, synthesize). Delegate it. The main agent stays on judgment — deciding what the answer means and what to do with it.
+
+ ## When to Use
+
+ ```dot
+ digraph when_to_use {
+   "Question about codebase?" [shape=diamond];
+   "About web / git history?" [shape=diamond];
+   "Already have the file in context?" [shape=diamond];
+   "mma-investigate" [shape=box];
+   "Read inline (1–2 reads)" [shape=box];
+   "WebSearch / git log" [shape=box];
+
+   "Question about codebase?" -> "About web / git history?";
+   "About web / git history?" -> "WebSearch / git log" [label="yes"];
+   "About web / git history?" -> "Already have the file in context?" [label="no"];
+   "Already have the file in context?" -> "Read inline (1–2 reads)" [label="yes"];
+   "Already have the file in context?" -> "mma-investigate" [label="no"];
+ }
+ ```
+
+ **Use when:**
+ - "How does X work in this codebase?"
+ - "Where is Y called from?"
+ - "What does this directory do?"
+ - The answer requires reading 3+ files or grepping
+ - Cross-cutting investigations (auth flow across modules, data lineage)
+
+ **Don't use when:**
+ - The answer is in 1–2 files you already have in context → just `Read`
+ - It's about web docs / external APIs → `WebSearch` / `WebFetch`
+ - It's about git history → `git log` / `git blame`
+ - You need to MODIFY code based on the finding → `mma-delegate` (research + edit)
+
+ ## Endpoint
+
+ `POST /investigate?cwd=<abs-path>`
+
+ @include _shared/auth.md
+
+ ## Request body
+
+ ```json
+ {
+   "question": "How does the auth middleware handle token refresh?",
+   "filePaths": ["/project/src/auth/"],
+   "contextBlockIds": []
+ }
+ ```
+
+ | Field | Type | Required | Notes |
+ |---|---|---|---|
+ | `question` | string | yes | Natural-language investigation question |
+ | `filePaths` | string[] | no | Anchor paths the worker starts from. Worker may grep beyond. |
+ | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` — enables follow-up / delta investigation |
+ | `agentType` | `'standard' \| 'complex'` | no | Caller override of the route default (`'complex'`) |
+ | `tools` | `'none' \| 'readonly'` | no | Default `'readonly'`. `'no-shell'` and `'full'` are rejected — investigation is read-only |
+
+ **Anchor narrow questions with `filePaths`:**
+
+ ❌ `{ "question": "Where is parseConfig called?" }` — searches the whole repo
+ ✅ `{ "question": "Where is parseConfig called?", "filePaths": ["src/"] }` — bounded
+
+ **Why:** the worker greps and reads under its cost ceiling. Without anchors, broad questions exhaust the budget before they finish.
+
+ ## Full example
+
+ ```bash
+ BATCH=$(curl -f --show-error -s -X POST \
+   -H "Authorization: Bearer $TOKEN" \
+   -H "Content-Type: application/json" \
+   -d '{"question":"How does the auth middleware handle token refresh?"}' \
+   "http://localhost:$PORT/investigate?cwd=/project")
+ BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
+ ```
+
+ @include _shared/polling.md
+
+ @include _shared/response-shape.md
+
+ ## Per-task report shape
+
+ Each task carries an `investigation` field on its per-task report:
+
+ ```json
+ {
+   "investigation": {
+     "citations": [
+       { "file": "src/auth/refresh.ts", "lines": "45-72", "claim": "Refresh handler reads bearer." }
+     ],
+     "confidence": { "level": "high", "rationale": "All claims directly cited." },
+     "diagnostics": {
+       "malformedCitationLines": 0,
+       "missingRequiredSections": [],
+       "invalidRequiredSections": []
+     }
+   }
+ }
+ ```
+
+ `workerStatus` is one of `done`, `done_with_concerns`, `needs_context`, `blocked`. When `done_with_concerns`, the per-task report carries `incompleteReason` (`turn_cap`, `cost_cap`, `timeout`, or `missing_sections`). When `needs_context`, the worker flagged a `[needs_context]` bullet under `## Unresolved` — re-dispatch with extra context (anchor paths, a context block, or a clarification turn).
+
+ ## Common pitfalls
+
+ ❌ **Asking for a fix instead of an answer**
+ > question: "Refactor the auth middleware to use JWT"
+
+ The investigator can't write — `tools: 'readonly'`. **Fix:** use `mma-delegate` for research-then-edit, or split: investigate first, then dispatch the edit.
+
+ ❌ **Treating `done_with_concerns` as failure**
+ The worker still produced citations and a confidence level. Read them — partial coverage with `incompleteReason: 'turn_cap'` often answers the question well enough. Re-dispatch with a tighter scope only if the citations are unusable.
+
+ ❌ **Inline-reading instead of delegating**
+ About to `Read` 3+ files just to answer one question? That's the wrong tradeoff — the worker reads on its cheap budget; you read its synthesis on yours.
+
+ @include _shared/error-handling.md
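The request body and per-task report above lend themselves to small scripting helpers. The sketch below assumes `jq` 1.6 or newer (for `--args` and `$ARGS.positional`); `investigate_body` and `print_citations` are hypothetical names, not part of mmagent, and `print_citations` further assumes that each entry in the terminal envelope's `results` array is a per-task report carrying the `investigation` field. The shared response-shape include is not part of this diff, so confirm that path against a real envelope before relying on it.

```bash
# Hypothetical helpers for the /investigate route (not shipped with mmagent).

# investigate_body QUESTION [ANCHOR_PATH...]
# Prints a compact JSON request body. Anchor paths become `filePaths`;
# the key is omitted entirely when no anchors are given.
investigate_body() {
  local question="$1"
  shift
  jq -cn --arg q "$question" \
    '{question: $q}
     + (if ($ARGS.positional | length) > 0
        then {filePaths: $ARGS.positional}
        else {} end)' \
    --args "$@"
}

# print_citations < ENVELOPE_JSON
# ASSUMPTION: `results` holds per-task reports, each with an `investigation`
# field shaped like the per-task report example in the skill doc.
# Prints one confidence line per task, then one "file:lines  claim" per citation.
print_citations() {
  jq -r '.results[]?
         | .investigation // empty
         | "confidence: \(.confidence.level)",
           (.citations[]? | "  \(.file):\(.lines)  \(.claim)")'
}

# Usage (PORT, MMAGENT_AUTH_TOKEN and the project path are placeholders):
#   investigate_body "Where is parseConfig called?" src/ |
#     curl -f --show-error -s -X POST \
#       -H "Authorization: Bearer $MMAGENT_AUTH_TOKEN" \
#       -H "Content-Type: application/json" \
#       --data @- "http://localhost:$PORT/investigate?cwd=/project"
```

Anchoring is opt-in per call, so the same helper serves both the bounded and the whole-repo form of a question.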
package/dist/skills/mma-retry/SKILL.md CHANGED
@@ -1,30 +1,62 @@
  ---
  name: mma-retry
  description: >-
-   Re-run specific failed or incomplete tasks from a previous mmagent batch by
-   index. Preserves the original task specs and only re-executes the named
-   indices.
+   Use when a previous mma-* batch returned partial results (some tasks failed or
+   came back incomplete) and you want to re-run JUST the failed indices without
+   re-dispatching the whole batch
  when_to_use: >-
-   A previous mma-delegate / mma-execute-plan batch returned partial results and
-   you want to re-try the failed indices only. Prefer this over redispatching the
-   whole batch or inline-retrying; it's idempotent and keeps the original
-   batch's diagnostics intact.
- version: 3.2.0
+   A previous mma-delegate / mma-execute-plan / mma-audit / mma-review /
+   mma-verify / mma-debug / mma-investigate batch returned partial results AND
+   you want to re-try the failed indices only. Prefer this over re-dispatching
+   the whole batch or inline-retrying — it's idempotent and preserves the
+   original batch's diagnostics.
+ version: 3.4.0
  ---
 
- ## mma-retry
+ # mma-retry
 
- Re-run selected tasks from a completed or failed batch. Specify the original
- `batchId` and the zero-based indices of the tasks to re-run. The retry runs
- those tasks fresh with the same configuration as the original batch.
+ ## Overview
 
- ### Endpoint
+ Re-run selected tasks from a completed or failed batch. Specify the original `batchId` and the zero-based indices of the tasks to re-run. The retry runs those tasks fresh with the same configuration as the original batch and produces a new `batchId`.
+
+ **Core principle:** A batch is the unit of dispatch, but a TASK is the unit of failure. Retry at the task level so successful tasks aren't re-charged.
+
+ ## When to Use
+
+ ```dot
+ digraph when_to_use {
+   "Batch returned terminal?" [shape=diamond];
+   "Some tasks failed/incomplete?" [shape=diamond];
+   "All tasks failed?" [shape=diamond];
+   "mma-retry (selected indices)" [shape=box];
+   "Re-dispatch the whole batch" [shape=box];
+   "Investigate first (mma-debug)" [shape=box];
+
+   "Batch returned terminal?" -> "Some tasks failed/incomplete?";
+   "Some tasks failed/incomplete?" -> "All tasks failed?" [label="yes"];
+   "Some tasks failed/incomplete?" -> "Done — read results" [label="no"];
+   "All tasks failed?" -> "Investigate first (mma-debug)" [label="yes"];
+   "All tasks failed?" -> "mma-retry (selected indices)" [label="no — partial"];
+ }
+ ```
+
+ **Use when:**
+ - A previous batch's terminal envelope shows mixed `done` / `done_with_concerns` / `failed`
+ - 1–N tasks (but not all) need a re-run with the same config
+ - You want to keep the original batch's diagnostics intact for comparison
+
+ **Don't use when:**
+ - All tasks failed → investigate the systemic cause first (`mma-debug`); retrying won't help
+ - The original batch is `expired` (TTL elapsed) → re-dispatch fresh
+ - You want to change the prompt → re-dispatch with the new prompt; retry preserves the original
+
+ ## Endpoint
 
  `POST /retry?cwd=<abs-path>`
 
  @include _shared/auth.md
 
- ### Request body
+ ## Request body
 
  ```json
  {
@@ -35,13 +67,12 @@ those tasks fresh with the same configuration as the original batch.
 
  | Field | Type | Required | Notes |
  |---|---|---|---|
- | `batchId` | string (UUID) | yes | Batch ID from a previous dispatch |
- | `taskIndices` | number[] | yes | Zero-based indices to re-run |
+ | `batchId` | string (UUID) | yes | Batch ID from a previous dispatch (not yet expired) |
+ | `taskIndices` | number[] | yes | Zero-based indices to re-run; must be non-negative integers |
 
- `taskIndices` must be non-negative integers. To re-run all tasks, pass all
- indices from `0` to `tasks.length - 1`.
+ To re-run all tasks: pass `[0, 1, ..., tasks.length - 1]`. (But consider: if all failed, debug instead of retrying.)
 
- ### Full example
+ ## Full example
 
  ```bash
  # Original batch had 4 tasks; re-run tasks at index 1 and 3
@@ -50,13 +81,25 @@ BATCH=$(curl -f --show-error -s -X POST \
    -H "Content-Type: application/json" \
    -d '{"batchId":"550e8400-e29b-41d4-a716-446655440000","taskIndices":[1,3]}' \
    "http://localhost:$PORT/retry?cwd=/project")
- BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
+ BATCH_ID=$(echo "$BATCH" | jq -r '.batchId') # NEW batchId — not the original
  ```
 
- The retry produces a new `batchId`. Poll the new ID until complete:
-
  @include _shared/polling.md
 
  @include _shared/response-shape.md
 
+ ## Common pitfalls
+
+ ❌ **Retrying after the batch expired**
+ TTL elapsed → original task specs are gone. **Fix:** re-dispatch fresh; the retry endpoint returns 404.
+
+ ❌ **Retrying without addressing the root cause**
+ A flaky task that failed once will likely fail again. **Fix:** investigate (`mma-debug` or read the original `result.error.message`), then retry — or escalate `agentType` to `complex` by re-dispatching.
+
+ ❌ **Confusing the new and original `batchId`**
+ Retry produces a NEW batchId; polling the original returns the old terminal state. **Fix:** save the retry's `batchId` and poll that one.
+
+ ❌ **Using retry to change task config**
+ Retry preserves the ORIGINAL config (prompt, agentType, filePaths, reviewPolicy). **Fix:** if you want different config, re-dispatch with `mma-delegate` / `mma-execute-plan`.
+
  @include _shared/error-handling.md
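A partial batch can be turned into a retry request mechanically. This is a sketch, not mmagent's own tooling: `retry_body` is a hypothetical helper, it assumes `jq` 1.6+ and that each `results` entry in the terminal envelope carries a `workerStatus` field, and it deliberately keeps `done_with_concerns` tasks out of the retry set (the mma-investigate pitfalls advise reading those results rather than treating them as failures). It refuses to build a body when nothing failed or when every task failed, mirroring the "Don't use when" guidance.

```bash
# Hypothetical helper (not shipped with mmagent): build a /retry body from a
# terminal envelope, selecting only tasks that did not finish.
# ASSUMPTION: envelope `results[i].workerStatus` holds the per-task status.
# retry_body ENVELOPE_JSON ORIGINAL_BATCH_ID
retry_body() {
  local envelope="$1" batch_id="$2" total failing count
  total=$(printf '%s' "$envelope" | jq '.results | length')
  # Indices whose status is neither "done" nor "done_with_concerns".
  failing=$(printf '%s' "$envelope" | jq -c '[.results | to_entries[]
      | select(.value.workerStatus | IN("done", "done_with_concerns") | not)
      | .key]')
  count=$(printf '%s' "$failing" | jq 'length')
  if [ "$count" -eq 0 ]; then
    echo "retry_body: nothing to retry" >&2
    return 1
  fi
  if [ "$count" -eq "$total" ]; then
    echo "retry_body: all $total tasks failed; debug the systemic cause first (mma-debug)" >&2
    return 2
  fi
  jq -cn --arg id "$batch_id" --argjson idx "$failing" \
    '{batchId: $id, taskIndices: $idx}'
}
```

Piping its output into `curl ... --data @-` against `/retry` returns a new `batchId`, and that new ID is the one to poll.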
package/dist/skills/mma-review/SKILL.md CHANGED
@@ -1,29 +1,46 @@
  ---
  name: mma-review
  description: >-
-   Review code for quality, security, performance, or correctness via the local
-   mmagent HTTP service. Sub-agents run in parallel per file, independent
-   context.
+   Use when source code needs a quality / security / correctness pass: pre-merge
+   review, post-implementation sanity check, or focused look at a small file set
+   — and the review can run in parallel per file
  when_to_use: >-
-   The user asks for a code review, pre-merge check, or quality pass over one or
-   more files OR a methodology skill (superpowers:requesting-code-review,
-   /review, /security-review) points at a review task. Delegate the reviewer pass
-   to mmagent workers; your main context stays free to decide what to merge.
- version: 3.2.0
+   User asks for a code review or pre-merge check, OR a methodology skill
+   (superpowers:requesting-code-review, /review, /security-review) points at one,
+   AND mmagent is running. Delegate so each file reviews on its own worker; the
+   main agent only decides what to merge. Review on SOURCE CODE; use mma-audit
+   for prose specs / configs.
+ version: 3.4.0
  ---
 
- ## mma-review
+ # mma-review
 
- Send code or files to sub-agents for structured review. Each file is reviewed
- independently in parallel; results are index-aligned with `filePaths`.
+ ## Overview
 
- ### Endpoint
+ Send code files to workers for structured review. Each file is reviewed independently in parallel; results are index-aligned with `filePaths`.
+
+ **Core principle:** Reviewer is a different model from the implementer — different training, different blind spots. Cross-model review catches what self-review misses.
+
+ ## When to Use
+
+ **Use when:**
+ - 1+ source code files just changed (post-implementation review)
+ - Pre-merge sanity check on a focused diff
+ - Security-sensitive code path (`focus: ["security"]`)
+ - A specialized review pass (e.g. `focus: ["performance"]` on hot-path code)
+
+ **Don't use when:**
+ - The thing being reviewed is prose / spec / config → `mma-audit` (better-suited prompt template)
+ - You want to know whether a complete branch is mergeable → run `/ultrareview` (multi-model branch review) instead
+ - The diff is one-line / one-character → reading inline is faster than dispatch
+
+ ## Endpoint
 
  `POST /review?cwd=<abs-path>`
 
  @include _shared/auth.md
 
- ### Request body
+ ## Request body
 
  ```json
  {
@@ -36,14 +53,14 @@ independently in parallel; results are index-aligned with `filePaths`.
 
  | Field | Type | Required | Notes |
  |---|---|---|---|
- | `code` | string | no | Inline code to review |
- | `focus` | string[] | no | Any of `security`, `performance`, `correctness`, `style` |
- | `filePaths` | string[] | no | Files to review (parallel) |
- | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
+ | `code` | string | no | Inline code snippet to review |
+ | `focus` | string[] | no | Any of `security`, `performance`, `correctness`, `style`. Omit for general review. |
+ | `filePaths` | string[] | no | Files to review (one worker per file, parallel) |
+ | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` — useful for design docs the reviewer should validate against |
 
  Either `code` or `filePaths` (or both) must be provided.
 
- ### Full example
+ ## Full example
 
  ```bash
  BATCH=$(curl -f --show-error -s -X POST \
@@ -54,10 +71,22 @@ BATCH=$(curl -f --show-error -s -X POST \
  BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
  ```
 
- Then poll until complete:
-
  @include _shared/polling.md
 
  @include _shared/response-shape.md
 
+ ## Common pitfalls
+
+ ❌ **Reviewing a plan/spec markdown with `mma-review`**
+ The reviewer is tuned for code constructs (types, call sites, test coverage). On prose it produces vague nits. **Fix:** use `mma-audit` for docs/specs, `mma-review` for source.
+
+ ❌ **Omitting `focus` and getting watery findings**
+ A general review surfaces low-signal style nits alongside real bugs. **Fix:** specify `focus: ["correctness"]` or `["security"]` to bias the reviewer toward the dimension you care about.
+
+ ❌ **Inlining the spec the reviewer should validate against**
+ If the reviewer needs to check the diff against a design doc, register the doc once via `mma-context-blocks` and pass the `contextBlockIds`. Inlining N times wastes tokens.
+
+ ❌ **Skipping review because "I already read it"**
+ Self-review and cross-model review are not the same thing. The whole reason to delegate is the different blind spots. Read the findings; merge what you agree with.
+
  @include _shared/error-handling.md
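Because `focus` accepts only four values and a file-based review needs at least one path, both are cheap to check before dispatch. The hypothetical `review_body` helper below (assuming `jq` 1.6+) takes a comma-separated focus list, which may be empty for a general review, plus the files to review, and fails rather than sending an unknown focus value. It does not cover inline `code` or `contextBlockIds`; those keys could be added the same way.

```bash
# Hypothetical helper (not shipped with mmagent): build a /review body.
# review_body FOCUS_CSV FILE...   e.g. review_body security,correctness src/auth.ts
# An empty FOCUS_CSV omits `focus` (general review).
review_body() {
  local focus_csv="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "review_body: pass at least one file, or send inline code instead" >&2
    return 1
  fi
  # Reject anything outside the four documented focus values (jq exits non-zero).
  jq -cn --arg focus "$focus_csv" '
    ($focus | split(",") | map(select(length > 0))) as $f
    | if all($f[]; IN("security", "performance", "correctness", "style"))
      then (if ($f | length) > 0 then {focus: $f} else {} end)
           + {filePaths: $ARGS.positional}
      else error("unknown focus value in: \($focus)")
      end' --args "$@"
}
```

Validating locally keeps a typo such as `securty` from silently turning a focused pass into a general one, or into a server-side rejection discovered only after polling.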
package/dist/skills/mma-verify/SKILL.md CHANGED
@@ -1,28 +1,45 @@
  ---
  name: mma-verify
  description: >-
-   Verify work against a checklist via the local mmagent HTTP service. Sub-agents
-   check each item independently.
+   Use when work is "complete" and you need to confirm acceptance criteria are
+   actually met before claiming so to the user — each checklist item verified
+   independently against the work
  when_to_use: >-
    The user (or a methodology skill like
-   superpowers:verification-before-completion) wants acceptance-criteria checked
-   against implemented work. Delegate the evidence-gathering to mmagent workers;
-   each checklist item is verified independently and in parallel.
- version: 3.2.0
+   superpowers:verification-before-completion) needs acceptance-criteria checked
+   against implemented work BEFORE claiming success. Delegate so each checklist
+   item gets independent evidence-gathering on a worker. Use this BEFORE saying
+   "done" — never after.
+ version: 3.4.0
  ---
 
- ## mma-verify
+ # mma-verify
 
- Submit work product and a checklist to sub-agents for independent verification.
- Each checklist item is verified in parallel; results are index-aligned.
+ ## Overview
 
- ### Endpoint
+ Submit work product and a checklist to workers for independent verification. Each checklist item is verified in parallel; results are index-aligned with the input.
+
+ **Core principle:** Self-verification ("I read the files; they look correct") has no external validation. Workers check independently and return evidence (or absence of it) per item.
+
+ ## When to Use
+
+ **Use when:**
+ - You're about to claim a task is "done" and need evidence per acceptance item
+ - A methodology skill (superpowers:verification-before-completion) routed here
+ - The user gave a checklist and asked you to confirm each item
+
+ **Don't use when:**
+ - The "checklist" is one item — read inline, faster than dispatch
+ - You don't have explicit acceptance criteria — write them first, then dispatch
+ - The work hasn't been done yet — verification is a post-condition, not a pre-condition
+
+ ## Endpoint
 
  `POST /verify?cwd=<abs-path>`
 
  @include _shared/auth.md
 
- ### Request body
+ ## Request body
 
  ```json
  {
@@ -39,12 +56,12 @@ Each checklist item is verified in parallel; results are index-aligned.
 
  | Field | Type | Required | Notes |
  |---|---|---|---|
- | `work` | string | no | Inline work product description |
- | `checklist` | string[] | yes | At least one item |
- | `filePaths` | string[] | no | Files to verify against (parallel) |
- | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
+ | `work` | string | no | Inline work-product description (e.g. summary of what changed) |
+ | `checklist` | string[] | yes | At least one item — each item verified by its own worker |
+ | `filePaths` | string[] | no | Files to verify against (workers can read them) |
+ | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` (e.g. the spec the work was supposed to satisfy) |
 
- ### Full example
+ ## Full example
 
  ```bash
  BATCH=$(curl -f --show-error -s -X POST \
@@ -55,10 +72,24 @@ BATCH=$(curl -f --show-error -s -X POST \
  BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
  ```
 
- Then poll until complete:
-
  @include _shared/polling.md
 
  @include _shared/response-shape.md
 
+ ## Common pitfalls
+
+ ❌ **Vague checklist items**
+ > "Code is good"
+
+ The worker can't gather evidence for "good". **Fix:** specific, falsifiable criteria — `"Function parseConfig has at least 3 unit tests covering: missing field, malformed JSON, empty file"`.
+
+ ❌ **Verifying without `filePaths`**
+ Worker has nothing to read; verdict is speculative. **Fix:** always pass the file(s) the work landed in.
+
+ ❌ **Treating verify as the implementation step**
+ Verify CHECKS work; it doesn't DO work. If a checklist item fails, dispatch `mma-delegate` to fix it, then re-verify.
+
+ ❌ **Skipping verify because "tests pass"**
+ Tests verify the test cases that exist. Verify checks the acceptance criteria — which often include things tests don't (docs updated, no debug-print left, etc.).
+
  @include _shared/error-handling.md
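Checklists are easier to keep specific and falsifiable when they live in a file, one criterion per line. The hypothetical `verify_body` helper below (assuming `jq` 1.6+ for `--rawfile` and `--args`) turns such a file, plus the files the work landed in, into a `/verify` body. It skips blank lines and refuses to build a body with no checklist items or no `filePaths`, matching the pitfalls listed above.

```bash
# Hypothetical helper (not shipped with mmagent): build a /verify body.
# verify_body CHECKLIST_FILE FILE...
# CHECKLIST_FILE holds one acceptance criterion per line; blank or
# whitespace-only lines are ignored.
verify_body() {
  local checklist_file="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "verify_body: pass the file(s) the work landed in" >&2
    return 1
  fi
  jq -cn --rawfile cl "$checklist_file" '
    ($cl | split("\n") | map(select(test("\\S")))) as $items
    | if ($items | length) == 0
      then error("checklist needs at least one item")
      else {checklist: $items, filePaths: $ARGS.positional}
      end' --args "$@"
}
```

Each surviving line becomes one `checklist` entry, so each criterion gets its own worker and its own index in the results.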
package/dist/skills/multi-model-agent/SKILL.md CHANGED
@@ -1,27 +1,74 @@
  ---
  name: multi-model-agent
  description: >-
-   Router for the multi-model-agent local service. Use first when you're about to
-   delegate any tool-using work: picks the right mma-* skill for the task
-   (audit, review, verify, debug, plan execution, ad-hoc delegation) instead of
-   defaulting to inline Agent dispatches.
+   Use first whenever you're about to delegate any tool-using work — picks the
+   right mma-* skill (audit, review, verify, debug, plan execution, codebase
+   investigation, ad-hoc delegation, retry, context-block reuse, clarification
+   resume) instead of defaulting to inline Agent dispatches
  when_to_use: >-
    The user asks for work you'd normally delegate — audit, code review, checklist
-   verification, debugging, plan execution, or ad-hoc parallel tasks — AND
-   mmagent is running. Read this once, pick the matching mma-* skill, and
-   delegate there. Applies equally whether the user invoked a superpowers
-   methodology skill or just asked directly.
- version: 3.2.0
+   verification, debugging, plan execution, codebase Q&A, or ad-hoc parallel
+   tasks — AND mmagent is running. Read this once, pick the matching mma-* skill,
+   and delegate there. Applies equally whether the user invoked a superpowers
+   methodology skill or asked directly.
+ version: 3.4.0
  ---
 
- ## multi-model-agent overview
-
- multi-model-agent is a local HTTP service that fans out tool-using work to
- sub-agents running on different LLM providers (Claude, OpenAI-compatible, Codex).
+ # multi-model-agent (router)
+
+ ## Overview
+
+ Local HTTP service that fans out tool-using work to sub-agents on different LLM providers (Claude, OpenAI-compatible, Codex). Workers run on cheap models; the main agent stays on judgment.
+
+ **Core principle:** Pick the most specific `mma-*` skill that fits the task. Specificity reduces input — specialized skills know their route, schema, and defaults so you write less.
+
+ ## Skill map
+
+ ```dot
+ digraph picker {
+   "Plan/spec file on disk?" [shape=diamond];
+   "Audit a doc?" [shape=diamond];
+   "Review code?" [shape=diamond];
+   "Verify a checklist?" [shape=diamond];
+   "Debug a failure?" [shape=diamond];
+   "Codebase question?" [shape=diamond];
+   "mma-execute-plan" [shape=box];
+   "mma-audit" [shape=box];
+   "mma-review" [shape=box];
+   "mma-verify" [shape=box];
+   "mma-debug" [shape=box];
+   "mma-investigate" [shape=box];
+   "mma-delegate" [shape=box];
+
+   "Plan/spec file on disk?" -> "mma-execute-plan" [label="yes"];
+   "Plan/spec file on disk?" -> "Audit a doc?" [label="no"];
+   "Audit a doc?" -> "mma-audit" [label="yes"];
+   "Audit a doc?" -> "Review code?" [label="no"];
+   "Review code?" -> "mma-review" [label="yes"];
+   "Review code?" -> "Verify a checklist?" [label="no"];
+   "Verify a checklist?" -> "mma-verify" [label="yes"];
+   "Verify a checklist?" -> "Debug a failure?" [label="no"];
+   "Debug a failure?" -> "mma-debug" [label="yes"];
+   "Debug a failure?" -> "Codebase question?" [label="no"];
+   "Codebase question?" -> "mma-investigate" [label="yes"];
+   "Codebase question?" -> "mma-delegate" [label="no — ad-hoc"];
+ }
+ ```
 
- ### Preflight: auto-start the daemon if it is not running
+ | Skill | Purpose |
+ |---|---|
+ | `mma-execute-plan` | Implement tasks from a plan or spec file (descriptors match plan headings) |
+ | `mma-audit` | Audit a document/spec/config for security, correctness, style, or performance |
+ | `mma-review` | Review code for quality, security, performance, correctness |
+ | `mma-verify` | Verify work against a checklist (one item per worker, parallel) |
+ | `mma-debug` | Debug a failure with a structured hypothesis |
+ | `mma-investigate` | Codebase Q&A — structured answer with `file:line` citations + confidence |
+ | `mma-delegate` | Ad-hoc implementation / research with no plan file |
+ | `mma-retry` | Re-run specific failed/incomplete tasks from a previous batch by index |
+ | `mma-context-blocks` | Register a reused doc once; reference by ID across N tasks |
+ | `mma-clarifications` | Confirm or correct the service's proposed interpretation |
 
- Before any mma-* call, check the server. If it is not up, start it in the background — do NOT run `mmagent serve` synchronously, it blocks forever.
+ ## Preflight: auto-start the daemon if it is not running
 
  ```bash
  PORT=7337
@@ -37,52 +84,43 @@ fi
 
  Idempotent: already-running daemon → curl succeeds → no-op.
 
- ### Auth token
+ ❌ `mmagent serve` (no `&`) — blocks forever, never reaches the next step.
+ ✅ `mmagent serve >/dev/null 2>&1 & disown` — backgrounded, releases the shell.
+
+ ## Auth token
 
- Set the token in your environment:
  ```bash
  export MMAGENT_AUTH_TOKEN=$(mmagent print-token)
  ```
 
- Or read it from the env var `MMAGENT_AUTH_TOKEN` if already set.
- Every request requires `Authorization: Bearer <token>`.
-
- ### Skill map
-
- | Skill | Purpose |
- |---|---|
- | `mma-delegate` | Ad-hoc implementation/research (no plan file) |
- | `mma-audit` | Audit a document for security, correctness, style, or performance |
- | `mma-review` | Review code for quality, security, or correctness |
- | `mma-verify` | Verify work against a checklist |
- | `mma-debug` | Debug a failure with a structured hypothesis |
- | `mma-execute-plan` | Implement tasks from a plan or spec file |
- | `mma-retry` | Re-run specific failed tasks from a previous batch |
- | `mma-context-blocks` | Register large reused documents to reference by ID |
- | `mma-clarifications` | Confirm or correct the service's proposed interpretation |
+ Every request requires `Authorization: Bearer $MMAGENT_AUTH_TOKEN`. The token rotates on every `mmagent serve` restart; re-export after a `pkill`/upgrade.
 
- ### Worker tier: `agentType`
+ ## Worker tier: `agentType`
 
  `mma-delegate` and `mma-execute-plan` accept `agentType: "standard" | "complex"`. Default is `"standard"` (cheaper, faster). Pick `"complex"` when:
 
- - The task touches many files or requires multi-step reasoning a smaller model cannot hold in context.
- - A prior standard run came back with `filesWritten: 0` or exhausted its turn budget (visible in the verbose stream or the final envelope's `batchTimings` / `results`).
+ - The task touches many files or requires multi-step reasoning a standard-tier model cannot hold in context.
+ - A prior standard run came back with `filesWritten: 0` or `incompleteReason: "turn_cap"` / `"cost_cap"` / `"timeout"`.
  - The task is security-sensitive or ambiguous enough that being wrong is costly.
 
- `mma-audit`, `mma-review`, `mma-debug` already default to complex; `mma-verify` already defaults to standard. These are not configurable from the caller and do not need an `agentType` field.
+ `mma-audit`, `mma-review`, `mma-debug`, `mma-investigate` already default to complex; `mma-verify` already defaults to standard. These are not caller-configurable.
 
- ### General flow
+ ## General flow
 
- 1. Call the appropriate `mma-*` skill → receive `{ batchId }`.
+ 1. Call the matching `mma-*` skill → receive `{ batchId, statusUrl }`.
  2. Poll `GET /batch/:id`: `202 text/plain` while pending (body is the running headline), `200 application/json` on terminal.
- 3. Read `results` / `error` / `proposedInterpretation` from the terminal envelope.
+ 3. Read `results` / `error` / `proposedInterpretation` from the 7-field terminal envelope.
 
- If the terminal envelope has `proposedInterpretation` as a string, use `mma-clarifications` to confirm or correct it.
+ If `proposedInterpretation` is a string (not the `not_applicable` sentinel), use `mma-clarifications` to confirm/correct.
 
- ### Diagnosing slow tasks
+ ## Common pitfalls
 
- Start the server with `mmagent serve --verbose` (or set `diagnostics.verbose: true` in config) to record `tool_call` and `llm_turn` events. Then tail them:
+ ❌ **Defaulting to inline Agent dispatch when mmagent is up.** mmagent workers cost ~10× less and don't pollute main context. **Why:** every inline tool call burns flagship-model tokens; that's exactly what mmagent exists to avoid.
 
- ```bash
- mmagent logs --follow --batch=$BATCH_ID
- ```
+ ❌ **Picking `mma-delegate` when a more specific skill fits.** Audit / review / verify / debug / investigate workers know their route's defaults and emit structured reports. **Why:** specialized skills require less input and produce richer output.
+
+ ❌ **Starting an investigation that needs to write code.** `mma-investigate` is read-only. **Fix:** dispatch `mma-delegate` with research-then-edit framing, or split: investigate → digest → edit.
+
+ ## Diagnosing slow tasks
+
+ `mmagent serve --verbose` (or `diagnostics.verbose: true` in config) records `tool_call`, `turn_complete`, and `heartbeat` events. Tail with `mmagent logs --follow --batch=$BATCH_ID`.
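Step 2 of the general flow (plain-text headline with `202` while pending, JSON envelope with `200` once terminal) can be scripted as a small loop. This is a sketch under assumptions, not mmagent's client: `fetch_batch` and `poll_batch` are hypothetical names, `PORT` and `MMAGENT_AUTH_TOKEN` are expected to be set as in the preflight and auth sections, and the `_shared/polling.md` include (not shown in this diff) may already prescribe its own loop. `fetch_batch` is split out so it can be swapped for a stub in tests.

```bash
# Hypothetical polling sketch (not mmagent's own client).

# fetch_batch ID: prints the response body, a newline, then the HTTP status code.
fetch_batch() {
  curl -s -H "Authorization: Bearer $MMAGENT_AUTH_TOKEN" \
    -w '\n%{http_code}' "http://localhost:${PORT:-7337}/batch/$1"
}

# poll_batch ID [INTERVAL_SECONDS]
#   202 -> still pending: echo the running headline to stderr, sleep, repeat.
#   200 -> terminal: print the JSON envelope to stdout and return 0.
#   anything else (or a curl failure) -> return 1.
poll_batch() {
  local id="$1" interval="${2:-5}" resp code body
  while :; do
    resp=$(fetch_batch "$id") || return 1
    code=$(printf '%s' "$resp" | tail -n 1)   # last line: status code
    body=$(printf '%s' "$resp" | sed '$d')    # everything before it: body
    case "$code" in
      202) printf '%s\n' "$body" >&2; sleep "$interval" ;;
      200) printf '%s\n' "$body"; return 0 ;;
      *)   printf 'poll_batch: unexpected HTTP %s\n' "$code" >&2; return 1 ;;
    esac
  done
}
```

For example, `ENVELOPE=$(poll_batch "$BATCH_ID" 5)` shows progress headlines on stderr and captures only the terminal JSON, ready for `jq`.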
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@zhixuan92/multi-model-agent",
-   "version": "3.2.0",
+   "version": "3.4.0",
    "type": "module",
    "license": "MIT",
    "description": "Standalone HTTP server for multi-model-agent. Routes tool-invocation work to Claude, Codex, or OpenAI-compatible sub-agents with async-polling REST dispatch and installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.",
@@ -52,7 +52,7 @@
    },
    "dependencies": {
      "@asteasolutions/zod-to-openapi": "^8.5.0",
-     "@zhixuan92/multi-model-agent-core": "^3.2.0",
+     "@zhixuan92/multi-model-agent-core": "^3.4.0",
      "gray-matter": "^4.0.3",
      "minimist": "^1.2.8",
      "zod": "^4.0.0"