npm - @zhixuan92/multi-model-agent - Versions diffs - 3.2.0 → 3.4.0 - Mend

@zhixuan92/multi-model-agent 3.2.0 → 3.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (65)

package/README.md +62 -33
package/dist/http/canonicalize-file-paths.d.ts +8 -0
package/dist/http/canonicalize-file-paths.d.ts.map +1 -0
package/dist/http/canonicalize-file-paths.js +43 -0
package/dist/http/canonicalize-file-paths.js.map +1 -0
package/dist/http/execution-context.d.ts.map +1 -1
package/dist/http/execution-context.js +0 -14
package/dist/http/execution-context.js.map +1 -1
package/dist/http/handlers/control/batch-slice.d.ts +4 -0
package/dist/http/handlers/control/batch-slice.d.ts.map +1 -0
package/dist/http/handlers/control/batch-slice.js +40 -0
package/dist/http/handlers/control/batch-slice.js.map +1 -0
package/dist/http/handlers/control/retry.d.ts +4 -0
package/dist/http/handlers/control/retry.d.ts.map +1 -0
package/dist/http/handlers/control/retry.js +60 -0
package/dist/http/handlers/control/retry.js.map +1 -0
package/dist/http/handlers/tools/audit.d.ts.map +1 -1
package/dist/http/handlers/tools/audit.js +2 -0
package/dist/http/handlers/tools/audit.js.map +1 -1
package/dist/http/handlers/tools/debug.d.ts.map +1 -1
package/dist/http/handlers/tools/debug.js +2 -0
package/dist/http/handlers/tools/debug.js.map +1 -1
package/dist/http/handlers/tools/delegate.d.ts.map +1 -1
package/dist/http/handlers/tools/delegate.js +2 -0
package/dist/http/handlers/tools/delegate.js.map +1 -1
package/dist/http/handlers/tools/execute-plan.d.ts.map +1 -1
package/dist/http/handlers/tools/execute-plan.js +2 -0
package/dist/http/handlers/tools/execute-plan.js.map +1 -1
package/dist/http/handlers/tools/investigate.d.ts +4 -0
package/dist/http/handlers/tools/investigate.d.ts.map +1 -0
package/dist/http/handlers/tools/investigate.js +81 -0
package/dist/http/handlers/tools/investigate.js.map +1 -0
package/dist/http/handlers/tools/review.d.ts.map +1 -1
package/dist/http/handlers/tools/review.js +2 -0
package/dist/http/handlers/tools/review.js.map +1 -1
package/dist/http/handlers/tools/verify.d.ts.map +1 -1
package/dist/http/handlers/tools/verify.js +2 -0
package/dist/http/handlers/tools/verify.js.map +1 -1
package/dist/http/request-observability.d.ts +9 -0
package/dist/http/request-observability.d.ts.map +1 -0
package/dist/http/request-observability.js +36 -0
package/dist/http/request-observability.js.map +1 -0
package/dist/http/server.d.ts.map +1 -1
package/dist/http/server.js +52 -11
package/dist/http/server.js.map +1 -1
package/dist/install/discover.d.ts +1 -1
package/dist/install/discover.d.ts.map +1 -1
package/dist/install/discover.js +1 -0
package/dist/install/discover.js.map +1 -1
package/dist/openapi.d.ts.map +1 -1
package/dist/openapi.js +6 -0
package/dist/openapi.js.map +1 -1
package/dist/skills/_shared/verify-and-review.md +12 -0
package/dist/skills/mma-audit/SKILL.md +45 -18
package/dist/skills/mma-clarifications/SKILL.md +73 -29
package/dist/skills/mma-context-blocks/SKILL.md +56 -24
package/dist/skills/mma-debug/SKILL.md +54 -22
package/dist/skills/mma-delegate/SKILL.md +59 -21
package/dist/skills/mma-execute-plan/SKILL.md +56 -24
package/dist/skills/mma-investigate/SKILL.md +137 -0
package/dist/skills/mma-retry/SKILL.md +65 -22
package/dist/skills/mma-review/SKILL.md +49 -20
package/dist/skills/mma-verify/SKILL.md +49 -18
package/dist/skills/multi-model-agent/SKILL.md +84 -46
package/package.json +2 -2

package/dist/skills/mma-investigate/SKILL.md ADDED

@@ -0,0 +1,137 @@
+---
+name: mma-investigate
+description: >-
+  Use when you need to answer a question about the codebase ("how does X work",
+  "where is Y called", "what does this directory do") and reading + grepping the
+  codebase yourself would consume main-context tokens
+when_to_use: >-
+  A question about THIS codebase has surfaced — from the user, from a
+  methodology skill, or from your own next-step planning — AND mmagent is
+  running. Delegate the read/grep/synthesis to a worker so the main context
+  stays on judgment. Codebase only — does not perform web research or
+  git-history queries.
+version: 3.4.0
+---
+# mma-investigate
+## Overview
+Answer a codebase question via a read-only mmagent worker. The worker greps and reads on its cheap budget; you read its synthesis on yours.
+**Core principle:** Investigation is labor (read, grep, synthesize). Delegate it. The main agent stays on judgment — deciding what the answer means and what to do with it.
+## When to Use
+```dot
+digraph when_to_use {
+    "Question about codebase?" [shape=diamond];
+    "About web / git history?" [shape=diamond];
+    "Already have the file in context?" [shape=diamond];
+    "mma-investigate" [shape=box];
+    "Read inline (1–2 reads)" [shape=box];
+    "WebSearch / git log" [shape=box];
+    "Question about codebase?" -> "About web / git history?";
+    "About web / git history?" -> "WebSearch / git log" [label="yes"];
+    "About web / git history?" -> "Already have the file in context?" [label="no"];
+    "Already have the file in context?" -> "Read inline (1–2 reads)" [label="yes"];
+    "Already have the file in context?" -> "mma-investigate" [label="no"];
+}
+```
+**Use when:**
+- "How does X work in this codebase?"
+- "Where is Y called from?"
+- "What does this directory do?"
+- The answer requires reading 3+ files or grepping
+- Cross-cutting investigations (auth flow across modules, data lineage)
+**Don't use when:**
+- The answer is in 1–2 files you already have in context → just `Read`
+- It's about web docs / external APIs → `WebSearch` / `WebFetch`
+- It's about git history → `git log` / `git blame`
+- You need to MODIFY code based on the finding → `mma-delegate` (research + edit)
+## Endpoint
+`POST /investigate?cwd=<abs-path>`
+@include _shared/auth.md
+## Request body
+```json
+{
+  "question": "How does the auth middleware handle token refresh?",
+  "filePaths": ["/project/src/auth/"],
+  "contextBlockIds": []
+}
+```
+| Field | Type | Required | Notes |
+|---|---|---|---|
+| `question` | string | yes | Natural-language investigation question |
+| `filePaths` | string[] | no | Anchor paths the worker starts from. Worker may grep beyond. |
+| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` — enables follow-up / delta investigation |
+| `agentType` | `'standard' \| 'complex'` | no | Caller override of the route default (`'complex'`) |
+| `tools` | `'none' \| 'readonly'` | no | Default `'readonly'`. `'no-shell'` and `'full'` are rejected — investigation is read-only |
+**Anchor narrow questions with `filePaths`:**
+❌ `{ "question": "Where is parseConfig called?" }` — searches the whole repo
+✅ `{ "question": "Where is parseConfig called?", "filePaths": ["src/"] }` — bounded
+**Why:** the worker greps and reads under its cost ceiling. Without anchors, broad questions exhaust the budget before they finish.
+## Full example
+```bash
+BATCH=$(curl -f --show-error -s -X POST \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"question":"How does the auth middleware handle token refresh?"}' \
+  "http://localhost:$PORT/investigate?cwd=/project")
+BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
+```
+@include _shared/polling.md
+@include _shared/response-shape.md
+## Per-task report shape
+Each task carries an `investigation` field on its per-task report:
+```json
+{
+  "investigation": {
+    "citations": [
+      { "file": "src/auth/refresh.ts", "lines": "45-72", "claim": "Refresh handler reads bearer." }
+    ],
+    "confidence": { "level": "high", "rationale": "All claims directly cited." },
+    "diagnostics": {
+      "malformedCitationLines": 0,
+      "missingRequiredSections": [],
+      "invalidRequiredSections": []
+    }
+  }
+}
+```
+`workerStatus` is one of `done`, `done_with_concerns`, `needs_context`, `blocked`. When `done_with_concerns`, the per-task report carries `incompleteReason` (`turn_cap`, `cost_cap`, `timeout`, or `missing_sections`). When `needs_context`, the worker flagged a `[needs_context]` bullet under `## Unresolved` — re-dispatch with extra context (anchor paths, a context block, or a clarification turn).
+## Common pitfalls
+❌ **Asking for a fix instead of an answer**
+> question: "Refactor the auth middleware to use JWT"
+The investigator can't write — `tools: 'readonly'`. **Fix:** use `mma-delegate` for research-then-edit, or split: investigate first, then dispatch the edit.
+❌ **Treating `done_with_concerns` as failure**
+The worker still produced citations and a confidence level. Read them — partial coverage with `incompleteReason: 'turn_cap'` often answers the question well enough. Re-dispatch with a tighter scope only if the citations are unusable.
+❌ **Inline-reading instead of delegating**
+About to `Read` 3+ files just to answer one question? That's the wrong tradeoff — the worker reads on its cheap budget; you read its synthesis on yours.
+@include _shared/error-handling.md
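The `mma-investigate` skill sends a small JSON body and reads back per-task `investigation` reports. Below is a minimal sketch of two client-side helpers, assuming `jq` 1.6+ is installed. `investigate_body` builds an anchored request body and lets `jq` handle escaping of the question. `print_citations` flattens citations into `file:lines` lines. Both function names are hypothetical. The envelope path `.results[].investigation` is an assumption based on the terminal envelope's `results` field, not a documented schema.

```bash
# investigate_body QUESTION [ANCHOR_PATH...]
# Prints a compact /investigate request body. jq escapes the question, so
# quotes or newlines in it are safe. With no anchors, filePaths is omitted.
investigate_body() {
  local question=$1
  shift
  jq -nc --arg q "$question" '
    if ($ARGS.positional | length) > 0
    then {question: $q, filePaths: $ARGS.positional}
    else {question: $q}
    end' --args "$@"
}

# print_citations ENVELOPE_JSON
# For each task that has an `investigation` report (assumed to live under
# .results[]), prints one "file:lines<TAB>claim" line per citation, then
# a "confidence: <level>" line.
print_citations() {
  jq -r '
    .results[]?
    | .investigation? // empty
    | (.citations[]? | "\(.file):\(.lines)\t\(.claim)"),
      "confidence: \(.confidence.level)"' <<<"$1"
}
```

For example, `-d "$(investigate_body 'Where is parseConfig called?' src/)"` can stand in for the hand-written `-d` payload in the curl call above. This keeps narrow questions anchored to a path.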

package/dist/skills/mma-retry/SKILL.md CHANGED

@@ -1,30 +1,62 @@
 ---
 name: mma-retry
 description: >-
-  Re-run specific failed or incomplete tasks from a previous mmagent batch by
-  index. Preserves the original task specs and only re-executes the named
-  indices.
+  Use when a previous mma-* batch returned partial results (some tasks failed or
+  came back incomplete) and you want to re-run JUST the failed indices without
+  re-dispatching the whole batch
 when_to_use: >-
-  A previous mma-delegate / mma-execute-plan batch returned partial results and
-  you want to re-try the failed indices only. Prefer this over redispatching the
-  whole batch or inline-retrying — it's idempotent and keeps the original
-  batch's diagnostics intact.
-version: 3.2.0
+  A previous mma-delegate / mma-execute-plan / mma-audit / mma-review /
+  mma-verify / mma-debug / mma-investigate batch returned partial results AND
+  you want to re-try the failed indices only. Prefer this over re-dispatching
+  the whole batch or inline-retrying — it's idempotent and preserves the
+  original batch's diagnostics.
+version: 3.4.0
 ---
-## mma-retry
+# mma-retry
-Re-run selected tasks from a completed or failed batch. Specify the original
-`batchId` and the zero-based indices of the tasks to re-run. The retry runs
-those tasks fresh with the same configuration as the original batch.
+## Overview
-### Endpoint
+Re-run selected tasks from a completed or failed batch. Specify the original `batchId` and the zero-based indices of the tasks to re-run. The retry runs those tasks fresh with the same configuration as the original batch and produces a new `batchId`.
+**Core principle:** A batch is the unit of dispatch, but a TASK is the unit of failure. Retry at the task level so successful tasks aren't re-charged.
+## When to Use
+```dot
+digraph when_to_use {
+    "Batch returned terminal?" [shape=diamond];
+    "Some tasks failed/incomplete?" [shape=diamond];
+    "All tasks failed?" [shape=diamond];
+    "mma-retry (selected indices)" [shape=box];
+    "Re-dispatch the whole batch" [shape=box];
+    "Investigate first (mma-debug)" [shape=box];
+    "Batch returned terminal?" -> "Some tasks failed/incomplete?";
+    "Some tasks failed/incomplete?" -> "All tasks failed?" [label="yes"];
+    "Some tasks failed/incomplete?" -> "Done — read results" [label="no"];
+    "All tasks failed?" -> "Investigate first (mma-debug)" [label="yes"];
+    "All tasks failed?" -> "mma-retry (selected indices)" [label="no — partial"];
+}
+```
+**Use when:**
+- A previous batch's terminal envelope shows mixed `done` / `done_with_concerns` / `failed`
+- 1–N tasks (but not all) need a re-run with the same config
+- You want to keep the original batch's diagnostics intact for comparison
+**Don't use when:**
+- All tasks failed → investigate the systemic cause first (`mma-debug`); retrying won't help
+- The original batch is `expired` (TTL elapsed) → re-dispatch fresh
+- You want to change the prompt → re-dispatch with the new prompt; retry preserves the original
+## Endpoint
 `POST /retry?cwd=<abs-path>`
 @include _shared/auth.md
-### Request body
+## Request body
 ```json
 {
@@ -35,13 +67,12 @@ those tasks fresh with the same configuration as the original batch.
 | Field | Type | Required | Notes |
 |---|---|---|---|
-| `batchId` | string (UUID) | yes | Batch ID from a previous dispatch |
-| `taskIndices` | number[] | yes | Zero-based indices to re-run |
+| `batchId` | string (UUID) | yes | Batch ID from a previous dispatch (not yet expired) |
+| `taskIndices` | number[] | yes | Zero-based indices to re-run; must be non-negative integers |
-`taskIndices` must be non-negative integers. To re-run all tasks, pass all
-indices from `0` to `tasks.length - 1`.
+To re-run all tasks: pass `[0, 1, ..., tasks.length - 1]`. (But consider: if all failed, debug instead of retrying.)
-### Full example
+## Full example
 ```bash
 # Original batch had 4 tasks; re-run tasks at index 1 and 3
@@ -50,13 +81,25 @@ BATCH=$(curl -f --show-error -s -X POST \
   -H "Content-Type: application/json" \
   -d '{"batchId":"550e8400-e29b-41d4-a716-446655440000","taskIndices":[1,3]}' \
   "http://localhost:$PORT/retry?cwd=/project")
-BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
+BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')   # NEW batchId — not the original
 ```
-The retry produces a new `batchId`. Poll the new ID until complete:
 @include _shared/polling.md
 @include _shared/response-shape.md
+## Common pitfalls
+❌ **Retrying after the batch expired**
+TTL elapsed → the original task specs are gone and the retry endpoint returns 404. **Fix:** re-dispatch fresh.
+❌ **Retrying without addressing the root cause**
+A flaky task that failed once will likely fail again. **Fix:** investigate (`mma-debug` or read the original `result.error.message`), then retry — or escalate `agentType` to `complex` by re-dispatching.
+❌ **Confusing the new and original `batchId`**
+Retry produces a NEW batchId; polling the original returns the old terminal state. **Fix:** save the retry's `batchId` and poll that one.
+❌ **Using retry to change task config**
+Retry preserves the ORIGINAL config (prompt, agentType, filePaths, reviewPolicy). **Fix:** if you want different config, re-dispatch with `mma-delegate` / `mma-execute-plan`.
 @include _shared/error-handling.md
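The retry decision graph can be sketched as a helper that does three things. It scans a terminal envelope, collects the zero-based indices that need a re-run, and refuses to build a body when every task failed (the "debug first" branch). This sketch assumes `jq` is installed and that each entry of `results` carries a `workerStatus` field. Both that field location and the choice to treat every non-`done` status as retryable are assumptions. `failed_indices` and `retry_body` are hypothetical names.

```bash
# failed_indices ENVELOPE_JSON
# Prints a compact JSON array of zero-based indices whose (assumed)
# .results[i].workerStatus is anything other than "done".
failed_indices() {
  jq -c '[range(.results | length) as $i
          | select(.results[$i].workerStatus != "done") | $i]' <<<"$1"
}

# retry_body BATCH_ID ENVELOPE_JSON
# Prints a /retry body for the partial-failure case only.
# Returns 1 when nothing needs a re-run, 2 when every task failed.
retry_body() {
  local batch_id=$1 envelope=$2 indices total count
  indices=$(failed_indices "$envelope") || return 1
  total=$(jq '.results | length' <<<"$envelope") || return 1
  count=$(jq 'length' <<<"$indices")
  if [ "$count" -eq 0 ]; then
    echo "retry_body: every task is done; nothing to retry" >&2
    return 1
  elif [ "$count" -eq "$total" ]; then
    echo "retry_body: all $total tasks failed; find the root cause (mma-debug) first" >&2
    return 2
  fi
  jq -nc --arg id "$batch_id" --argjson idx "$indices" \
    '{batchId: $id, taskIndices: $idx}'
}
```

Pass the output as the `-d` argument of the `/retry` call, then poll the new `batchId` it returns rather than the original. If a `done_with_concerns` result turns out to be usable after reading it, tighten the `select` filter to skip that status as well.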

package/dist/skills/mma-review/SKILL.md CHANGED

@@ -1,29 +1,46 @@
 ---
 name: mma-review
 description: >-
-  Review code for quality, security, performance, or correctness via the local
-  mmagent HTTP service. Sub-agents run in parallel per file, independent
-  context.
+  Use when source code needs a quality / security / correctness pass — pre-merge
+  review, post-implementation sanity check, or focused look at a small file set
+  — and the review can run in parallel per file
 when_to_use: >-
-  The user asks for a code review, pre-merge check, or quality pass over one or
-  more files OR a methodology skill (superpowers:requesting-code-review,
-  /review, /security-review) points at a review task. Delegate the reviewer pass
-  to mmagent workers — your main context stays free to decide what to merge.
-version: 3.2.0
+  User asks for a code review or pre-merge check, OR a methodology skill
+  (superpowers:requesting-code-review, /review, /security-review) points at one,
+  AND mmagent is running. Delegate so each file reviews on its own worker; the
+  main agent only decides what to merge. Review on SOURCE CODE — use mma-audit
+  for prose specs / configs.
+version: 3.4.0
 ---
-## mma-review
+# mma-review
-Send code or files to sub-agents for structured review. Each file is reviewed
-independently in parallel; results are index-aligned with `filePaths`.
+## Overview
-### Endpoint
+Send code files to workers for structured review. Each file is reviewed independently in parallel; results are index-aligned with `filePaths`.
+**Core principle:** Reviewer is a different model from the implementer — different training, different blind spots. Cross-model review catches what self-review misses.
+## When to Use
+**Use when:**
+- 1+ source code files just changed (post-implementation review)
+- Pre-merge sanity check on a focused diff
+- Security-sensitive code path (`focus: ["security"]`)
+- A specialized review pass (e.g. `focus: ["performance"]` on hot-path code)
+**Don't use when:**
+- The thing being reviewed is prose / spec / config → `mma-audit` (better-suited prompt template)
+- You want to know whether a complete branch is mergeable → run `/ultrareview` (multi-model branch review) instead
+- The diff is one-line / one-character → reading inline is faster than dispatch
+## Endpoint
 `POST /review?cwd=<abs-path>`
 @include _shared/auth.md
-### Request body
+## Request body
 ```json
 {
@@ -36,14 +53,14 @@ independently in parallel; results are index-aligned with `filePaths`.
 | Field | Type | Required | Notes |
 |---|---|---|---|
-| `code` | string | no | Inline code to review |
-| `focus` | string[] | no | Any of `security`, `performance`, `correctness`, `style` |
-| `filePaths` | string[] | no | Files to review (parallel) |
-| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
+| `code` | string | no | Inline code snippet to review |
+| `focus` | string[] | no | Any of `security`, `performance`, `correctness`, `style`. Omit for general review. |
+| `filePaths` | string[] | no | Files to review (one worker per file, parallel) |
+| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` — useful for design docs the reviewer should validate against |
 Either `code` or `filePaths` (or both) must be provided.
-### Full example
+## Full example
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
@@ -54,10 +71,22 @@ BATCH=$(curl -f --show-error -s -X POST \
 BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 ```
-Then poll until complete:
 @include _shared/polling.md
 @include _shared/response-shape.md
+## Common pitfalls
+❌ **Reviewing a plan/spec markdown with `mma-review`**
+The reviewer is tuned for code constructs (types, call sites, test coverage). On prose it produces vague nits. **Fix:** use `mma-audit` for docs/specs, `mma-review` for source.
+❌ **Omitting `focus` and getting watery findings**
+A general review surfaces low-signal style nits alongside real bugs. **Fix:** specify `focus: ["correctness"]` or `["security"]` to bias the reviewer toward the dimension you care about.
+❌ **Inlining the spec the reviewer should validate against**
+If the reviewer needs to check the diff against a design doc, register the doc once via `mma-context-blocks` and pass the `contextBlockIds`. Inlining N times wastes tokens.
+❌ **Skipping review because "I already read it"**
+Self-review and cross-model review are not the same thing. The whole reason to delegate is the different blind spots. Read the findings; merge what you agree with.
 @include _shared/error-handling.md
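`focus` accepts only four values, and the route needs `code` or `filePaths`. A small body builder can therefore reject bad input before anything is dispatched. This is a minimal sketch, assuming `jq` 1.6+. `review_body` is a hypothetical helper that covers only the `filePaths` case; inline `code` is left out for brevity.

```bash
# review_body FOCUS_CSV FILE...
#   FOCUS_CSV  comma-separated subset of security,performance,correctness,style
#              ("" means a general review, so `focus` is omitted)
#   FILE...    files to review; the route runs one worker per file
review_body() {
  local focus_csv=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "review_body: pass at least one file path" >&2
    return 1
  fi
  # jq's error() makes jq exit non-zero on an unknown focus value.
  jq -nc --arg focus "$focus_csv" '
    ["security", "performance", "correctness", "style"] as $allowed
    | ($focus | split(",") | map(select(. != ""))) as $f
    | ($f - $allowed) as $unknown
    | if $unknown != [] then error("unknown focus: \($unknown | join(","))")
      elif $f == [] then {filePaths: $ARGS.positional}
      else {filePaths: $ARGS.positional, focus: $f}
      end' --args "$@"
}
```

For example, `review_body security,correctness src/auth/refresh.ts` yields a body that biases the reviewer toward the two dimensions named, rather than a general pass that mixes style nits with real bugs.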

package/dist/skills/mma-verify/SKILL.md CHANGED

@@ -1,28 +1,45 @@
 ---
 name: mma-verify
 description: >-
-  Verify work against a checklist via the local mmagent HTTP service. Sub-agents
-  check each item independently.
+  Use when work is "complete" and you need to confirm acceptance criteria are
+  actually met before claiming so to the user — each checklist item verified
+  independently against the work
 when_to_use: >-
   The user (or a methodology skill like
-  superpowers:verification-before-completion) wants acceptance-criteria checked
-  against implemented work. Delegate the evidence-gathering to mmagent workers —
-  each checklist item is verified independently and in parallel.
-version: 3.2.0
+  superpowers:verification-before-completion) needs acceptance-criteria checked
+  against implemented work BEFORE claiming success. Delegate so each checklist
+  item gets independent evidence-gathering on a worker. Use this BEFORE saying
+  "done" — never after.
+version: 3.4.0
 ---
-## mma-verify
+# mma-verify
-Submit work product and a checklist to sub-agents for independent verification.
-Each checklist item is verified in parallel; results are index-aligned.
+## Overview
-### Endpoint
+Submit work product and a checklist to workers for independent verification. Each checklist item is verified in parallel; results are index-aligned with the input.
+**Core principle:** Self-verification ("I read the files; they look correct") has no external validation. Workers check independently and return evidence (or absence of it) per item.
+## When to Use
+**Use when:**
+- You're about to claim a task is "done" and need evidence per acceptance item
+- A methodology skill (superpowers:verification-before-completion) routed here
+- The user gave a checklist and asked you to confirm each item
+**Don't use when:**
+- The "checklist" is one item — read inline, faster than dispatch
+- You don't have explicit acceptance criteria — write them first, then dispatch
+- The work hasn't been done yet — verification is a post-condition, not a pre-condition
+## Endpoint
 `POST /verify?cwd=<abs-path>`
 @include _shared/auth.md
-### Request body
+## Request body
 ```json
 {
@@ -39,12 +56,12 @@ Each checklist item is verified in parallel; results are index-aligned.
 | Field | Type | Required | Notes |
 |---|---|---|---|
-| `work` | string | no | Inline work product description |
-| `checklist` | string[] | yes | At least one item |
-| `filePaths` | string[] | no | Files to verify against (parallel) |
-| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
+| `work` | string | no | Inline work-product description (e.g. summary of what changed) |
+| `checklist` | string[] | yes | At least one item — each item verified by its own worker |
+| `filePaths` | string[] | no | Files to verify against (workers can read them) |
+| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` (e.g. the spec the work was supposed to satisfy) |
-### Full example
+## Full example
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
@@ -55,10 +72,24 @@ BATCH=$(curl -f --show-error -s -X POST \
 BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 ```
-Then poll until complete:
 @include _shared/polling.md
 @include _shared/response-shape.md
+## Common pitfalls
+❌ **Vague checklist items**
+> "Code is good"
+The worker can't gather evidence for "good". **Fix:** specific, falsifiable criteria — `"Function parseConfig has at least 3 unit tests covering: missing field, malformed JSON, empty file"`.
+❌ **Verifying without `filePaths`**
+Worker has nothing to read; verdict is speculative. **Fix:** always pass the file(s) the work landed in.
+❌ **Treating verify as the implementation step**
+Verify CHECKS work; it doesn't DO work. If a checklist item fails, dispatch `mma-delegate` to fix it, then re-verify.
+❌ **Skipping verify because "tests pass"**
+Tests verify the test cases that exist. Verify checks the acceptance criteria — which often include things tests don't (docs updated, no debug-print left, etc.).
 @include _shared/error-handling.md
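Here is a sketch of a `/verify` body builder that encodes two of the listed pitfalls. First, the checklist must contain at least one non-blank item. Second, `filePaths` must be present so that workers have something to read. It assumes `jq` 1.6+ and takes the checklist as newline-separated text. `verify_body` is a hypothetical name. Requiring files is deliberately stricter than the route itself, which marks `filePaths` as optional.

```bash
# verify_body WORK CHECKLIST FILE...
#   WORK       short summary of what changed (sent as "work")
#   CHECKLIST  newline-separated acceptance criteria; each becomes one item
#   FILE...    files the work landed in (required by this helper)
verify_body() {
  local work=$1 checklist=$2
  shift 2
  if [ "$#" -eq 0 ]; then
    echo "verify_body: pass the file(s) the work landed in" >&2
    return 1
  fi
  # Blank lines are dropped; an empty checklist makes jq exit non-zero.
  jq -nc --arg work "$work" --arg items "$checklist" '
    ($items | split("\n") | map(select(test("\\S")))) as $c
    | if $c == [] then error("checklist needs at least one item")
      else {work: $work, checklist: $c, filePaths: $ARGS.positional}
      end' --args "$@"
}
```

Writing one falsifiable criterion per line (for example `parseConfig has a unit test for malformed JSON`) gives each worker something concrete to find evidence for.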

package/dist/skills/multi-model-agent/SKILL.md CHANGED

@@ -1,27 +1,74 @@
 ---
 name: multi-model-agent
 description: >-
-  Router for the multi-model-agent local service. Use first when you're about to
-  delegate any tool-using work — picks the right mma-* skill for the task
-  (audit, review, verify, debug, plan execution, ad-hoc delegation) instead of
-  defaulting to inline Agent dispatches.
+  Use first whenever you're about to delegate any tool-using work — picks the
+  right mma-* skill (audit, review, verify, debug, plan execution, codebase
+  investigation, ad-hoc delegation, retry, context-block reuse, clarification
+  resume) instead of defaulting to inline Agent dispatches
 when_to_use: >-
   The user asks for work you'd normally delegate — audit, code review, checklist
-  verification, debugging, plan execution, or ad-hoc parallel tasks — AND
-  mmagent is running. Read this once, pick the matching mma-* skill, and
-  delegate there. Applies equally whether the user invoked a superpowers
-  methodology skill or just asked directly.
-version: 3.2.0
+  verification, debugging, plan execution, codebase Q&A, or ad-hoc parallel
+  tasks — AND mmagent is running. Read this once, pick the matching mma-* skill,
+  and delegate there. Applies equally whether the user invoked a superpowers
+  methodology skill or asked directly.
+version: 3.4.0
 ---
-## multi-model-agent overview
-multi-model-agent is a local HTTP service that fans out tool-using work to
-sub-agents running on different LLM providers (Claude, OpenAI-compatible, Codex).
+# multi-model-agent (router)
+## Overview
+Local HTTP service that fans out tool-using work to sub-agents on different LLM providers (Claude, OpenAI-compatible, Codex). Workers run on cheap models; the main agent stays on judgment.
+**Core principle:** Pick the most specific `mma-*` skill that fits the task. Specificity reduces input — specialized skills know their route, schema, and defaults so you write less.
+## Skill map
+```dot
+digraph picker {
+    "Plan/spec file on disk?" [shape=diamond];
+    "Audit a doc?" [shape=diamond];
+    "Review code?" [shape=diamond];
+    "Verify a checklist?" [shape=diamond];
+    "Debug a failure?" [shape=diamond];
+    "Codebase question?" [shape=diamond];
+    "mma-execute-plan" [shape=box];
+    "mma-audit" [shape=box];
+    "mma-review" [shape=box];
+    "mma-verify" [shape=box];
+    "mma-debug" [shape=box];
+    "mma-investigate" [shape=box];
+    "mma-delegate" [shape=box];
+    "Plan/spec file on disk?" -> "mma-execute-plan" [label="yes"];
+    "Plan/spec file on disk?" -> "Audit a doc?" [label="no"];
+    "Audit a doc?" -> "mma-audit" [label="yes"];
+    "Audit a doc?" -> "Review code?" [label="no"];
+    "Review code?" -> "mma-review" [label="yes"];
+    "Review code?" -> "Verify a checklist?" [label="no"];
+    "Verify a checklist?" -> "mma-verify" [label="yes"];
+    "Verify a checklist?" -> "Debug a failure?" [label="no"];
+    "Debug a failure?" -> "mma-debug" [label="yes"];
+    "Debug a failure?" -> "Codebase question?" [label="no"];
+    "Codebase question?" -> "mma-investigate" [label="yes"];
+    "Codebase question?" -> "mma-delegate" [label="no — ad-hoc"];
+}
+```
-### Preflight: auto-start the daemon if it is not running
+| Skill | Purpose |
+|---|---|
+| `mma-execute-plan` | Implement tasks from a plan or spec file (descriptors match plan headings) |
+| `mma-audit` | Audit a document/spec/config for security, correctness, style, or performance |
+| `mma-review` | Review code for quality, security, performance, correctness |
+| `mma-verify` | Verify work against a checklist (one item per worker, parallel) |
+| `mma-debug` | Debug a failure with a structured hypothesis |
+| `mma-investigate` | Codebase Q&A — structured answer with `file:line` citations + confidence |
+| `mma-delegate` | Ad-hoc implementation / research with no plan file |
+| `mma-retry` | Re-run specific failed/incomplete tasks from a previous batch by index |
+| `mma-context-blocks` | Register a reused doc once; reference by ID across N tasks |
+| `mma-clarifications` | Confirm or correct the service's proposed interpretation |
-Before any mma-* call, check the server. If it is not up, start it in the background — do NOT run `mmagent serve` synchronously, it blocks forever.
+## Preflight: auto-start the daemon if it is not running
 ```bash
 PORT=7337
@@ -37,52 +84,43 @@ fi
 Idempotent: already-running daemon → curl succeeds → no-op.
-### Auth token
+❌ `mmagent serve` (no `&`) — blocks forever, never reaches the next step.
+✅ `mmagent serve >/dev/null 2>&1 & disown` — backgrounded, releases the shell.
+## Auth token
-Set the token in your environment:
 ```bash
 export MMAGENT_AUTH_TOKEN=$(mmagent print-token)
 ```
-Or read it from the env var `MMAGENT_AUTH_TOKEN` if already set.
-Every request requires `Authorization: Bearer <token>`.
-### Skill map
-| Skill | Purpose |
-|---|---|
-| `mma-delegate` | Ad-hoc implementation/research (no plan file) |
-| `mma-audit` | Audit a document for security, correctness, style, or performance |
-| `mma-review` | Review code for quality, security, or correctness |
-| `mma-verify` | Verify work against a checklist |
-| `mma-debug` | Debug a failure with a structured hypothesis |
-| `mma-execute-plan` | Implement tasks from a plan or spec file |
-| `mma-retry` | Re-run specific failed tasks from a previous batch |
-| `mma-context-blocks` | Register large reused documents to reference by ID |
-| `mma-clarifications` | Confirm or correct the service's proposed interpretation |
+Every request requires `Authorization: Bearer $MMAGENT_AUTH_TOKEN`. The token rotates on every `mmagent serve` restart — re-export after a `pkill`/upgrade.
-### Worker tier: `agentType`
+## Worker tier: `agentType`
 `mma-delegate` and `mma-execute-plan` accept `agentType: "standard" | "complex"`. Default is `"standard"` (cheaper, faster). Pick `"complex"` when:
-- The task touches many files or requires multi-step reasoning a smaller model cannot hold in context.
-- A prior standard run came back with `filesWritten: 0` or exhausted its turn budget (visible in the verbose stream or the final envelope's `batchTimings` / `results`).
+- The task touches many files or requires multi-step reasoning a standard-tier model cannot hold in context.
+- A prior standard run came back with `filesWritten: 0` or `incompleteReason: "turn_cap"` / `"cost_cap"` / `"timeout"`.
 - The task is security-sensitive or ambiguous enough that being wrong is costly.
-`mma-audit`, `mma-review`, `mma-debug` already default to complex; `mma-verify` already defaults to standard — these are not configurable from the caller and do not need an `agentType` field.
+`mma-audit`, `mma-review`, and `mma-debug` already default to complex, and `mma-verify` defaults to standard; these are not caller-configurable. `mma-investigate` also defaults to complex but accepts an optional `agentType` override.
-### General flow
+## General flow
-1. Call the appropriate `mma-*` skill → receive `{ batchId }`.
+1. Call the matching `mma-*` skill → receive `{ batchId, statusUrl }`.
 2. Poll `GET /batch/:id`: `202 text/plain` while pending (body is the running headline), `200 application/json` on terminal.
-3. Read `results` / `error` / `proposedInterpretation` from the terminal envelope.
+3. Read `results` / `error` / `proposedInterpretation` from the 7-field terminal envelope.
-If the terminal envelope has `proposedInterpretation` as a string, use `mma-clarifications` to confirm or correct it.
+If `proposedInterpretation` is a string (not the `not_applicable` sentinel) → use `mma-clarifications` to confirm/correct.
-### Diagnosing slow tasks
+## Common pitfalls
-Start the server with `mmagent serve --verbose` (or set `diagnostics.verbose: true` in config) to record `tool_call` and `llm_turn` events. Then tail them:
+❌ **Defaulting to inline Agent dispatch when mmagent is up.** mmagent workers cost ~10× less and don't pollute main context. **Why:** every inline tool call burns flagship-model tokens; that's exactly what mmagent exists to avoid.
-```bash
-mmagent logs --follow --batch=$BATCH_ID
-```
+❌ **Picking `mma-delegate` when a more specific skill fits.** Audit / review / verify / debug / investigate workers know their route's defaults and emit structured reports. **Why:** specialized skills require less input and produce richer output.
+❌ **Starting an investigation that needs to write code.** `mma-investigate` is read-only. **Fix:** dispatch `mma-delegate` with research-then-edit framing, or split: investigate → digest → edit.
+## Diagnosing slow tasks
+`mmagent serve --verbose` (or `diagnostics.verbose: true` in config) records `tool_call`, `turn_complete`, and `heartbeat` events. Tail with `mmagent logs --follow --batch=$BATCH_ID`.
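Steps 2 and 3 of the router's general flow can be sketched in bash. `poll_batch` loops on `GET /batch/:id` until the status changes from `202` (pending, headline body) to `200` (terminal JSON envelope). `needs_clarification` decides whether to route to `mma-clarifications`. The sketch assumes that `PORT` and `MMAGENT_AUTH_TOKEN` are set as in the preflight and auth steps and that `jq` is installed. The helper names, the 5-second interval, and the exact form of the `not_applicable` sentinel are all assumptions. The check treats both a non-string value and the literal string `"not_applicable"` as "no clarification needed".

```bash
# poll_batch BATCH_ID
# Prints the terminal JSON envelope; logs the running headline to stderr.
poll_batch() {
  local id=$1 body code
  body=$(mktemp)
  while :; do
    # No -f here: non-2xx codes are inspected instead of aborting curl.
    code=$(curl -s -o "$body" -w '%{http_code}' \
      -H "Authorization: Bearer $MMAGENT_AUTH_TOKEN" \
      "http://localhost:$PORT/batch/$id") || { rm -f "$body"; return 1; }
    case $code in
      202) echo "pending: $(head -c 200 "$body")" >&2; sleep 5 ;;
      200) cat "$body"; rm -f "$body"; return 0 ;;
      *)   echo "poll_batch: unexpected HTTP $code" >&2
           cat "$body" >&2; rm -f "$body"; return 1 ;;
    esac
  done
}

# needs_clarification ENVELOPE_JSON
# Exit status 0 when proposedInterpretation is a real string to confirm.
needs_clarification() {
  jq -e '(.proposedInterpretation | type) == "string"
         and .proposedInterpretation != "not_applicable"' <<<"$1" >/dev/null
}
```

A typical sequence is `ENVELOPE=$(poll_batch "$BATCH_ID")` followed by `needs_clarification "$ENVELOPE"`. If that check succeeds, hand the interpretation to `mma-clarifications` before reading `results`.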

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@zhixuan92/multi-model-agent",
-  "version": "3.2.0",
+  "version": "3.4.0",
   "type": "module",
   "license": "MIT",
   "description": "Standalone HTTP server for multi-model-agent. Routes tool-invocation work to Claude, Codex, or OpenAI-compatible sub-agents with async-polling REST dispatch and installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.",
@@ -52,7 +52,7 @@
   },
   "dependencies": {
     "@asteasolutions/zod-to-openapi": "^8.5.0",
-    "@zhixuan92/multi-model-agent-core": "^3.2.0",
+    "@zhixuan92/multi-model-agent-core": "^3.4.0",
     "gray-matter": "^4.0.3",
     "minimist": "^1.2.8",
     "zod": "^4.0.0"