npm - @zhixuan92/multi-model-agent - Versions diffs - 3.3.0 → 3.4.0 - Mend

@zhixuan92/multi-model-agent 3.3.0 → 3.4.0

Files changed (35)

package/README.md +62 -33
package/dist/http/canonicalize-file-paths.d.ts +8 -0
package/dist/http/canonicalize-file-paths.d.ts.map +1 -0
package/dist/http/canonicalize-file-paths.js +43 -0
package/dist/http/canonicalize-file-paths.js.map +1 -0
package/dist/http/execution-context.d.ts.map +1 -1
package/dist/http/execution-context.js +0 -14
package/dist/http/execution-context.js.map +1 -1
package/dist/http/handlers/tools/investigate.d.ts +4 -0
package/dist/http/handlers/tools/investigate.d.ts.map +1 -0
package/dist/http/handlers/tools/investigate.js +81 -0
package/dist/http/handlers/tools/investigate.js.map +1 -0
package/dist/http/server.d.ts.map +1 -1
package/dist/http/server.js +5 -2
package/dist/http/server.js.map +1 -1
package/dist/install/discover.d.ts +1 -1
package/dist/install/discover.d.ts.map +1 -1
package/dist/install/discover.js +1 -0
package/dist/install/discover.js.map +1 -1
package/dist/openapi.d.ts.map +1 -1
package/dist/openapi.js +6 -0
package/dist/openapi.js.map +1 -1
package/dist/skills/_shared/verify-and-review.md +12 -0
package/dist/skills/mma-audit/SKILL.md +45 -18
package/dist/skills/mma-clarifications/SKILL.md +73 -29
package/dist/skills/mma-context-blocks/SKILL.md +56 -24
package/dist/skills/mma-debug/SKILL.md +54 -22
package/dist/skills/mma-delegate/SKILL.md +58 -26
package/dist/skills/mma-execute-plan/SKILL.md +55 -29
package/dist/skills/mma-investigate/SKILL.md +137 -0
package/dist/skills/mma-retry/SKILL.md +65 -22
package/dist/skills/mma-review/SKILL.md +49 -20
package/dist/skills/mma-verify/SKILL.md +49 -18
package/dist/skills/multi-model-agent/SKILL.md +84 -46
package/package.json +2 -2

package/dist/skills/multi-model-agent/SKILL.md CHANGED

@@ -1,27 +1,74 @@
 ---
 name: multi-model-agent
 description: >-
-  Router for the multi-model-agent local service. Use first when you're about to
-  delegate any tool-using work — picks the right mma-* skill for the task
-  (audit, review, verify, debug, plan execution, ad-hoc delegation) instead of
-  defaulting to inline Agent dispatches.
+  Use first whenever you're about to delegate any tool-using work — picks the
+  right mma-* skill (audit, review, verify, debug, plan execution, codebase
+  investigation, ad-hoc delegation, retry, context-block reuse, clarification
+  resume) instead of defaulting to inline Agent dispatches
 when_to_use: >-
   The user asks for work you'd normally delegate — audit, code review, checklist
-  verification, debugging, plan execution, or ad-hoc parallel tasks — AND
-  mmagent is running. Read this once, pick the matching mma-* skill, and
-  delegate there. Applies equally whether the user invoked a superpowers
-  methodology skill or just asked directly.
-version: 3.3.0
+  verification, debugging, plan execution, codebase Q&A, or ad-hoc parallel
+  tasks — AND mmagent is running. Read this once, pick the matching mma-* skill,
+  and delegate there. Applies equally whether the user invoked a superpowers
+  methodology skill or asked directly.
+version: 3.4.0
 ---
-## multi-model-agent overview
-multi-model-agent is a local HTTP service that fans out tool-using work to
-sub-agents running on different LLM providers (Claude, OpenAI-compatible, Codex).
+# multi-model-agent (router)
+## Overview
+Local HTTP service that fans out tool-using work to sub-agents on different LLM providers (Claude, OpenAI-compatible, Codex). Workers run on cheap models; the main agent stays on judgment.
+**Core principle:** Pick the most specific `mma-*` skill that fits the task. Specificity reduces input — specialized skills know their route, schema, and defaults so you write less.
+## Skill map
+```dot
+digraph picker {
+    "Plan/spec file on disk?" [shape=diamond];
+    "Audit a doc?" [shape=diamond];
+    "Review code?" [shape=diamond];
+    "Verify a checklist?" [shape=diamond];
+    "Debug a failure?" [shape=diamond];
+    "Codebase question?" [shape=diamond];
+    "mma-execute-plan" [shape=box];
+    "mma-audit" [shape=box];
+    "mma-review" [shape=box];
+    "mma-verify" [shape=box];
+    "mma-debug" [shape=box];
+    "mma-investigate" [shape=box];
+    "mma-delegate" [shape=box];
+    "Plan/spec file on disk?" -> "mma-execute-plan" [label="yes"];
+    "Plan/spec file on disk?" -> "Audit a doc?" [label="no"];
+    "Audit a doc?" -> "mma-audit" [label="yes"];
+    "Audit a doc?" -> "Review code?" [label="no"];
+    "Review code?" -> "mma-review" [label="yes"];
+    "Review code?" -> "Verify a checklist?" [label="no"];
+    "Verify a checklist?" -> "mma-verify" [label="yes"];
+    "Verify a checklist?" -> "Debug a failure?" [label="no"];
+    "Debug a failure?" -> "mma-debug" [label="yes"];
+    "Debug a failure?" -> "Codebase question?" [label="no"];
+    "Codebase question?" -> "mma-investigate" [label="yes"];
+    "Codebase question?" -> "mma-delegate" [label="no — ad-hoc"];
+}
+```
-### Preflight: auto-start the daemon if it is not running
+| Skill | Purpose |
+|---|---|
+| `mma-execute-plan` | Implement tasks from a plan or spec file (descriptors match plan headings) |
+| `mma-audit` | Audit a document/spec/config for security, correctness, style, or performance |
+| `mma-review` | Review code for quality, security, performance, correctness |
+| `mma-verify` | Verify work against a checklist (one item per worker, parallel) |
+| `mma-debug` | Debug a failure with a structured hypothesis |
+| `mma-investigate` | Codebase Q&A — structured answer with `file:line` citations + confidence |
+| `mma-delegate` | Ad-hoc implementation / research with no plan file |
+| `mma-retry` | Re-run specific failed/incomplete tasks from a previous batch by index |
+| `mma-context-blocks` | Register a reused doc once; reference by ID across N tasks |
+| `mma-clarifications` | Confirm or correct the service's proposed interpretation |
-Before any mma-* call, check the server. If it is not up, start it in the background — do NOT run `mmagent serve` synchronously, it blocks forever.
+## Preflight: auto-start the daemon if it is not running
 ```bash
 PORT=7337
@@ -37,52 +84,43 @@ fi
 Idempotent: already-running daemon → curl succeeds → no-op.
-### Auth token
+❌ `mmagent serve` (no `&`) — blocks forever, never reaches the next step.
+✅ `mmagent serve >/dev/null 2>&1 & disown` — backgrounded, releases the shell.
+## Auth token
-Set the token in your environment:
 ```bash
 export MMAGENT_AUTH_TOKEN=$(mmagent print-token)
 ```
-Or read it from the env var `MMAGENT_AUTH_TOKEN` if already set.
-Every request requires `Authorization: Bearer <token>`.
-### Skill map
-| Skill | Purpose |
-|---|---|
-| `mma-delegate` | Ad-hoc implementation/research (no plan file) |
-| `mma-audit` | Audit a document for security, correctness, style, or performance |
-| `mma-review` | Review code for quality, security, or correctness |
-| `mma-verify` | Verify work against a checklist |
-| `mma-debug` | Debug a failure with a structured hypothesis |
-| `mma-execute-plan` | Implement tasks from a plan or spec file |
-| `mma-retry` | Re-run specific failed tasks from a previous batch |
-| `mma-context-blocks` | Register large reused documents to reference by ID |
-| `mma-clarifications` | Confirm or correct the service's proposed interpretation |
+Every request requires `Authorization: Bearer $MMAGENT_AUTH_TOKEN`. The token rotates on every `mmagent serve` restart — re-export after a `pkill`/upgrade.
-### Worker tier: `agentType`
+## Worker tier: `agentType`
 `mma-delegate` and `mma-execute-plan` accept `agentType: "standard" | "complex"`. Default is `"standard"` (cheaper, faster). Pick `"complex"` when:
-- The task touches many files or requires multi-step reasoning a smaller model cannot hold in context.
-- A prior standard run came back with `filesWritten: 0` or exhausted its turn budget (visible in the verbose stream or the final envelope's `batchTimings` / `results`).
+- The task touches many files or requires multi-step reasoning a standard-tier model cannot hold in context.
+- A prior standard run came back with `filesWritten: 0` or `incompleteReason: "turn_cap"` / `"cost_cap"` / `"timeout"`.
 - The task is security-sensitive or ambiguous enough that being wrong is costly.
-`mma-audit`, `mma-review`, `mma-debug` already default to complex; `mma-verify` already defaults to standard — these are not configurable from the caller and do not need an `agentType` field.
+`mma-audit`, `mma-review`, `mma-debug`, `mma-investigate` already default to complex; `mma-verify` already defaults to standard. These are not caller-configurable.
-### General flow
+## General flow
-1. Call the appropriate `mma-*` skill → receive `{ batchId }`.
+1. Call the matching `mma-*` skill → receive `{ batchId, statusUrl }`.
 2. Poll `GET /batch/:id`: `202 text/plain` while pending (body is the running headline), `200 application/json` on terminal.
-3. Read `results` / `error` / `proposedInterpretation` from the terminal envelope.
+3. Read `results` / `error` / `proposedInterpretation` from the 7-field terminal envelope.
-If the terminal envelope has `proposedInterpretation` as a string, use `mma-clarifications` to confirm or correct it.
+If `proposedInterpretation` is a string (not the `not_applicable` sentinel) → use `mma-clarifications` to confirm/correct.
-### Diagnosing slow tasks
+## Common pitfalls
-Start the server with `mmagent serve --verbose` (or set `diagnostics.verbose: true` in config) to record `tool_call` and `llm_turn` events. Then tail them:
+❌ **Defaulting to inline Agent dispatch when mmagent is up.** mmagent workers cost ~10× less and don't pollute main context. **Why:** every inline tool call burns flagship-model tokens; that's exactly what mmagent exists to avoid.
-```bash
-mmagent logs --follow --batch=$BATCH_ID
-```
+❌ **Picking `mma-delegate` when a more specific skill fits.** Audit / review / verify / debug / investigate workers know their route's defaults and emit structured reports. **Why:** specialized skills require less input and produce richer output.
+❌ **Starting an investigation that needs to write code.** `mma-investigate` is read-only. **Fix:** dispatch `mma-delegate` with research-then-edit framing, or split: investigate → digest → edit.
+## Diagnosing slow tasks
+`mmagent serve --verbose` (or `diagnostics.verbose: true` in config) records `tool_call`, `turn_complete`, and `heartbeat` events. Tail with `mmagent logs --follow --batch=$BATCH_ID`.
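
The "General flow" steps in the new SKILL.md describe two poll outcomes for `GET /batch/:id`: `202 text/plain` while pending and `200 application/json` on terminal. A minimal sketch of that branching, assuming the caller has already captured the HTTP status and Content-Type; the function name `classify_poll` and the `unexpected` fallback are illustrative and not part of the package:

```bash
# classify_poll STATUS CONTENT_TYPE
# Hypothetical helper: maps one poll response to a state name.
# It performs no network I/O; the caller supplies the values.
classify_poll() {
  local status="$1" content_type="$2"
  case "$status:$content_type" in
    202:text/plain*)       echo "pending" ;;    # body is the running headline
    200:application/json*) echo "terminal" ;;   # body is the terminal envelope
    *)                     echo "unexpected" ;; # assumption: anything else is an error
  esac
}

# Possible usage, assuming the daemon listens on 127.0.0.1:$PORT:
#   read -r status ctype < <(curl -s -o body.txt \
#     -w '%{http_code} %{content_type}' \
#     -H "Authorization: Bearer $MMAGENT_AUTH_TOKEN" \
#     "http://127.0.0.1:$PORT/batch/$BATCH_ID")
#   classify_poll "$status" "$ctype"
```

Matching on a prefix (`application/json*`) tolerates a trailing `; charset=...` parameter, should the server send one.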

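The `picker` digraph added in 3.4.0 asks its questions in a fixed order and falls through to `mma-delegate`. The same precedence can be sketched as a small shell function. The trait keywords (`plan`, `audit`, `review`, `verify`, `debug`, `question`) and the name `pick_skill` are hypothetical labels for this illustration; only the skill names and their ordering come from the diff:

```bash
# pick_skill [TRAIT...]
# Hypothetical helper: checks traits in the same order as the picker digraph
# and prints the first matching skill, or mma-delegate when none match.
pick_skill() {
  local traits=" $* "   # pad with spaces so whole words can be matched
  local pair
  for pair in plan:mma-execute-plan audit:mma-audit review:mma-review \
              verify:mma-verify debug:mma-debug question:mma-investigate; do
    case "$traits" in
      *" ${pair%%:*} "*) echo "${pair#*:}"; return 0 ;;
    esac
  done
  echo "mma-delegate"   # no question answered "yes": ad-hoc work
}
```

For example, `pick_skill debug review` prints `mma-review`, because the digraph reaches "Review code?" before "Debug a failure?".
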
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@zhixuan92/multi-model-agent",
-  "version": "3.3.0",
+  "version": "3.4.0",
   "type": "module",
   "license": "MIT",
   "description": "Standalone HTTP server for multi-model-agent. Routes tool-invocation work to Claude, Codex, or OpenAI-compatible sub-agents with async-polling REST dispatch and installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.",
@@ -52,7 +52,7 @@
   },
   "dependencies": {
     "@asteasolutions/zod-to-openapi": "^8.5.0",
-    "@zhixuan92/multi-model-agent-core": "^3.3.0",
+    "@zhixuan92/multi-model-agent-core": "^3.4.0",
     "gray-matter": "^4.0.3",
     "minimist": "^1.2.8",
     "zod": "^4.0.0"