@zhixuan92/multi-model-agent 5.0.1 → 5.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2) hide show
  1. package/README.md +19 -10
  2. package/package.json +9 -9
package/README.md CHANGED
@@ -2,15 +2,17 @@
2
2
 
3
3
  [![npm](https://img.shields.io/npm/v/@zhixuan92/multi-model-agent?label=npm)](https://www.npmjs.com/package/@zhixuan92/multi-model-agent)
4
4
 
5
- Local HTTP daemon that delegates tool-using work to sub-agents on different LLM providers. One process serves Claude Code, Codex CLI, Gemini CLI, and Cursor via installable skills.
5
+ **The horizontal harness for AI engineering** — a local HTTP daemon that routes the right agent to the right task, gets it done right with **cross-agent review**, and caps spend with **bounded execution**. One process serves Claude Code, Codex CLI, Gemini CLI, and Cursor via installable skills; bring your own keys.
6
6
 
7
- *Renamed from `@zhixuan92/multi-model-agent-mcp` in 3.0.0 — the package no longer uses MCP. See [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).*
7
+ **The bet:** a reviewed multi-agent harness matches or beats a single frontier model, at a fraction of the cost. **Models go deep; we connect them wide** and the engineer always keeps the judgment.
8
+
9
+ *Renamed from `@zhixuan92/multi-model-agent-mcp` in 3.0.0 — the package no longer uses MCP. North star: [DIRECTION.md](https://github.com/zhixuan312/multi-model-agent/blob/master/DIRECTION.md). See [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).*
8
10
 
9
11
  ## Why
10
12
 
11
13
  Your flagship model reasoning about architecture is money well spent. That same model grepping files, writing boilerplate, and running tests is waste.
12
14
 
13
- | Project | MMA — MiniMax-M2.7 | MMA — DeepSeek V4 Pro | Flagship: Claude Opus 4.7 |
15
+ | Project | MMA — MiniMax-M3 | MMA — DeepSeek V4 Pro | Flagship: Claude Opus 4.8 |
14
16
  |---|---|---|---|
15
17
  | Feature impl (30 files, ~50 tasks) | **$1.50** · **33× ROI** · ~35 min | **~$2.50** · **20× ROI** · ~15 min | $50 · 1× · *baseline* |
16
18
  | Full web SPA (59 tasks) | **$5.65** · **12× ROI** · ~50 min | **~$9** · **7.5× ROI** · ~22 min | $68 · 1× · *baseline* |
@@ -18,6 +20,13 @@ Your flagship model reasoning about architecture is money well spent. That same
18
20
 
19
21
  Plus structural quality: implementation and review run on **different** model families — different blind spots, catches what self-review can't.
20
22
 
23
+ ## How it works
24
+
25
+ - **Three layers.** Your own agent keeps the judgment on top; beneath it sit two labor slots you configure — `complex` and `standard` (labor *categories*, not fixed intelligence tiers).
26
+ - **AIDLC + rods.** Each tool is a *rod* — a gate over one stage of the AI Development Life Cycle: `investigate` / `research` feed the front, `audit` gates the spec and plan, `delegate` / `execute-plan` build, `review` / `debug` guard the output, `retry` closes the loop, `journal` remembers. The harness instruments the lifecycle; the engineer authors it.
27
+ - **Reviewed by default.** Write tasks run implement → spec review → quality review → rework with implementer and reviewer on **different agents**; read-only rods return findings and skip review.
28
+ - **Built on the providers' own runtimes.** Claude work runs through the **Claude Agent SDK**, OpenAI/Codex work through the official **Codex CLI** — when a provider deepens its runtime the harness gets better for free, and a Claude Code task can run on Codex underneath (or vice versa).
29
+
21
30
  ## Initial setup
22
31
 
23
32
  Four steps, in order.
@@ -42,7 +51,7 @@ mmagent sync-skills --target=claude-code # claude-code | gemini-cli | codex-c
42
51
 
43
52
  Your **main model** is **the model you'd use without mmagent** — the cost baseline for every per-task headline (`$X actual / $Y saved vs <mainModel> (Z× ROI)`).
44
53
 
45
- - Heavy Claude Code user → `claude-opus-4-7`
54
+ - Heavy Claude Code user → `claude-opus-4-8`
46
55
  - ChatGPT-led workflow → `gpt-5.5`
47
56
  - Gemini-led workflow → `gemini-3.1-pro`
48
57
 
@@ -50,7 +59,7 @@ Both `X-MMA-Client` and `X-MMA-Main-Model` are required on tool routes (`400 cli
50
59
 
51
60
  ```bash
52
61
  export MMAGENT_CLIENT=claude-code # or codex-cli, gemini-cli, cursor
53
- export MMAGENT_MAIN_MODEL=claude-opus-4-7 # whatever your calling agent runs on
62
+ export MMAGENT_MAIN_MODEL=claude-opus-4-8 # whatever your calling agent runs on
54
63
  ```
55
64
 
56
65
  ### 3. Write the config
@@ -119,7 +128,7 @@ Skills are the surface your AI client sees. `mmagent sync-skills` writes them to
119
128
  | `mma-research` | `POST /research` | External multi-source research with citations — arxiv, semantic_scholar, github_search, brave-with-`site:`-filters — for a focused question. |
120
129
  | `mma-debug` | `POST /debug` | A test fails, a build breaks, or behavior is unexpected — delegate the reproduce/trace, keep the hypothesis on the main agent. |
121
130
  | `mma-review` | `POST /review` | Source-code review (pre-merge, post-implementation, security-focused). One worker per file, in parallel. |
122
- | `mma-audit` | `POST /audit` | Audit a spec / plan / design doc / recommendation doc for executability blockers (contradictions, ambiguity, recommendation-coherence gaps). Default is the comprehensive sweep; `security` and `performance` are narrow opt-in lenses. |
131
+ | `mma-audit` | `POST /audit` | Audit a prose artifact against a named criteria set pick the **subtype**: `default` (general prose-coherence), `spec` (requirements: testability, decision-trace), `plan` (a plan verified against the actual codebase), `skill` (a SKILL.md). Run `subtype=plan` before `mma-execute-plan`. |
123
132
  | `mma-journal-record` | `POST /journal-record` | Record a durable project learning into the cross-agent journal — what was tried, what happened, the lesson — integrated into a graph of ADR "node" files under `.mmagent/journal/` (create / refine / supersede / merge with typed edges). |
124
133
  | `mma-journal-recall` | `POST /journal-recall` | Recall relevant prior learnings from the journal for a question or situation — traverses the node graph rather than keyword-filtering. |
125
134
 
@@ -141,10 +150,10 @@ You: "Execute tasks 3, 4, and 5 from docs/plans/auth-rewrite.md"
141
150
 
142
151
  Client picks mma-execute-plan (plan file on disk, multiple independent tasks)
143
152
 
144
- mmagent dispatches 3 workers in parallel on the standard agent (e.g. MiniMax-M2.7),
153
+ mmagent dispatches 3 workers in parallel on the standard agent (e.g. MiniMax-M3),
145
154
  each runs cross-agent review on the complex agent, returns a structured report.
146
155
 
147
- You see one consolidated headline: "$0.04 actual / $1.20 saved vs claude-opus-4-7 (30× ROI)"
156
+ You see one consolidated headline: "$0.04 actual / $1.20 saved vs claude-opus-4-8 (30× ROI)"
148
157
  ```
149
158
 
150
159
  **Sample 2 — debug a failing test (multiple skills chained)**
@@ -292,9 +301,9 @@ Full design rationale: [DIRECTION.md](https://github.com/zhixuan312/multi-model-
292
301
  | TLS `handshake_failure` to a known-good telemetry endpoint | Local DNS cache is stale. `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` (macOS); restart the daemon so its process re-resolves |
293
302
  | Local telemetry queue stops draining | Daemon's flusher is in exponential backoff after a transport failure (capped at 1 hr). Restart the daemon to force an immediate boot-flush |
294
303
 
295
- ## What's new in 4.9.0
304
+ ## What's new in 5.0
296
305
 
297
- - **Delegate skill passthrough.** `POST /delegate` tasks accept an optional `skills: string[]`. Each name is resolved from the caller's skill store (by `X-MMA-Client`), staged into an ephemeral per-task directory, and delivered natively Claude workers via a local SDK plugin, Codex workers via an ephemeral `CODEX_HOME`. Unknown names hard-fail just that task; omitting `skills` is byte-for-byte the previous behavior.
306
+ - **Runtime migrated to Bun**, and this package now ships as **standalone per-platform binaries** with Bun embedded: `npm i -g @zhixuan92/multi-model-agent` resolves a native binary that needs neither Node nor Bun to run (npm uses Node ≥18 only for the install shim). Behavior is identical to 4.x.
298
307
 
299
308
  Full history: [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).
300
309
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@zhixuan92/multi-model-agent",
3
- "version": "5.0.1",
3
+ "version": "5.0.2",
4
4
  "description": "Standalone HTTP server for multi-model-agent. Routes tool-invocation work to Claude, Codex, or OpenAI-compatible sub-agents with async-polling REST dispatch and installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.",
5
5
  "keywords": [
6
6
  "llm",
@@ -26,14 +26,14 @@
26
26
  "postinstall": "node postinstall.mjs"
27
27
  },
28
28
  "optionalDependencies": {
29
- "@zhixuan92/mmagent-darwin-arm64": "5.0.1",
30
- "@zhixuan92/mmagent-darwin-x64": "5.0.1",
31
- "@zhixuan92/mmagent-linux-x64": "5.0.1",
32
- "@zhixuan92/mmagent-linux-arm64": "5.0.1",
33
- "@zhixuan92/mmagent-linux-x64-musl": "5.0.1",
34
- "@zhixuan92/mmagent-linux-arm64-musl": "5.0.1",
35
- "@zhixuan92/mmagent-windows-x64": "5.0.1",
36
- "@zhixuan92/mmagent-windows-arm64": "5.0.1"
29
+ "@zhixuan92/mmagent-darwin-arm64": "5.0.2",
30
+ "@zhixuan92/mmagent-darwin-x64": "5.0.2",
31
+ "@zhixuan92/mmagent-linux-x64": "5.0.2",
32
+ "@zhixuan92/mmagent-linux-arm64": "5.0.2",
33
+ "@zhixuan92/mmagent-linux-x64-musl": "5.0.2",
34
+ "@zhixuan92/mmagent-linux-arm64-musl": "5.0.2",
35
+ "@zhixuan92/mmagent-windows-x64": "5.0.2",
36
+ "@zhixuan92/mmagent-windows-arm64": "5.0.2"
37
37
  },
38
38
  "files": [
39
39
  "bin",