npm - @zhixuan92/multi-model-agent - Versions diffs - 5.0.1 → 5.0.2 - Mend

@zhixuan92/multi-model-agent 5.0.1 → 5.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2) hide show

package/README.md +19 -10
package/package.json +9 -9

package/README.md CHANGED Viewed

@@ -2,15 +2,17 @@
 [![npm](https://img.shields.io/npm/v/@zhixuan92/multi-model-agent?label=npm)](https://www.npmjs.com/package/@zhixuan92/multi-model-agent)
-Local HTTP daemon that delegates tool-using work to sub-agents on different LLM providers. One process serves Claude Code, Codex CLI, Gemini CLI, and Cursor via installable skills.
+**The horizontal harness for AI engineering** — a local HTTP daemon that routes the right agent to the right task, gets it done right with **cross-agent review**, and caps spend with **bounded execution**. One process serves Claude Code, Codex CLI, Gemini CLI, and Cursor via installable skills; bring your own keys.
-*Renamed from `@zhixuan92/multi-model-agent-mcp` in 3.0.0 — the package no longer uses MCP. See [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).*
+**The bet:** a reviewed multi-agent harness matches or beats a single frontier model, at a fraction of the cost. **Models go deep; we connect them wide** — and the engineer always keeps the judgment.
+*Renamed from `@zhixuan92/multi-model-agent-mcp` in 3.0.0 — the package no longer uses MCP. North star: [DIRECTION.md](https://github.com/zhixuan312/multi-model-agent/blob/master/DIRECTION.md). See [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).*
 ## Why
 Your flagship model reasoning about architecture is money well spent. That same model grepping files, writing boilerplate, and running tests is waste.
-| Project | MMA — MiniMax-M2.7 | MMA — DeepSeek V4 Pro | Flagship: Claude Opus 4.7 |
+| Project | MMA — MiniMax-M3 | MMA — DeepSeek V4 Pro | Flagship: Claude Opus 4.8 |
 |---|---|---|---|
 | Feature impl (30 files, ~50 tasks) | **$1.50** · **33× ROI** · ~35 min | **~$2.50** · **20× ROI** · ~15 min | $50 · 1× · *baseline* |
 | Full web SPA (59 tasks) | **$5.65** · **12× ROI** · ~50 min | **~$9** · **7.5× ROI** · ~22 min | $68 · 1× · *baseline* |
@@ -18,6 +20,13 @@ Your flagship model reasoning about architecture is money well spent. That same
 Plus structural quality: implementation and review run on **different** model families — different blind spots, catches what self-review can't.
+## How it works
+- **Three layers.** Your own agent keeps the judgment on top; beneath it sit two labor slots you configure — `complex` and `standard` (labor *categories*, not fixed intelligence tiers).
+- **AIDLC + rods.** Each tool is a *rod* — a gate over one stage of the AI Development Life Cycle: `investigate` / `research` feed the front, `audit` gates the spec and plan, `delegate` / `execute-plan` build, `review` / `debug` guard the output, `retry` closes the loop, `journal` remembers. The harness instruments the lifecycle; the engineer authors it.
+- **Reviewed by default.** Write tasks run implement → spec review → quality review → rework with implementer and reviewer on **different agents**; read-only rods return findings and skip review.
+- **Built on the providers' own runtimes.** Claude work runs through the **Claude Agent SDK**, OpenAI/Codex work through the official **Codex CLI** — when a provider deepens its runtime the harness gets better for free, and a Claude Code task can run on Codex underneath (or vice versa).
 ## Initial setup
 Four steps, in order.
@@ -42,7 +51,7 @@ mmagent sync-skills --target=claude-code    # claude-code | gemini-cli | codex-c
 Your **main model** is **the model you'd use without mmagent** — the cost baseline for every per-task headline (`$X actual / $Y saved vs <mainModel> (Z× ROI)`).
-- Heavy Claude Code user → `claude-opus-4-7`
+- Heavy Claude Code user → `claude-opus-4-8`
 - ChatGPT-led workflow → `gpt-5.5`
 - Gemini-led workflow → `gemini-3.1-pro`
@@ -50,7 +59,7 @@ Both `X-MMA-Client` and `X-MMA-Main-Model` are required on tool routes (`400 cli
 ```bash
 export MMAGENT_CLIENT=claude-code              # or codex-cli, gemini-cli, cursor
-export MMAGENT_MAIN_MODEL=claude-opus-4-7      # whatever your calling agent runs on
+export MMAGENT_MAIN_MODEL=claude-opus-4-8      # whatever your calling agent runs on
 ```
 ### 3. Write the config
@@ -119,7 +128,7 @@ Skills are the surface your AI client sees. `mmagent sync-skills` writes them to
 | `mma-research` | `POST /research` | External multi-source research with citations — arxiv, semantic_scholar, github_search, brave-with-`site:`-filters — for a focused question. |
 | `mma-debug` | `POST /debug` | A test fails, a build breaks, or behavior is unexpected — delegate the reproduce/trace, keep the hypothesis on the main agent. |
 | `mma-review` | `POST /review` | Source-code review (pre-merge, post-implementation, security-focused). One worker per file, in parallel. |
-| `mma-audit` | `POST /audit` | Audit a spec / plan / design doc / recommendation doc for executability blockers (contradictions, ambiguity, recommendation-coherence gaps). Default is the comprehensive sweep; `security` and `performance` are narrow opt-in lenses. |
+| `mma-audit` | `POST /audit` | Audit a prose artifact against a named criteria set — pick the **subtype**: `default` (general prose-coherence), `spec` (requirements: testability, decision-trace), `plan` (a plan verified against the actual codebase), `skill` (a SKILL.md). Run `subtype=plan` before `mma-execute-plan`. |
 | `mma-journal-record` | `POST /journal-record` | Record a durable project learning into the cross-agent journal — what was tried, what happened, the lesson — integrated into a graph of ADR "node" files under `.mmagent/journal/` (create / refine / supersede / merge with typed edges). |
 | `mma-journal-recall` | `POST /journal-recall` | Recall relevant prior learnings from the journal for a question or situation — traverses the node graph rather than keyword-filtering. |
@@ -141,10 +150,10 @@ You: "Execute tasks 3, 4, and 5 from docs/plans/auth-rewrite.md"
 ↓
 Client picks mma-execute-plan (plan file on disk, multiple independent tasks)
 ↓
-mmagent dispatches 3 workers in parallel on the standard agent (e.g. MiniMax-M2.7),
+mmagent dispatches 3 workers in parallel on the standard agent (e.g. MiniMax-M3),
 each runs cross-agent review on the complex agent, returns a structured report.
 ↓
-You see one consolidated headline: "$0.04 actual / $1.20 saved vs claude-opus-4-7 (30× ROI)"
+You see one consolidated headline: "$0.04 actual / $1.20 saved vs claude-opus-4-8 (30× ROI)"
 ```
 **Sample 2 — debug a failing test (multiple skills chained)**
@@ -292,9 +301,9 @@ Full design rationale: [DIRECTION.md](https://github.com/zhixuan312/multi-model-
 | TLS `handshake_failure` to a known-good telemetry endpoint | Local DNS cache is stale. `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` (macOS); restart the daemon so its process re-resolves |
 | Local telemetry queue stops draining | Daemon's flusher is in exponential backoff after a transport failure (capped at 1 hr). Restart the daemon to force an immediate boot-flush |
-## What's new in 4.9.0
+## What's new in 5.0
-- **Delegate skill passthrough.** `POST /delegate` tasks accept an optional `skills: string[]`. Each name is resolved from the caller's skill store (by `X-MMA-Client`), staged into an ephemeral per-task directory, and delivered natively — Claude workers via a local SDK plugin, Codex workers via an ephemeral `CODEX_HOME`. Unknown names hard-fail just that task; omitting `skills` is byte-for-byte the previous behavior.
+- **Runtime migrated to Bun**, and this package now ships as **standalone per-platform binaries** with Bun embedded: `npm i -g @zhixuan92/multi-model-agent` resolves a native binary that needs neither Node nor Bun to run (npm uses Node ≥18 only for the install shim). Behavior is identical to 4.x.
 Full history: [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@zhixuan92/multi-model-agent",
-  "version": "5.0.1",
+  "version": "5.0.2",
   "description": "Standalone HTTP server for multi-model-agent. Routes tool-invocation work to Claude, Codex, or OpenAI-compatible sub-agents with async-polling REST dispatch and installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.",
   "keywords": [
     "llm",
@@ -26,14 +26,14 @@
     "postinstall": "node postinstall.mjs"
   },
   "optionalDependencies": {
-    "@zhixuan92/mmagent-darwin-arm64": "5.0.1",
-    "@zhixuan92/mmagent-darwin-x64": "5.0.1",
-    "@zhixuan92/mmagent-linux-x64": "5.0.1",
-    "@zhixuan92/mmagent-linux-arm64": "5.0.1",
-    "@zhixuan92/mmagent-linux-x64-musl": "5.0.1",
-    "@zhixuan92/mmagent-linux-arm64-musl": "5.0.1",
-    "@zhixuan92/mmagent-windows-x64": "5.0.1",
-    "@zhixuan92/mmagent-windows-arm64": "5.0.1"
+    "@zhixuan92/mmagent-darwin-arm64": "5.0.2",
+    "@zhixuan92/mmagent-darwin-x64": "5.0.2",
+    "@zhixuan92/mmagent-linux-x64": "5.0.2",
+    "@zhixuan92/mmagent-linux-arm64": "5.0.2",
+    "@zhixuan92/mmagent-linux-x64-musl": "5.0.2",
+    "@zhixuan92/mmagent-linux-arm64-musl": "5.0.2",
+    "@zhixuan92/mmagent-windows-x64": "5.0.2",
+    "@zhixuan92/mmagent-windows-arm64": "5.0.2"
   },
   "files": [
     "bin",