@zhixuan92/multi-model-agent 5.0.0 → 5.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Zhang Zhixuan
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,316 @@
+# @zhixuan92/multi-model-agent
+[![npm](https://img.shields.io/npm/v/@zhixuan92/multi-model-agent?label=npm)](https://www.npmjs.com/package/@zhixuan92/multi-model-agent)
+**The horizontal harness for AI engineering** — a local HTTP daemon that routes the right agent to the right task, gets it done right with **cross-agent review**, and caps spend with **bounded execution**. One process serves Claude Code, Codex CLI, Gemini CLI, and Cursor via installable skills; bring your own keys.
+**The bet:** a reviewed multi-agent harness matches or beats a single frontier model, at a fraction of the cost. **Models go deep; we connect them wide** — and the engineer always keeps the judgment.
+*Renamed from `@zhixuan92/multi-model-agent-mcp` in 3.0.0 — the package no longer uses MCP. North star: [DIRECTION.md](https://github.com/zhixuan312/multi-model-agent/blob/master/DIRECTION.md). See [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).*
+## Why
+Your flagship model reasoning about architecture is money well spent. That same model grepping files, writing boilerplate, and running tests is waste.
+| Project | MMA — MiniMax-M3 | MMA — DeepSeek V4 Pro | Flagship: Claude Opus 4.8 |
+|---|---|---|---|
+| Feature impl (30 files, ~50 tasks) | **$1.50** · **33× ROI** · ~35 min | **~$2.50** · **20× ROI** · ~15 min | $50 · 1× · *baseline* |
+| Full web SPA (59 tasks) | **$5.65** · **12× ROI** · ~50 min | **~$9** · **7.5× ROI** · ~22 min | $68 · 1× · *baseline* |
+| Backend microservice (91 tasks) | **$8.21** · **13× ROI** · ~1.5 hrs | **~$14** · **7.5× ROI** · ~40 min | $104 · 1× · *baseline* |
+Plus structural quality: implementation and review run on **different** model families with different blind spots, so cross-agent review catches what self-review can't.
+## How it works
+- **Three layers.** Your own agent keeps the judgment on top; beneath it sit two labor slots you configure — `complex` and `standard` (labor *categories*, not fixed intelligence tiers).
+- **AIDLC + rods.** Each tool is a *rod* — a gate over one stage of the AI Development Life Cycle: `investigate` / `research` feed the front, `audit` gates the spec and plan, `delegate` / `execute-plan` build, `review` / `debug` guard the output, `retry` closes the loop, `journal` remembers. The harness instruments the lifecycle; the engineer authors it.
+- **Reviewed by default.** Write tasks run implement → spec review → quality review → rework with implementer and reviewer on **different agents**; read-only rods return findings and skip review.
+- **Built on the providers' own runtimes.** Claude work runs through the **Claude Agent SDK**, OpenAI/Codex work through the official **Codex CLI** — when a provider deepens its runtime the harness gets better for free, and a Claude Code task can run on Codex underneath (or vice versa).
+## Initial setup
+Four steps, in order.
+### 1. Install CLI + skills
+```bash
+npm i -g @zhixuan92/multi-model-agent       # standalone binary (Bun embedded) — npm uses Node ≥18 only to install; the daemon needs no Node/Bun
+mmagent sync-skills                         # auto-detect all clients (idempotent install + update)
+# or pin a specific target:
+mmagent sync-skills --target=claude-code    # claude-code | gemini-cli | codex-cli | cursor
+```
+| Client | Install location | Loaded |
+|---|---|---|
+| Claude Code | `~/.claude/skills/` | next session |
+| Gemini CLI | Gemini CLI skill directory | next session (requires version with external-skill support) |
+| Codex CLI | `~/.codex/skills/` | next session |
+| Cursor | Cursor extension manifest | restart Cursor |
+### 2. Choose your main model — intentionally
+Your **main model** is **the model you'd use without mmagent** — the cost baseline for every per-task headline (`$X actual / $Y saved vs <mainModel> (Z× ROI)`).
+- Heavy Claude Code user → `claude-opus-4-8`
+- ChatGPT-led workflow → `gpt-5.5`
+- Gemini-led workflow → `gemini-3.1-pro`
+Both `X-MMA-Client` and `X-MMA-Main-Model` are required on tool routes (`400 client_required` / `400 main_model_required` if missing). The 4.3.0 auto-detect chain was reverted in 4.4.0 — the claude-agent-sdk used by claude-tier workers wrote JSONL files into the same `~/.claude/projects/<slug>/` the resolver was reading, so auto-detect could return a worker's model as the calling agent's "main". The calling client is the only reliable source.
+```bash
+export MMAGENT_CLIENT=claude-code              # or codex-cli, gemini-cli, cursor
+export MMAGENT_MAIN_MODEL=claude-opus-4-8      # whatever your calling agent runs on
+```
+### 3. Write the config
+Paste this into your shell — it creates `~/.multi-model/config.json` with the minimum-viable starter config (overwrites any existing file at that path):
+```bash
+mkdir -p ~/.multi-model && cat > ~/.multi-model/config.json <<'EOF'
+{
+  "agents": {
+    "standard": {
+      "type": "claude-compatible",
+      "model": "deepseek-v4-pro",
+      "baseUrl": "https://api.deepseek.com/anthropic",
+      "apiKeyEnv": "DEEPSEEK_API_KEY"
+    },
+    "complex": {
+      "type": "codex",
+      "model": "gpt-5.5"
+    }
+  }
+}
+EOF
+```
+That's the whole minimum-viable file. All other knobs (`server.*`, `defaults.timeoutMs`, `defaults.tools`, …) have sane built-in defaults — see [Configuration reference](#configuration-reference).
+### 4. Start the daemon + verify
+Two ways — pick one:
+**Option A — let your AI client auto-spawn it.** Open your client (Claude Code / Codex CLI / etc.) and call any mma-* skill; the skill's preflight check spawns `mmagent serve` on `127.0.0.1:7337` and reuses it for every subsequent call.
+**Option B — start it manually.** Useful when you want the daemon up before opening a client:
+```bash
+mmagent serve                          # 127.0.0.1:7337 by default
+curl -s http://localhost:7337/health   # → {"status":"ok"}
+```
+For an always-on background install (survives reboots): [launchd / systemd templates](./scripts/README.md).
+## Updating
+```bash
+npm install -g @zhixuan92/multi-model-agent@latest
+pkill -f "mmagent serve"            # stop the running daemon
+mmagent sync-skills                 # reconcile installed skills with the new bundle
+# next AI-client session respawns the daemon via the skill preflight
+```
+A drift warning prints on `mmagent serve` if installed skills are older than the daemon. To rotate the auth token: `rm ~/.multi-model/auth-token && mmagent serve`.
+## Skills
+Skills are the surface your AI client sees. `mmagent sync-skills` writes them to the client's skill directory and keeps them reconciled across upgrades; the client then picks the right one based on what you ask. You don't call them by hand — you describe the work, the client routes it to the matching skill, the skill calls the matching REST endpoint.
+### Work-delegation skills
+| Skill | Target endpoint | Use when |
+|---|---|---|
+| `mma-delegate` | `POST /delegate` | Ad-hoc implementation or research tasks **without** a plan file — run them in parallel on cheap workers. |
+| `mma-execute-plan` | `POST /execute-plan` | A plan / spec markdown exists on disk with numbered task headings; implement one or more tasks from it. |
+| `mma-investigate` | `POST /investigate` | Answer a question about *this* codebase ("how does X work", "where is Y called") without burning main-context tokens on grep + reads. |
+| `mma-explore` | (orchestrator playbook — no dedicated route) | Fans out `mma-investigate` + `mma-research` + `mma-journal-recall` in parallel and synthesises 3–5 distinct directions. Run before `superpowers:brainstorming`. Not for "where is X" questions (use `mma-investigate`). |
+| `mma-research` | `POST /research` | External multi-source research with citations — arxiv, semantic_scholar, github_search, brave-with-`site:`-filters — for a focused question. |
+| `mma-debug` | `POST /debug` | A test fails, a build breaks, or behavior is unexpected — delegate the reproduce/trace, keep the hypothesis on the main agent. |
+| `mma-review` | `POST /review` | Source-code review (pre-merge, post-implementation, security-focused). One worker per file, in parallel. |
+| `mma-audit` | `POST /audit` | Audit a prose artifact against a named criteria set — pick the **subtype**: `default` (general prose-coherence), `spec` (requirements: testability, decision-trace), `plan` (a plan verified against the actual codebase), `skill` (a SKILL.md). Run `subtype=plan` before `mma-execute-plan`. |
+| `mma-journal-record` | `POST /journal-record` | Record a durable project learning into the cross-agent journal — what was tried, what happened, the lesson — integrated into a graph of ADR "node" files under `.mmagent/journal/` (create / refine / supersede / merge with typed edges). |
+| `mma-journal-recall` | `POST /journal-recall` | Recall relevant prior learnings from the journal for a question or situation — traverses the node graph rather than keyword-filtering. |
+### Plumbing skills
+| Skill | Target endpoint | Use when |
+|---|---|---|
+| `mma-context-blocks` | `POST/DELETE /context-blocks` | The same large doc (>~2 KB) will be referenced by 2+ subsequent mma-* calls — register once, pass the ID instead of re-uploading. |
+| `mma-retry` | `POST /retry` | A previous batch came back partial — re-run only the failed indices without re-dispatching the whole batch. |
+The `multi-model-agent` skill (no `mma-` prefix) is a top-level overview your client reads first to pick which `mma-*` skill applies.
+### Two generic usage samples
+**Sample 1 — implement a feature from a plan**
+```
+You: "Execute tasks 3, 4, and 5 from docs/plans/auth-rewrite.md"
+↓
+Client picks mma-execute-plan (plan file on disk, multiple independent tasks)
+↓
+mmagent dispatches 3 workers in parallel on the standard agent (e.g. MiniMax-M3),
+each runs cross-agent review on the complex agent, returns a structured report.
+↓
+You see one consolidated headline: "$0.04 actual / $1.20 saved vs claude-opus-4-8 (30× ROI)"
+```
+**Sample 2 — debug a failing test (multiple skills chained)**
+```
+You: "tests/auth/session.test.ts is failing intermittently after the token-refresh refactor — figure it out and fix it"
+↓
+Step 1 — mma-context-blocks
+  The failing test output + the refactor diff are ~8 KB and will be referenced by every
+  downstream call. Register once, get a contextBlockId, reuse it.
+↓
+Step 2 — mma-debug
+  Worker reproduces the failure, traces across session.ts + token-refresh.ts, returns a
+  root-cause hypothesis: "race between refresh-in-flight and session.invalidate()".
+  Main agent stays on the hypothesis, decides the fix shape.
+↓
+Step 3 — mma-delegate
+  Dispatch the actual code change as an ad-hoc task (no plan file). Worker writes the
+  fix, runs the failing test 20× to confirm the race is gone.
+↓
+Step 4 — mma-review (with the acceptance checklist in the brief)
+  Reviewer worker checks the diff against the acceptance criteria: (a) failing
+  test now passes, (b) no other auth tests regressed, (c) refresh path still
+  emits the expected telemetry.
+↓
+Total cost: ~$0.08. Main-context tokens consumed: just the hypotheses and the verdicts.
+```
+## Configuration reference
+### Lookup order
+`--config <path>` → `$MMAGENT_CONFIG` → `<cwd>/.multi-model-agent.json` → `~/.multi-model/config.json`.
+### Agent types
+| Type | Auth | When to pick |
+|---|---|---|
+| `claude` | Local Claude Code OAuth (`claude login`) | Stay on Claude end-to-end with subscription auth |
+| `codex` | Codex CLI subscription (`codex login`) | OpenAI flagship work without juggling API keys |
+| `openai-compatible` | `apiKey` or `apiKeyEnv` | Any OpenAI-compatible endpoint — MiniMax, Groq, Together, local vLLM, plus OpenAI direct |
+| `claude-compatible` | `apiKey` or `apiKeyEnv` | Vendors exposing an Anthropic-format endpoint (DeepSeek's `/anthropic`, etc.) — preserves thinking content blocks across multi-turn tool use |
+DeepSeek V4 Pro under `claude-compatible` keeps reasoning ON; under `openai-compatible` it works but auto-disables thinking.
+### Tuning
+Every `defaults` knob has a built-in default. Override only when you need to.
+| Field | Default | What it does |
+|---|---|---|
+| `defaults.timeoutMs` | `3600000` (60 min) | Hard task-level wall-clock cap (bumped from 30 min in 3.9.0) |
+| `defaults.stallTimeoutMs` | `1200000` (20 min) | Aborts in-flight runs idle for this long (bumped from 10 min in 3.9.0) |
+| `defaults.tools` | `"full"` | Tool surface: `none` / `readonly` / `no-shell` / `full` |
+| `defaults.sandboxPolicy` | `"cwd-only"` | Path-traversal + symlink confinement to the request's `cwd` |
+### Telemetry
+**Off by default.** Opt in via `mmagent telemetry enable` (or `MMAGENT_TELEMETRY=1`), or set it in the config file:
+```json
+{
+  "agents": { "...": "..." },
+  "telemetry": { "enabled": true }
+}
+```
+Every upload batch is signed with a per-install Ed25519 key (TOFU; lives at `~/.multi-model/identity.json`); receivers can verify it came from the install whose `installId` it claims. Full disclosure: [PRIVACY.md](https://github.com/zhixuan312/multi-model-agent/blob/master/PRIVACY.md).
+### Verbose / diagnostics
+```json
+{
+  "agents": { "...": "..." },
+  "diagnostics": { "log": true, "verbose": true }
+}
+```
+Or per-run via `mmagent serve --verbose --log`. JSONL goes to `~/.multi-model/logs/mmagent-<date>.jsonl`; large request bodies (>16 KB UTF-8) spill to `~/.multi-model/logs/requests/<batchId>.json`.
+> **Note:** verbose logs may include prompts, file paths, and other task content — disable for production servers handling sensitive data.
+### Auth token
+Generated on first `mmagent serve`. Retrieve with `mmagent print-token`, or set `MMAGENT_AUTH_TOKEN` to override.
+## REST API
+16 endpoints. All tool endpoints are async: they return `202 { batchId, statusUrl }` immediately and the executor runs in the background. Poll `GET /batch/:id` for the terminal envelope.
+| Endpoint | Purpose |
+|---|---|
+| `POST /delegate?cwd=<abs>` | Fan out ad-hoc tasks to sub-agents |
+| `POST /audit?cwd=<abs>` | Audit a document (or a code-execution plan via `subtype: 'plan'`) |
+| `POST /review?cwd=<abs>` | Review code (pass acceptance checklists in the brief for verification-style checks) |
+| `POST /debug?cwd=<abs>` | Debug a failure with a hypothesis |
+| `POST /execute-plan?cwd=<abs>` | Implement from a plan file |
+| `POST /retry?cwd=<abs>` | Re-run specific tasks from a previous batch |
+| `POST /investigate?cwd=<abs>` | Codebase Q&A — structured answer with file:line citations + confidence |
+| `POST /research?cwd=<abs>` | External multi-source research — arxiv, semantic_scholar, github_search, brave-with-`site:`-filters — for a focused question |
+| `POST /journal-record?cwd=<abs>` | Record one learning into the project's cross-agent journal graph (`.mmagent/journal/`) — create / refine / supersede / merge |
+| `POST /journal-recall?cwd=<abs>` | Recall relevant prior learnings from the journal graph for a question or situation |
+| `GET /batch/:id[?taskIndex=N]` | Poll a batch: `202 text/plain` (pending) or `200 application/json` (terminal). `?taskIndex=N` slices on complete state |
+| `POST /context-blocks?cwd=<abs>` | Register a reusable context block |
+| `DELETE /context-blocks/:id?cwd=<abs>` | Delete a context block |
+| `POST /control/batch-slice` | Slice an in-flight batch — return a subset of its tasks by index |
+| `GET /health` | Liveness probe (unauthenticated, loopback-only) |
+| `GET /status` | Server status (authenticated, loopback-only) |
+All tool endpoints require bearer auth: `Authorization: Bearer <token>`.
+## Operator commands
+```bash
+mmagent serve [--verbose] [--log]                # start daemon
+mmagent info  [--json]                           # cliVersion, bind/port, token fingerprint, daemon identity
+mmagent status [--json]                          # health + stats from a running daemon
+mmagent logs  [--follow] [--batch=<id>]          # tail today's diagnostic log
+mmagent print-token                              # print the current auth token
+mmagent sync-skills [--target=<client>] [--all-targets] [--dry-run] [--json]   # idempotent install + update + reconcile
+mmagent disable [--target=<client>] [--all-targets] [--dry-run] [--json]       # remove skills + pin off (survives upgrades)
+mmagent enable  [--target=<client>] [--all-targets] [--dry-run] [--json]       # clear the pin + reinstall skills
+mmagent telemetry status                         # show consent state + source
+mmagent telemetry enable                         # opt in
+mmagent telemetry disable                        # opt out + delete local queue
+mmagent telemetry reset-id                       # rotate the local Ed25519 identity
+mmagent telemetry dump-queue                     # print the locally-queued events as JSON
+```
+## Architecture
+`mmagent serve` runs a loopback HTTP server. Each tool call dispatches to a labor agent (standard or complex), runs a cross-agent review cycle, and returns a structured report. Tasks run in parallel; each has a wall-clock timeout.
+Full design rationale: [DIRECTION.md](https://github.com/zhixuan312/multi-model-agent/blob/master/DIRECTION.md). Layer map and request lifecycle: [docs/ARCHITECTURE.md](https://github.com/zhixuan312/multi-model-agent/blob/master/docs/ARCHITECTURE.md).
+## Troubleshooting
+| Symptom | Fix |
+|---|---|
+| Port 7337 already in use | `lsof -nP -i :7337` → kill the stale process |
+| Daemon stale after upgrade | `pkill -f "mmagent serve"`; the skill preflight respawns it on next client session |
+| Skill version mismatch | `mmagent sync-skills` and restart your client |
+| `401 unauthorized` from a skill | `export MMAGENT_AUTH_TOKEN=$(mmagent print-token)` |
+| `pkill` reports success but `mmagent info` still shows the old PID | The pattern didn't match — try `kill <pid-from-mmagent-info>` directly |
+| TLS `handshake_failure` to a known-good telemetry endpoint | Local DNS cache is stale. `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` (macOS); restart the daemon so its process re-resolves |
+| Local telemetry queue stops draining | Daemon's flusher is in exponential backoff after a transport failure (capped at 1 hr). Restart the daemon to force an immediate boot-flush |
+## What's new in 5.0
+- **Runtime migrated to Bun**, and this package now ships as **standalone per-platform binaries** with Bun embedded: `npm i -g @zhixuan92/multi-model-agent` resolves a native binary that needs neither Node nor Bun to run (npm uses Node ≥18 only for the install shim). Behavior is identical to 4.x.
+Full history: [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).
+## Full documentation
+→ **[github.com/zhixuan312/multi-model-agent](https://github.com/zhixuan312/multi-model-agent)**
+## License
+[MIT](./LICENSE) — Copyright (c) 2026 Zhang Zhixuan
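
Reviewer sketch (not part of the package): the async contract in the README's REST API section can be read as a small client loop. Tool routes answer `202 { batchId, statusUrl }`, and `GET /batch/:id` answers `202 text/plain` while pending and `200 application/json` once terminal. The Python below is a minimal illustration under stated assumptions: the function names, the injected `fetch` callable, the poll interval and the poll cap are all hypothetical, and request bodies are left out because the README does not show their schema. Only the header names, the `?cwd=<abs>` query, the default address and the status codes come from the README.

```python
import json
import time
from typing import Callable, Tuple
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:7337"  # default bind address per the README


def build_headers(token: str, client: str, main_model: str) -> dict:
    """Headers the README lists as required on tool routes."""
    return {
        "Authorization": f"Bearer {token}",
        "X-MMA-Client": client,
        "X-MMA-Main-Model": main_model,
    }


def tool_url(tool: str, cwd: str, base_url: str = BASE_URL) -> str:
    """Build e.g. /delegate?cwd=<abs>, percent-encoding the absolute path."""
    return f"{base_url}/{tool}?cwd={quote(cwd, safe='')}"


def poll_batch(fetch: Callable[[str], Tuple[int, str]], batch_id: str,
               interval_s: float = 2.0, max_polls: int = 900,
               base_url: str = BASE_URL) -> dict:
    """Poll GET /batch/:id until it stops answering 202 (pending).

    `fetch` is a hypothetical injected callable: url -> (status, body text).
    """
    url = f"{base_url}/batch/{batch_id}"
    for _ in range(max_polls):
        status, body = fetch(url)
        if status == 200:        # terminal envelope, application/json
            return json.loads(body)
        if status != 202:        # this sketch treats anything else as an error
            raise RuntimeError(f"unexpected status {status} from {url}")
        time.sleep(interval_s)   # still pending (202 text/plain)
    raise TimeoutError(f"batch {batch_id} still pending after {max_polls} polls")
```

Injecting `fetch` keeps the loop independent of any HTTP library, so the same function can be driven by a real client or by a canned sequence of responses when checking the control flow.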

package/package.json CHANGED

@@ -1,6 +1,23 @@
 {
   "name": "@zhixuan92/multi-model-agent",
-  "version": "5.0.0",
+  "version": "5.0.2",
+  "description": "Standalone HTTP server for multi-model-agent. Routes tool-invocation work to Claude, Codex, or OpenAI-compatible sub-agents with async-polling REST dispatch and installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.",
+  "keywords": [
+    "llm",
+    "claude",
+    "codex",
+    "openai",
+    "agent",
+    "multi-model",
+    "delegation",
+    "http-server"
+  ],
+  "license": "MIT",
+  "homepage": "https://github.com/zhixuan312/multi-model-agent#readme",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/zhixuan312/multi-model-agent.git"
+  },
   "bin": {
     "mmagent": "bin/mmagent.mjs",
     "multi-model-agent": "bin/mmagent.mjs"
@@ -9,18 +26,20 @@
     "postinstall": "node postinstall.mjs"
   },
   "optionalDependencies": {
-    "@zhixuan92/mmagent-darwin-arm64": "5.0.0",
-    "@zhixuan92/mmagent-darwin-x64": "5.0.0",
-    "@zhixuan92/mmagent-linux-x64": "5.0.0",
-    "@zhixuan92/mmagent-linux-arm64": "5.0.0",
-    "@zhixuan92/mmagent-linux-x64-musl": "5.0.0",
-    "@zhixuan92/mmagent-linux-arm64-musl": "5.0.0",
-    "@zhixuan92/mmagent-windows-x64": "5.0.0",
-    "@zhixuan92/mmagent-windows-arm64": "5.0.0"
+    "@zhixuan92/mmagent-darwin-arm64": "5.0.2",
+    "@zhixuan92/mmagent-darwin-x64": "5.0.2",
+    "@zhixuan92/mmagent-linux-x64": "5.0.2",
+    "@zhixuan92/mmagent-linux-arm64": "5.0.2",
+    "@zhixuan92/mmagent-linux-x64-musl": "5.0.2",
+    "@zhixuan92/mmagent-linux-arm64-musl": "5.0.2",
+    "@zhixuan92/mmagent-windows-x64": "5.0.2",
+    "@zhixuan92/mmagent-windows-arm64": "5.0.2"
   },
   "files": [
     "bin",
-    "postinstall.mjs"
+    "postinstall.mjs",
+    "README.md",
+    "LICENSE"
   ],
   "engines": {
     "node": ">=18"
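
Reviewer sketch (not part of the package): the hunks above bump `version` and all eight per-platform `optionalDependencies` from 5.0.0 to 5.0.2 together. A minimal Python check for that lockstep follows. It assumes exact-version equality is the intended invariant, which the diff suggests but the package does not state, and the function name is hypothetical.

```python
def mismatched_platform_packages(manifest: dict) -> list:
    """Return optionalDependencies whose pin differs from the package version."""
    version = manifest["version"]
    optional = manifest.get("optionalDependencies", {})
    return sorted(name for name, pinned in optional.items() if pinned != version)


# Trimmed manifest using two of the eight platform packages from the diff.
manifest = {
    "name": "@zhixuan92/multi-model-agent",
    "version": "5.0.2",
    "optionalDependencies": {
        "@zhixuan92/mmagent-darwin-arm64": "5.0.2",
        "@zhixuan92/mmagent-linux-x64": "5.0.2",
    },
}
print(mismatched_platform_packages(manifest))  # an empty list means every pin matches
```

In practice the dictionary would come from `json.load` on the published `package.json`; a non-empty result names the platform packages left behind by a partial bump.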