@zhixuan92/multi-model-agent 4.0.3 → 4.0.4

package/README.md CHANGED
@@ -90,7 +90,7 @@ Two ways — pick one:
 
  ```bash
  mmagent serve # 127.0.0.1:7337 by default
- curl -s http://localhost:7337/health # → {"ok":true,"version":"4.0.3",...}
+ curl -s http://localhost:7337/health # → {"ok":true,"version":"4.0.4",...}
  ```
 
  For an always-on background install (survives reboots): [launchd / systemd templates](./scripts/README.md).
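
After upgrading, the `/health` response in the hunk above is a quick way to confirm that the running daemon is the new version. The following is a minimal TypeScript sketch, assuming only the two fields visible in the README example (`ok` and `version`); the elided fields are ignored, and the helper names `isHealthPayload` and `isExpectedHealth` are hypothetical, not part of the package.

```typescript
// Hypothetical helpers: validate a /health response body against an expected version.
// Only `ok` and `version` come from the README example; other fields are ignored.
interface HealthPayload {
  ok: boolean;
  version: string;
}

// Type guard: true when the parsed value has the two fields we rely on.
function isHealthPayload(value: unknown): value is HealthPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.ok === "boolean" && typeof v.version === "string";
}

// Returns true only for a healthy payload reporting exactly `expectedVersion`.
function isExpectedHealth(rawBody: string, expectedVersion: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return false; // body was not JSON at all
  }
  return isHealthPayload(parsed) && parsed.ok && parsed.version === expectedVersion;
}

// Usage against a running daemon (default address from the README):
//   const res = await fetch("http://localhost:7337/health");
//   console.log(isExpectedHealth(await res.text(), "4.0.4"));
```

A `false` result after `npm update` most likely means the old daemon process is still running and needs a restart.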
@@ -290,20 +290,17 @@ Full design rationale: [DIRECTION.md](https://github.com/zhixuan312/multi-model-
  | TLS `handshake_failure` to a known-good telemetry endpoint | Local DNS cache is stale. `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` (macOS); restart the daemon so its Node process re-resolves |
  | Local telemetry queue stops draining | Daemon's flusher is in exponential backoff after a transport failure (capped at 1 hr). Restart the daemon to force an immediate boot-flush |
 
- ## What's new in 4.0.3
+ ## What's new in 4.0.4
 
- - **Required headers `X-MMA-Main-Model` + `X-MMA-Client` on every tool route.** Server returns `400 main_model_required` / `400 client_required` if missing. Replaces `defaults.parentModel` config (removed) and `PARENT_MODEL_NAME` env. Shipped skills set both headers automatically from `MMAGENT_MAIN_MODEL` + `MMAGENT_CLIENT` env vars.
- - **Wire field rename:** `parentModel*` → `mainModel*` (matches DB column `main_model`). `costDeltaVsParentUSD` → `costDeltaVsMainUSD`. Internal record now matches wire 1:1.
- - **Canonical model-name preservation.** `claude-opus-4-7` no longer collapses to `claude-opus`. Best-effort extraction handles arbitrary wrappers (`bedrock.claude-opus-4-7`, `vertex_ai/anthropic.claude-sonnet-4-6@2024-10-22`).
- - **`contextBlockIds` actually reach the worker prompt.** Round-over-round audit recipes were broken pre-4.0.3: the dispatcher dispatched the unexpanded task. Now the same expanded reference flows through both `state.task` and `executionContext.task`.
- - **File-backed context blocks survive daemon restarts.** Stored at `<projectCwd>/.mma/context-blocks/<id>.txt` with atomic writes, mode `0700`/`0600`, 7-day TTL, 1 MiB / 100 MiB caps. `.mma/` belongs in `.gitignore` (daemon prints a stderr breadcrumb on first creation).
- - **`totalDurationMs` reflects real wall-clock** (was implementer-only). Per-stage durations stay truthful; this drops the proportional scale-down that was masking the under-counting.
- - **Audit/review/delegate headlines** fall back to `runResult.annotatedFindings` and `runResult.filesWritten` when the structured report lacks them. Pre-fix headlines reported `0 findings (0 high)` and `(0 files)` for narrative-emitting tools.
- - **`batch_failed` fires when the executor packages an error envelope.** Operator visibility — verbose stream no longer says `batch_completed` while the run actually failed.
- - **`run_shell` write tracking.** Workers writing via `cat >`, `sed -i`, `tee`, etc. correctly increment the polling headline's write count.
- - **Stage-progression denominator derives from the StagePlan.** Audit `(1/3)`, delegate `(1/9)`, register-context-block `(1/1)`. Single source of truth.
+ - **Reviewers see the actual diff.** New `DiffTracker` (snapshot-based, works in non-git directories) gives spec / quality / diff reviewers the cumulative unified diff against the pre-task baseline. Pre-fix, the reviewer judged the worker's text claim alone, defaulted to `changes_required`, and triggered rework spirals on already-correct work. Verdicts must now point to specific diff lines.
+ - **Coherent prompts via shared rubric.** `finding-criteria.ts` is the single source of truth for severity ladder, evidence-grounding, scope discipline, and stage awareness. Read-only tools share `ANNOTATOR_CHECK_AWARENESS_RO`; artifact-producing tools share `REVIEWER_AWARENESS_AP`. Workers self-align with what the reviewer will judge → cleaner first-round outputs.
+ - **Lenient JSON parsers** in both reviewer and annotator output paths. Accepts ` ```json ` fenced, ` ``` ` (no language tag), bare JSON, and embedded objects/arrays. Pre-fix parsing caused `verdict: 'error'` and `findings_low: 0` regressions despite valid model output.
+ - **Cumulative `filesWritten` across rework rounds.** Pre-fix, the implementer's writes were wiped when a no-op rework round overwrote `lastRunResult`. Now unioned across rounds.
+ - **Headlines unified across all tools:** `[<status>] <route>: <summary>`. `execute-plan` replaces snake-case `execute_plan`; `retry: N/N tasks complete` now reflects per-task status; debug carries file path + finding count.
+ - **Implementer system prompt + per-tool prompts hardened.** "Trust `edit_file`/`write_file` — do NOT re-read just to verify your own successful edit" saves 4-6 min per artifact task. Read-only prompts include severity calibration, evidence-grounding, scope discipline, stage awareness.
+ - **Per-tool fixes:** investigate prompt aligned with parser, verify `findings_low` correctly populated, debug rewritten as proper read-only (PROPOSE, do NOT apply), spec / quality concerns accumulated across rounds.
 
- **Migration from 4.0.2:** custom HTTP callers must add `X-MMA-Main-Model` and `X-MMA-Client`. Skills users get this for free after `mmagent sync-skills`.
+ **Migration from 4.0.3:** none. Wire envelope, schema fields, and route names are unchanged. `npm update` to take the bug fixes.
 
  Full history: [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).
 
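
The "Lenient JSON parsers" entry in the hunk above lists four accepted shapes: a fence tagged `json`, an untagged fence, bare JSON, and an object or array embedded in surrounding prose. A minimal TypeScript sketch of that idea follows. It is not the package's implementation: the names `parseLenientJson`, `tryParse`, and `extractBalanced`, as well as the order in which the strategies are tried, are assumptions made for illustration.

```typescript
// Illustrative sketch only; not the package's parser.
// JSON.parse never yields undefined, so undefined is a safe "failed" sentinel.
function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

// Returns the balanced {...} or [...] span starting at `start`, skipping
// brackets that appear inside JSON string literals.
function extractBalanced(text: string, start: number): string | undefined {
  const stack: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") stack.push(ch === "{" ? "}" : "]");
    else if (ch === "}" || ch === "]") {
      if (stack.pop() !== ch) return undefined; // mismatched bracket
      if (stack.length === 0) return text.slice(start, i + 1);
    }
  }
  return undefined; // span never closed
}

function parseLenientJson(output: string): unknown {
  // 1. Bare JSON: the whole output parses as-is.
  const direct = tryParse(output.trim());
  if (direct !== undefined) return direct;

  // 2. Fenced block, with or without a language tag.
  const fence = /`{3}[a-zA-Z]*[ \t]*\r?\n([\s\S]*?)`{3}/.exec(output);
  if (fence) {
    const fenced = tryParse(fence[1].trim());
    if (fenced !== undefined) return fenced;
  }

  // 3. First parseable object or array embedded in surrounding prose.
  for (let i = 0; i < output.length; i++) {
    if (output[i] !== "{" && output[i] !== "[") continue;
    const candidate = extractBalanced(output, i);
    const parsed = candidate === undefined ? undefined : tryParse(candidate);
    if (parsed !== undefined) return parsed;
  }
  return undefined; // nothing recoverable
}
```

Trying the strictest strategy first keeps well-formed output on the fast path, and returning `undefined` rather than throwing lets a caller distinguish "no JSON found" from a genuine verdict.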
@@ -8,7 +8,7 @@ when_to_use: >-
  User asks for a doc/spec/config audit OR a methodology skill
  (superpowers:dispatching-parallel-agents, /security-review) points at one AND
  mmagent is running. Audit on PROSE/SPEC docs — use mma-review for source code.
- version: 4.0.3
+ version: 4.0.4
  ---
 
  # mma-audit
@@ -12,7 +12,7 @@ when_to_use: >-
  Register once here, then pass the ID via `contextBlockIds` on mma-delegate /
  mma-execute-plan / mma-audit / mma-review / mma-verify / mma-debug /
  mma-investigate. Cheaper and faster than inlining the same content N times.
- version: 4.0.3
+ version: 4.0.4
  ---
 
  # mma-context-blocks
@@ -10,7 +10,7 @@ when_to_use: >-
  read files, reproduce, trace — OR a methodology skill
  (superpowers:systematic-debugging) points at the investigation step. Delegate
  the read/reproduce/trace; the main agent stays on the hypothesis and the fix.
- version: 4.0.3
+ version: 4.0.4
  ---
 
  # mma-debug
@@ -11,7 +11,7 @@ when_to_use: >-
  and keep main context free. If a plan file exists → use mma-execute-plan. If
  the task is audit / review / verify / debug / investigate → use the matching
  specialized skill.
- version: 4.0.3
+ version: 4.0.4
  ---
 
  # mma-delegate
@@ -10,7 +10,7 @@ when_to_use: >-
  superpowers:subagent-driven-development / superpowers:executing-plans —
  workers are cheaper and don't pollute main context. Task descriptors must
  match plan headings verbatim.
- version: 4.0.3
+ version: 4.0.4
  ---
 
  # mma-execute-plan
@@ -11,7 +11,7 @@ when_to_use: >-
  Delegating the read+grep+web-search to a worker keeps your main context on
  judgment. DO NOT use for convergent single-answer questions (where is X
  called, how does Y work) — those are mma-investigate.
- version: 4.0.3
+ version: 4.0.4
  ---
 
  # mma-explore
@@ -12,7 +12,7 @@ when_to_use: >-
  git-history queries. OR you are about to read 3+ files / run any grep in main
  context — that's the inline-labor-leakage anti-pattern (AP2); delegate to this
  skill instead.
- version: 4.0.3
+ version: 4.0.4
  ---
 
  # mma-investigate
@@ -10,7 +10,7 @@ when_to_use: >-
  you want to re-try the failed indices only. Prefer this over re-dispatching
  the whole batch or inline-retrying — it's idempotent and preserves the
  original batch's diagnostics.
- version: 4.0.3
+ version: 4.0.4
  ---
 
  # mma-retry
@@ -10,7 +10,7 @@ when_to_use: >-
  AND mmagent is running. Delegate so each file reviews on its own worker; the
  main agent only decides what to merge. Review on SOURCE CODE — use mma-audit
  for prose specs / configs.
- version: 4.0.3
+ version: 4.0.4
  ---
 
  # mma-review
@@ -10,7 +10,7 @@ when_to_use: >-
  against implemented work BEFORE claiming success. Delegate so each checklist
  item gets independent evidence-gathering on a worker. Use this BEFORE saying
  "done" — never after.
- version: 4.0.3
+ version: 4.0.4
  ---
 
  # mma-verify
@@ -11,7 +11,7 @@ when_to_use: >-
  tasks — AND mmagent is running. Read this once, pick the matching mma-* skill,
  and delegate there. Applies equally whether the user invoked a superpowers
  methodology skill or asked directly.
- version: 4.0.3
+ version: 4.0.4
  ---
 
  # multi-model-agent (router)
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@zhixuan92/multi-model-agent",
- "version": "4.0.3",
+ "version": "4.0.4",
  "type": "module",
  "license": "MIT",
  "description": "Standalone HTTP server for multi-model-agent. Routes tool-invocation work to Claude, Codex, or OpenAI-compatible sub-agents with async-polling REST dispatch and installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.",
@@ -53,7 +53,7 @@
  },
  "dependencies": {
  "@asteasolutions/zod-to-openapi": "^8.5.0",
- "@zhixuan92/multi-model-agent-core": "^4.0.3",
+ "@zhixuan92/multi-model-agent-core": "^4.0.4",
  "gray-matter": "^4.0.3",
  "minimist": "^1.2.8",
  "proper-lockfile": "^4.1.2",