@zhixuan92/multi-model-agent 4.0.3 → 4.0.4
- package/README.md +10 -13
- package/dist/skills/mma-audit/SKILL.md +1 -1
- package/dist/skills/mma-context-blocks/SKILL.md +1 -1
- package/dist/skills/mma-debug/SKILL.md +1 -1
- package/dist/skills/mma-delegate/SKILL.md +1 -1
- package/dist/skills/mma-execute-plan/SKILL.md +1 -1
- package/dist/skills/mma-explore/SKILL.md +1 -1
- package/dist/skills/mma-investigate/SKILL.md +1 -1
- package/dist/skills/mma-retry/SKILL.md +1 -1
- package/dist/skills/mma-review/SKILL.md +1 -1
- package/dist/skills/mma-verify/SKILL.md +1 -1
- package/dist/skills/multi-model-agent/SKILL.md +1 -1
- package/package.json +2 -2
package/README.md
CHANGED
@@ -90,7 +90,7 @@ Two ways — pick one:
 
 ```bash
 mmagent serve # 127.0.0.1:7337 by default
-curl -s http://localhost:7337/health # → {"ok":true,"version":"4.0.
+curl -s http://localhost:7337/health # → {"ok":true,"version":"4.0.4",...}
 ```
 
 For an always-on background install (survives reboots): [launchd / systemd templates](./scripts/README.md).
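The `curl` health check in the hunk above can also be scripted. The following is a minimal sketch, assuming the `/health` body is JSON carrying at least the `ok` and `version` fields shown in the README comment; the `HealthPayload` type and both helper names are hypothetical and not part of the package.

```typescript
// Hypothetical shape: only `ok` and `version` appear in the README comment;
// the remaining fields of the real payload are elided there.
interface HealthPayload {
  ok: boolean;
  version: string;
}

// Narrow an unknown JSON value to the two fields this sketch relies on.
function parseHealth(raw: unknown): HealthPayload | null {
  if (typeof raw !== "object" || raw === null) return null;
  const rec = raw as Record<string, unknown>;
  if (typeof rec.ok !== "boolean" || typeof rec.version !== "string") return null;
  return { ok: rec.ok, version: rec.version };
}

// True when the daemon reports healthy and runs the expected release.
function isExpectedVersion(raw: unknown, expected: string): boolean {
  const health = parseHealth(raw);
  return health !== null && health.ok && health.version === expected;
}
```

To use it after an upgrade, pass the parsed body of `http://localhost:7337/health` together with `"4.0.4"`; a `false` result suggests the daemon is still running the previous build and needs a restart.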
@@ -290,20 +290,17 @@ Full design rationale: [DIRECTION.md](https://github.com/zhixuan312/multi-model-
 | TLS `handshake_failure` to a known-good telemetry endpoint | Local DNS cache is stale. `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` (macOS); restart the daemon so its Node process re-resolves |
 | Local telemetry queue stops draining | Daemon's flusher is in exponential backoff after a transport failure (capped at 1 hr). Restart the daemon to force an immediate boot-flush |
 
-## What's new in 4.0.
+## What's new in 4.0.4
 
-- **
-- **
-- **
--
-- **
--
-- **
-- **`batch_failed` fires when the executor packages an error envelope.** Operator visibility — verbose stream no longer says `batch_completed` while the run actually failed.
-- **`run_shell` write tracking.** Workers writing via `cat >`, `sed -i`, `tee`, etc. correctly increment the polling headline's write count.
-- **Stage-progression denominator derives from the StagePlan.** Audit `(1/3)`, delegate `(1/9)`, register-context-block `(1/1)`. Single source of truth.
+- **Reviewers see the actual diff.** New `DiffTracker` (snapshot-based, works in non-git directories) gives spec / quality / diff reviewers the cumulative unified diff against the pre-task baseline. Pre-fix the reviewer judged the worker's text claim alone, defaulted to `changes_required`, and triggered rework spirals on already-correct work. Verdicts must now point to specific diff lines.
+- **Coherent prompts via shared rubric.** `finding-criteria.ts` is the single source of truth for severity ladder, evidence-grounding, scope discipline, and stage awareness. Read-only tools share `ANNOTATOR_CHECK_AWARENESS_RO`; artifact-producing tools share `REVIEWER_AWARENESS_AP`. Workers self-align with what the reviewer will judge → cleaner first-round outputs.
+- **Lenient JSON parsers** in both reviewer and annotator output paths. Accepts ` ```json ` fenced, ` ``` ` (no language tag), bare JSON, and embedded objects/arrays. Caused `verdict: 'error'` and `findings_low: 0` regressions despite valid model output.
+- **Cumulative `filesWritten` across rework rounds.** Pre-fix the implementer's writes were wiped when a no-op rework round overwrote `lastRunResult`. Now unioned across rounds.
+- **Headlines unified across all tools** — `[<status>] <route>: <summary>`. `execute-plan` (was `execute_plan` snake-case), `retry: N/N tasks complete` now reflects per-task status, debug carries file path + finding count.
+- **Implementer system prompt + per-tool prompts hardened.** "Trust `edit_file`/`write_file` — do NOT re-read just to verify your own successful edit" saves 4-6 min per artifact task. Read-only prompts include severity calibration, evidence-grounding, scope discipline, stage awareness.
+- **Per-tool fixes:** investigate prompt aligned with parser, verify `findings_low` correctly populated, debug rewritten as proper read-only (PROPOSE — do NOT apply), spec / quality concerns accumulated across rounds.
 
-**Migration from 4.0.
+**Migration from 4.0.3:** none. Wire envelope, schema fields, and route names are unchanged. `npm update` to take the bug fixes.
 
 Full history: [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).
 
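The "Lenient JSON parsers" entry names four accepted input shapes: a fence tagged `json`, an untagged fence, bare JSON, and JSON embedded in surrounding text. One way such a parser could work is sketched below. This is an illustration under assumptions, not the package's actual implementation; `parseLenientJson` and `FENCE` are hypothetical names.

```typescript
// Three backticks, built at runtime so this example can itself sit in a fence.
const FENCE = "`".repeat(3);

// Hypothetical helper: tries progressively looser strategies to pull a JSON
// value out of model output and returns undefined when none of them parse.
function parseLenientJson(text: string): unknown {
  const candidates: string[] = [];

  // 1. Fenced block, with or without a `json` language tag.
  const fenceRe = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE, "i");
  const fenced = fenceRe.exec(text);
  if (fenced) candidates.push(fenced[1]);

  // 2. The whole text as bare JSON.
  candidates.push(text);

  // 3. Embedded object or array: first opener through last matching closer,
  //    trying whichever opener appears earliest in the text first.
  const spans: Array<[number, number]> = [];
  for (const [open, close] of [["{", "}"], ["[", "]"]] as const) {
    const start = text.indexOf(open);
    const end = text.lastIndexOf(close);
    if (start !== -1 && end > start) spans.push([start, end]);
  }
  spans.sort((a, b) => a[0] - b[0]);
  for (const [start, end] of spans) candidates.push(text.slice(start, end + 1));

  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate.trim());
    } catch {
      // Not valid JSON under this strategy; try the next candidate.
    }
  }
  return undefined;
}
```

Ordering the strategies from strictest to loosest keeps well-formed output on the fast path, and a model reply that wraps its verdict in prose still yields a parsed value instead of a `verdict: 'error'` result.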
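The "Cumulative `filesWritten`" entry describes replacing a last-round-wins value with a union over all rework rounds. A minimal sketch of that idea follows; the `RoundResult` shape and the `cumulativeFilesWritten` name are assumptions made for illustration, since the changelog names only the `filesWritten` field.

```typescript
// Hypothetical per-round result; only `filesWritten` is named in the changelog.
interface RoundResult {
  filesWritten: string[];
}

// Union the writes of every round, keeping first-seen order and dropping
// duplicates, so a later no-op round cannot erase earlier writes.
function cumulativeFilesWritten(rounds: RoundResult[]): string[] {
  const seen = new Set<string>();
  for (const round of rounds) {
    for (const file of round.filesWritten) seen.add(file);
  }
  return Array.from(seen);
}
```

With the pre-fix behavior described in the entry, a trailing round with an empty list would have left the task reporting no writes at all; with a union, that round contributes nothing and the earlier files remain.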
package/dist/skills/mma-audit/SKILL.md
CHANGED
@@ -8,7 +8,7 @@ when_to_use: >-
 User asks for a doc/spec/config audit OR a methodology skill
 (superpowers:dispatching-parallel-agents, /security-review) points at one AND
 mmagent is running. Audit on PROSE/SPEC docs — use mma-review for source code.
-version: 4.0.
+version: 4.0.4
 ---
 
 # mma-audit
package/dist/skills/mma-context-blocks/SKILL.md
CHANGED
@@ -12,7 +12,7 @@ when_to_use: >-
 Register once here, then pass the ID via `contextBlockIds` on mma-delegate /
 mma-execute-plan / mma-audit / mma-review / mma-verify / mma-debug /
 mma-investigate. Cheaper and faster than inlining the same content N times.
-version: 4.0.
+version: 4.0.4
 ---
 
 # mma-context-blocks
package/dist/skills/mma-debug/SKILL.md
CHANGED
@@ -10,7 +10,7 @@ when_to_use: >-
 read files, reproduce, trace — OR a methodology skill
 (superpowers:systematic-debugging) points at the investigation step. Delegate
 the read/reproduce/trace; the main agent stays on the hypothesis and the fix.
-version: 4.0.
+version: 4.0.4
 ---
 
 # mma-debug
package/dist/skills/mma-delegate/SKILL.md
CHANGED
@@ -11,7 +11,7 @@ when_to_use: >-
 and keep main context free. If a plan file exists → use mma-execute-plan. If
 the task is audit / review / verify / debug / investigate → use the matching
 specialized skill.
-version: 4.0.
+version: 4.0.4
 ---
 
 # mma-delegate
package/dist/skills/mma-execute-plan/SKILL.md
CHANGED
@@ -10,7 +10,7 @@ when_to_use: >-
 superpowers:subagent-driven-development / superpowers:executing-plans —
 workers are cheaper and don't pollute main context. Task descriptors must
 match plan headings verbatim.
-version: 4.0.
+version: 4.0.4
 ---
 
 # mma-execute-plan
package/dist/skills/mma-explore/SKILL.md
CHANGED
@@ -11,7 +11,7 @@ when_to_use: >-
 Delegating the read+grep+web-search to a worker keeps your main context on
 judgment. DO NOT use for convergent single-answer questions (where is X
 called, how does Y work) — those are mma-investigate.
-version: 4.0.
+version: 4.0.4
 ---
 
 # mma-explore
package/dist/skills/mma-investigate/SKILL.md
CHANGED
@@ -12,7 +12,7 @@ when_to_use: >-
 git-history queries. OR you are about to read 3+ files / run any grep in main
 context — that's the inline-labor-leakage anti-pattern (AP2); delegate to this
 skill instead.
-version: 4.0.
+version: 4.0.4
 ---
 
 # mma-investigate
package/dist/skills/mma-retry/SKILL.md
CHANGED
@@ -10,7 +10,7 @@ when_to_use: >-
 you want to re-try the failed indices only. Prefer this over re-dispatching
 the whole batch or inline-retrying — it's idempotent and preserves the
 original batch's diagnostics.
-version: 4.0.
+version: 4.0.4
 ---
 
 # mma-retry
package/dist/skills/mma-review/SKILL.md
CHANGED
@@ -10,7 +10,7 @@ when_to_use: >-
 AND mmagent is running. Delegate so each file reviews on its own worker; the
 main agent only decides what to merge. Review on SOURCE CODE — use mma-audit
 for prose specs / configs.
-version: 4.0.
+version: 4.0.4
 ---
 
 # mma-review
package/dist/skills/mma-verify/SKILL.md
CHANGED
@@ -10,7 +10,7 @@ when_to_use: >-
 against implemented work BEFORE claiming success. Delegate so each checklist
 item gets independent evidence-gathering on a worker. Use this BEFORE saying
 "done" — never after.
-version: 4.0.
+version: 4.0.4
 ---
 
 # mma-verify
package/dist/skills/multi-model-agent/SKILL.md
CHANGED
@@ -11,7 +11,7 @@ when_to_use: >-
 tasks — AND mmagent is running. Read this once, pick the matching mma-* skill,
 and delegate there. Applies equally whether the user invoked a superpowers
 methodology skill or asked directly.
-version: 4.0.
+version: 4.0.4
 ---
 
 # multi-model-agent (router)
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "@zhixuan92/multi-model-agent",
-"version": "4.0.
+"version": "4.0.4",
 "type": "module",
 "license": "MIT",
 "description": "Standalone HTTP server for multi-model-agent. Routes tool-invocation work to Claude, Codex, or OpenAI-compatible sub-agents with async-polling REST dispatch and installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.",
@@ -53,7 +53,7 @@
 },
 "dependencies": {
 "@asteasolutions/zod-to-openapi": "^8.5.0",
-"@zhixuan92/multi-model-agent-core": "^4.0.
+"@zhixuan92/multi-model-agent-core": "^4.0.4",
 "gray-matter": "^4.0.3",
 "minimist": "^1.2.8",
 "proper-lockfile": "^4.1.2",
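The dependency bump above uses a caret range. For a version whose major component is 1 or higher, `^4.0.4` admits any release from 4.0.4 up to, but not including, 5.0.0. The sketch below illustrates that rule for plain `x.y.z` versions only; it ignores prerelease tags and the special handling of `0.x` versions, and `satisfiesCaret` is a hypothetical helper rather than the resolver npm actually uses.

```typescript
type Triple = [number, number, number];

// Parse a plain "x.y.z" string; prerelease and build suffixes are out of scope.
function parseVersion(v: string): Triple {
  const parts = v.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error("not a plain x.y.z version: " + v);
  }
  return [parts[0], parts[1], parts[2]];
}

// True when `version` falls inside the caret range anchored at `base`
// (same major, and not older than `base`). Only covers major >= 1.
function satisfiesCaret(version: string, base: string): boolean {
  const [maj, min, pat] = parseVersion(version);
  const [bMaj, bMin, bPat] = parseVersion(base);
  if (bMaj === 0) throw new Error("this sketch covers only major >= 1");
  if (maj !== bMaj) return false;
  if (min !== bMin) return min > bMin;
  return pat >= bPat;
}
```

Under this rule an install that previously resolved the core package to an older 4.0.x patch would no longer satisfy the range, which is why the lockfile picks up the new core release on update.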