@zhixuan92/multi-model-agent 4.4.0 → 4.5.1

package/README.md CHANGED
@@ -88,7 +88,7 @@ Two ways — pick one:
  
  ```bash
  mmagent serve # 127.0.0.1:7337 by default
- curl -s http://localhost:7337/health # → {"ok":true,"version":"4.4.0",...}
+ curl -s http://localhost:7337/health # → {"ok":true,"version":"4.5.1",...}
  ```
  
  For an always-on background install (survives reboots): [launchd / systemd templates](./scripts/README.md).
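
The health check in the hunk above can also be verified from a script. This is a minimal TypeScript sketch, assuming only the two fields the README shows (`ok` and `version`); the `parseHealth` helper and the `HealthStatus` type are hypothetical names, and whatever the elided `...` contains is ignored.

```typescript
// Hypothetical helper: the README only documents `ok` and `version`.
interface HealthStatus {
  ok: boolean;
  version: string;
}

// Returns the two known fields, or null when the body is not the expected shape.
function parseHealth(body: string): HealthStatus | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return null; // body was not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const record = parsed as Record<string, unknown>;
  if (typeof record.ok !== "boolean" || typeof record.version !== "string") {
    return null;
  }
  return { ok: record.ok, version: record.version };
}

// Sample body modelled on the README comment, without the elided fields.
const sample = parseHealth('{"ok":true,"version":"4.5.1"}');
console.log(sample);
```

The body string could come from the `curl` call shown in the README; comparing `version` against the installed package version is one way to spot a daemon that was not restarted after an upgrade.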
@@ -287,14 +287,12 @@ Full design rationale: [DIRECTION.md](https://github.com/zhixuan312/multi-model-
  | TLS `handshake_failure` to a known-good telemetry endpoint | Local DNS cache is stale. `sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder` (macOS); restart the daemon so its Node process re-resolves |
  | Local telemetry queue stops draining | Daemon's flusher is in exponential backoff after a transport failure (capped at 1 hr). Restart the daemon to force an immediate boot-flush |
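
The last troubleshooting row describes an exponential backoff capped at 1 hr. The sketch below illustrates how such a schedule behaves; the 1-second base delay and the doubling factor are assumptions for illustration, not values taken from the package, and `backoffDelayMs` is a hypothetical name.

```typescript
const ONE_HOUR_MS = 60 * 60 * 1000; // the cap stated in the troubleshooting row

// Assumed parameters: 1 s base delay, doubling after each consecutive failure.
function backoffDelayMs(
  failures: number,
  baseMs: number = 1000,
  capMs: number = ONE_HOUR_MS,
): number {
  if (failures <= 0) return 0; // no failure yet, flush immediately
  const exponent = Math.min(failures - 1, 30); // bound the exponent defensively
  return Math.min(baseMs * 2 ** exponent, capMs);
}

// Print the wait after a few failure counts to show where the cap takes over.
for (const failures of [1, 2, 5, 12, 13, 20]) {
  console.log(failures, backoffDelayMs(failures));
}
```

Under these assumed parameters the delay reaches the cap after 13 consecutive failures, which is consistent with the README's advice: once the flusher is waiting near the cap, restarting the daemon is quicker than waiting for the next attempt.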
  
- ## What's new in 4.4.0
+ ## What's new in 4.5.1
  
- - **Session-based provider boundary.** `Provider.openSession() → Session.send() → TurnResult` replaces the 1,559-line `RunnerShell` chain. All providers (claude-agent-sdk, codex CLI, `@openai/agents`) now expose only `openSession`; cost / termination-reason mapping cannot diverge.
- - **Five-stage lifecycle.** Stages collapse to `implementing → review → rework → annotating → committing`. Single `HUMAN_LABEL` source of truth for headline labels.
- - **Codex token accounting fixed (≈4× cost over-report on cached turns).** OpenAI/codex emit `input_tokens` as GROSS including `cached_input_tokens`; Anthropic emits NET. The codex adapter now subtracts the cached subset before pricing.
- - **`X-MMA-Main-Model` required again.** Auto-detect was reading JSONL files written by our own claude-tier workers and returning the worker's model as "main". The calling client is the only reliable source. Returns `400 main_model_required` if missing.
- - **LLM `verify` tool removed.** Verification is `verifyCommand` only (deterministic shell command run after the worker).
- - **Telemetry clamp ceilings raised** for 2026-era usage: per-stage input/cached `5M → 100M`, output `500K → 2M`, per-stage cost `$100 → $500`, per-task cost `$800 → $5000`.
+ - **Windows codex spawn fix.** `codex-cli-session` routes through `cross-spawn` so Node can resolve `codex.cmd` / `.bat` / `.ps1` shims on Windows without falling back to `shell: true` (which would mangle the `-c model_providers.X={…}` argument block on `cmd.exe`). Linux/macOS is a no-op passthrough. Fixes `spawn codex ENOENT` on Windows 4.5.0 daemons.
+ - **`mma-audit` plan subtype: 9 → 12 perspectives.** New SPEC COVERAGE (reads upstream spec from a registered context block), PLACEHOLDER LANGUAGE, and PLAN SKELETON perspectives. Grouped as EXTERNAL CODEBASE COHERENCE (1–8), INTRA-PLAN STRUCTURE (9, 11, 12), and SPEC ALIGNMENT (10).
+ - **`mma-audit` spec subtype: 7 → 9 criteria.** New PLACEHOLDER-SCAN and DESIGN-DECOMPOSITION-PRESENT criteria; existing SCOPE-EXPLICITNESS extended to flag multi-subsystem specs needing decomposition.
+ - **Recipe F (Spec-then-plan-then-execute) updated** in `mma-audit` SKILL.md: register the spec via `mma-context-blocks` between writing-plans and the plan-audit so perspective 10 fires. No schema or wire-shape changes.
  
  Full history: [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).
  
@@ -12,7 +12,7 @@ when_to_use: >-
  (superpowers:dispatching-parallel-agents, /security-review) points at one AND
  mmagent is running. Audit on PROSE/SPEC docs — use mma-review for source code.
  Audit a CODE-EXECUTION PLAN against the codebase — use subtype=plan.
- version: 4.4.0
+ version: 4.5.1
  ---
  
  # mma-audit
@@ -27,7 +27,7 @@ version: 4.4.0
  |---|---|---|
  | A general prose artifact (design doc, recommendation, post-mortem, README) | `subtype: 'default'` | Comprehensive prose-coherence — would a literal-following worker produce the right outcome from this prose alone? Catches ambiguity, contradictions, missing branches, drift, scope-creep. **Does NOT verify against any codebase.** |
  | A **code-execution PLAN** (`docs/superpowers/plans/*.md` or similar) before running it via `mma-execute-plan` | `subtype: 'plan'` | Plan-vs-codebase coherence — for every method / type / file path / signature / import / verify command the plan names, the codebase actually contains it as described. Catches the bug class the prose-coherence audit cannot see (e.g. plan says `registerBlock` but actual interface is `register`). |
- | A **requirement spec** (what we want, why; success criteria) | `subtype: 'spec'` | Requirement-prose executability: every requirement testable, scope explicit, acceptance criteria covered, non-functional requirements captured, decision-trace exposed, conflicts surfaced. |
+ | A **requirement spec** (what we want, why; success criteria) | `subtype: 'spec'` | Requirement-prose executability across 9 criteria — testability, scope explicitness AND decomposability, acceptance-criteria coverage, non-functional capture, requirement conflicts, decision-trace, assumption exposure, placeholder scan, and design-decomposition presence (architecture / components / data flow / error handling / testing). |
  | A **SKILL.md** for an `mma-*` skill or comparable agent-facing playbook | `subtype: 'skill'` | Skill-file reader-effectiveness — when-to-use specificity, endpoint contract integrity, example correctness, anti-pattern coverage, link integrity. |
  
  If you want to bias workers toward a narrow lens (security only, performance only, accessibility only), put that in the free-text `background` portion of the prompt — `subtype` is criteria machinery, not a lens selector.
@@ -75,7 +75,7 @@ Either `document` or `filePaths` (or both) must be provided.
  |---|---|
  | `default` (or omit the field) | **General prose — design doc, recommendation, post-mortem, README, brief.** Comprehensive prose-coherence audit. Does NOT verify against any codebase. |
  | `plan` | **Code-execution plans being audited against a real codebase.** Single-file input (the plan markdown). Workers grep / read source files under `cwd` to verify every named symbol / path / signature / import / verify command. Use this BEFORE every `mma-execute-plan` dispatch. |
- | `spec` | **Requirement spec / brainstorming-output / what-we-want prose.** Criteria target testability, scope explicitness, acceptance-criteria coverage, decision-trace, assumption exposure. |
+ | `spec` | **Requirement spec / brainstorming-output / what-we-want prose.** 9 criteria target testability, scope explicitness + decomposability, acceptance-criteria coverage, non-functional capture, requirement conflicts, decision-trace, assumption exposure, placeholder scan, and design-decomposition presence. |
  | `skill` | **`SKILL.md` or comparable agent-facing playbook.** Criteria target when-to-use specificity, endpoint contract integrity, example correctness, anti-pattern coverage, link integrity. |
  
  You can run BOTH on a plan: first `spec` or `default` (prose quality), then `plan` (does the plan match the codebase?). They cover orthogonal failure modes.
@@ -88,7 +88,8 @@ When `subtype: 'plan'`:
  
  - `filePaths` MUST contain exactly **one entry** — the plan markdown. Sending zero or 2+ entries → `400 invalid_request` with the message: *"Plan audit takes exactly one filePath (the plan markdown). The worker discovers and verifies source files itself via its tool surface — do not pre-list source files."*
  - `document` (inline content) is not used in plan mode — the plan must be on disk so workers can reference it by `?cwd=`-relative path.
- - The worker runs the sequential-criteria loop with the plan-audit criteria set: PATH EXISTENCE, SYMBOL EXISTENCE, SIGNATURE MATCH, IMPORT GRAPH, TEST HARNESS AVAILABILITY, STEP SEQUENCE WITHIN TASK, CROSS-TASK DEPENDENCIES, VERIFICATION COMMAND VALIDITY.
+ - The worker runs the sequential-criteria loop with the plan-audit criteria set across 12 perspectives in three groups: **EXTERNAL CODEBASE COHERENCE** (1 PATH EXISTENCE, 2 SYMBOL EXISTENCE, 3 SIGNATURE MATCH, 4 IMPORT GRAPH, 5 TEST HARNESS AVAILABILITY, 6 STEP SEQUENCE WITHIN TASK, 7 CROSS-TASK DEPENDENCIES, 8 VERIFICATION COMMAND VALIDITY), **INTRA-PLAN STRUCTURE** (9 TASK GRANULARITY, 11 PLACEHOLDER LANGUAGE, 12 PLAN SKELETON), and **SPEC ALIGNMENT** (10 SPEC COVERAGE).
+ - To enable perspective 10 (SPEC COVERAGE), register the upstream spec as a context block via `mma-context-blocks` and pass its `blockId` in `contextBlockIds`. Without a spec in context, perspective 10 emits "No findings for this criterion." and the other 11 still run.
  - Read the findings list. Fix the plan and re-audit if any `critical` or `high` plan-audit findings remain.
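
The plan-mode rule on `filePaths` in the bullets above can be sketched as a client-side pre-check. This is an illustration only, not the server's implementation: the `AuditRequest` and `ValidationError` shapes and the `validatePlanAudit` name are hypothetical, while the field names, the `400 invalid_request` code, and the message text are taken from the SKILL.md wording.

```typescript
// Hypothetical request shape built from the field names the SKILL.md mentions.
interface AuditRequest {
  subtype?: "default" | "plan" | "spec" | "skill";
  document?: string;
  filePaths?: string[];
  contextBlockIds?: string[];
}

// Hypothetical error shape mirroring the documented `400 invalid_request`.
interface ValidationError {
  status: 400;
  code: "invalid_request";
  message: string;
}

// Message text quoted from the SKILL.md bullet.
const PLAN_FILEPATH_MESSAGE =
  "Plan audit takes exactly one filePath (the plan markdown). " +
  "The worker discovers and verifies source files itself via its tool surface — " +
  "do not pre-list source files.";

// Returns an error when a plan audit does not carry exactly one file path.
function validatePlanAudit(req: AuditRequest): ValidationError | null {
  if (req.subtype !== "plan") return null; // the rule applies to plan mode only
  const count = req.filePaths?.length ?? 0;
  if (count !== 1) {
    return { status: 400, code: "invalid_request", message: PLAN_FILEPATH_MESSAGE };
  }
  return null;
}

// Zero paths and two paths are both rejected; exactly one passes.
console.log(validatePlanAudit({ subtype: "plan", filePaths: [] })?.code);
console.log(validatePlanAudit({ subtype: "plan", filePaths: ["plan.md"] }));
```

Running a check like this before dispatch avoids a round trip for a request the daemon would reject anyway. Note that the sketch does not reject `document`, because the text says only that it is not used in plan mode.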
  
  ## Full example
@@ -192,7 +193,7 @@ This skill is one step in the larger flow described in `multi-model-agent` → "
  
  - **Recipe E — Plan-validate-execute.** Before any `mma-execute-plan` batch, run `mma-audit` with `subtype: 'plan'` on the plan file. Read the findings. If any `critical` / `high` finding survives, fix the plan and re-audit. This catches the bug class where the plan's named methods/files don't actually exist in the codebase — symbols a prose-coherence audit cannot see.
  
- - **Recipe F — Spec-then-plan-then-execute.** When working from a brainstorming spec: `mma-audit` (`subtype: 'spec'`) → fix → `writing-plans` → `mma-audit` (`subtype: 'plan'`) → fix → `mma-execute-plan`. Spec and plan audits catch orthogonal problem classes.
+ - **Recipe F — Spec-then-plan-then-execute (the canonical flow).** When working from a brainstorming spec: `mma-audit` (`subtype: 'spec'`) → fix → `writing-plans` → register the spec as a context block via `mma-context-blocks` → `mma-audit` (`subtype: 'plan'`, `contextBlockIds: [specBlockId]`) → fix → `mma-execute-plan`. Spec audit covers requirement-prose executability; plan audit covers BOTH plan-vs-codebase coherence AND plan-vs-spec coverage (perspective 10 fires only when the spec is in context, which is why the context-block step is load-bearing in this recipe).
  
  Anti-pattern alert: **`parallel-rounds-same-target`** (AP1). Three parallel audits on the same document re-flag the same issues without seeing each other's fixes. Run rounds sequentially with a fix between each.
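
The plan-audit step of the updated Recipe F can be sketched as a small request builder. The field names `subtype`, `filePaths`, and `contextBlockIds` come from the SKILL.md text; the `PlanAuditBody` type, the `buildPlanAuditBody` helper, the example plan path, and the block ID value are hypothetical, and the endpoint that would receive this body is not shown in the diff.

```typescript
// Hypothetical body shape using the field names from the SKILL.md text.
interface PlanAuditBody {
  subtype: "plan";
  filePaths: [string]; // plan mode takes exactly one path
  contextBlockIds?: string[];
}

// Builds the plan-audit body; the spec block ID is optional, but without it
// perspective 10 (SPEC COVERAGE) has no spec to compare against.
function buildPlanAuditBody(planPath: string, specBlockId?: string): PlanAuditBody {
  const body: PlanAuditBody = { subtype: "plan", filePaths: [planPath] };
  if (specBlockId) body.contextBlockIds = [specBlockId];
  return body;
}

// Recipe F as an ordered list, copied from the recipe text.
const RECIPE_F: readonly string[] = [
  "mma-audit (subtype: 'spec')",
  "fix",
  "writing-plans",
  "register the spec via mma-context-blocks",
  "mma-audit (subtype: 'plan', contextBlockIds: [specBlockId])",
  "fix",
  "mma-execute-plan",
];

// Placeholder path and block ID, for illustration only.
const body = buildPlanAuditBody("docs/superpowers/plans/example.md", "spec-block-id");
console.log(JSON.stringify(body, null, 2));
console.log(RECIPE_F.join(" -> "));
```

Keeping the spec block ID as an explicit argument makes the load-bearing step visible at the call site: a caller that omits it still gets a valid plan audit, just without spec coverage.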
  
@@ -12,7 +12,7 @@ when_to_use: >-
  Register once here, then pass the ID via `contextBlockIds` on mma-delegate /
  mma-execute-plan / mma-audit / mma-review / mma-debug / mma-investigate.
  Cheaper and faster than inlining the same content N times.
- version: 4.4.0
+ version: 4.5.1
  ---
  
  # mma-context-blocks
@@ -10,7 +10,7 @@ when_to_use: >-
  read files, reproduce, trace — OR a methodology skill
  (superpowers:systematic-debugging) points at the investigation step. Delegate
  the read/reproduce/trace; the main agent stays on the hypothesis and the fix.
- version: 4.4.0
+ version: 4.5.1
  ---
  
  # mma-debug
@@ -11,7 +11,7 @@ when_to_use: >-
  and keep main context free. If a plan file exists → use mma-execute-plan. If
  the task is audit / review / verify / debug / investigate → use the matching
  specialized skill.
- version: 4.4.0
+ version: 4.5.1
  ---
  
  # mma-delegate
@@ -10,7 +10,7 @@ when_to_use: >-
  superpowers:subagent-driven-development / superpowers:executing-plans —
  workers are cheaper and don't pollute main context. Task descriptors must
  match plan headings verbatim.
- version: 4.4.0
+ version: 4.5.1
  ---
  
  # mma-execute-plan
@@ -12,7 +12,7 @@ when_to_use: >-
  out mma-investigate (internal) + mma-research (external) in parallel and
  synthesise the results yourself. DO NOT use for convergent single-answer
  questions — those are mma-investigate.
- version: 4.4.0
+ version: 4.5.1
  ---
  
  # mma-explore
@@ -12,7 +12,7 @@ when_to_use: >-
  git-history queries. OR you are about to read 3+ files / run any grep in main
  context — that's the inline-labor-leakage anti-pattern (AP2); delegate to this
  skill instead.
- version: 4.4.0
+ version: 4.5.1
  ---
  
  # mma-investigate
@@ -10,7 +10,7 @@ when_to_use: >-
  others do, what published methods exist) AND mmagent is running. Delegate the
  multi-source web/adapter research to a worker so the main context stays on
  judgment. NOT for codebase questions — those are mma-investigate.
- version: 4.4.0
+ version: 4.5.1
  ---
  
  # mma-research
@@ -10,7 +10,7 @@ when_to_use: >-
  re-try the failed indices only. Prefer this over re-dispatching the whole
  batch or inline-retrying — it's idempotent and preserves the original batch's
  diagnostics.
- version: 4.4.0
+ version: 4.5.1
  ---
  
  # mma-retry
@@ -10,7 +10,7 @@ when_to_use: >-
  AND mmagent is running. Delegate so each file reviews on its own worker; the
  main agent only decides what to merge. Review on SOURCE CODE — use mma-audit
  for prose specs / configs.
- version: 4.4.0
+ version: 4.5.1
  ---
  
  # mma-review
@@ -11,7 +11,7 @@ when_to_use: >-
  tasks — AND mmagent is running. Read this once, pick the matching mma-* skill,
  and delegate there. Applies equally whether the user invoked a superpowers
  methodology skill or asked directly.
- version: 4.4.0
+ version: 4.5.1
  ---
  
  # multi-model-agent (router)
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@zhixuan92/multi-model-agent",
- "version": "4.4.0",
+ "version": "4.5.1",
  "type": "module",
  "license": "MIT",
  "description": "Standalone HTTP server for multi-model-agent. Routes tool-invocation work to Claude, Codex, or OpenAI-compatible sub-agents with async-polling REST dispatch and installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.",
@@ -53,7 +53,7 @@
  },
  "dependencies": {
  "@asteasolutions/zod-to-openapi": "^8.5.0",
- "@zhixuan92/multi-model-agent-core": "^4.4.0",
+ "@zhixuan92/multi-model-agent-core": "^4.5.1",
  "gray-matter": "^4.0.3",
  "minimist": "^1.2.8",
  "proper-lockfile": "^4.1.2",