npm - @zhixuan92/multi-model-agent - Versions diffs - 3.8.0 → 3.8.1 - Mend

@zhixuan92/multi-model-agent 3.8.0 → 3.8.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

package/README.md +68 -20
package/dist/skills/_shared/response-shape.md +28 -17
package/dist/skills/mma-audit/SKILL.md +17 -10
package/dist/skills/mma-clarifications/SKILL.md +1 -1
package/dist/skills/mma-context-blocks/SKILL.md +1 -1
package/dist/skills/mma-debug/SKILL.md +17 -10
package/dist/skills/mma-delegate/SKILL.md +2 -1
package/dist/skills/mma-execute-plan/SKILL.md +5 -4
package/dist/skills/mma-investigate/SKILL.md +19 -11
package/dist/skills/mma-retry/SKILL.md +1 -1
package/dist/skills/mma-review/SKILL.md +17 -10
package/dist/skills/mma-verify/SKILL.md +17 -10
package/dist/skills/multi-model-agent/SKILL.md +1 -1
package/package.json +2 -2

package/README.md CHANGED

@@ -82,7 +82,7 @@ Two ways — pick one:
 ```bash
 mmagent serve                          # 127.0.0.1:7337 by default
-curl -s http://localhost:7337/health   # → {"ok":true,"version":"3.8.0",...}
+curl -s http://localhost:7337/health   # → {"ok":true,"version":"3.8.1",...}
 ```
 For an always-on background install (survives reboots): [launchd / systemd templates](./scripts/README.md).
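The health check above reports the daemon's `version`. The README also says `mmagent serve` warns when installed skills are older than the daemon. A client could run a similar drift check by comparing that `version` with the `version:` line in an installed SKILL.md frontmatter. This is a minimal sketch under those assumptions, not mmagent's own implementation. `parseHealth`, `skillVersion`, `compareVersions`, and `skillIsStale` are hypothetical helpers. The sketch relies only on the `ok` and `version` fields of the `/health` body and assumes plain `X.Y.Z` versions with no pre-release tags.

```typescript
// Sketch: detect skill drift by comparing GET /health's version with a SKILL.md
// frontmatter version. All helper names are hypothetical.

interface HealthBody {
  ok: boolean;
  version: string;
}

// Parse the /health JSON; only `ok` and `version` are relied on.
function parseHealth(body: string): HealthBody {
  const data = JSON.parse(body) as Partial<HealthBody> | null;
  if (!data || typeof data.ok !== "boolean" || typeof data.version !== "string") {
    throw new Error("unexpected /health body");
  }
  return { ok: data.ok, version: data.version };
}

// Read `version: X.Y.Z` from the YAML frontmatter at the top of a SKILL.md.
function skillVersion(skillMarkdown: string): string | undefined {
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---/.exec(skillMarkdown)?.[1];
  if (frontmatter === undefined) return undefined;
  return /^version:[ \t]*(\S+)[ \t]*$/m.exec(frontmatter)?.[1];
}

// Numeric comparison of dotted versions: negative when a < b, 0 when equal.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// True when the skill has no readable version or is older than the daemon.
function skillIsStale(healthBody: string, skillMarkdown: string): boolean {
  const daemon = parseHealth(healthBody).version;
  const skill = skillVersion(skillMarkdown);
  return skill === undefined || compareVersions(skill, daemon) < 0;
}
```

The comparison is numeric per component, so `3.10.0` correctly sorts after `3.9.9`.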
@@ -98,6 +98,72 @@ mmagent update-skills               # refresh installed skills
 A drift warning prints on `mmagent serve` if installed skills are older than the daemon. To rotate the auth token: `rm ~/.multi-model/auth-token && mmagent serve`.
+## Skills
+Skills are the surface your AI client sees. `mmagent install-skill` writes them to the client's skill directory; the client then picks the right one based on what you ask. You don't call them by hand — you describe the work, the client routes it to the matching skill, the skill calls the matching REST endpoint.
+### Work-delegation skills
+| Skill | Target endpoint | Use when |
+|---|---|---|
+| `mma-delegate` | `POST /delegate` | Ad-hoc implementation or research tasks **without** a plan file — run them in parallel on cheap workers. |
+| `mma-execute-plan` | `POST /execute-plan` | A plan / spec markdown exists on disk with numbered task headings; implement one or more tasks from it. |
+| `mma-investigate` | `POST /investigate` | Answer a question about *this* codebase ("how does X work", "where is Y called") without burning main-context tokens on grep + reads. |
+| `mma-debug` | `POST /debug` | A test fails, a build breaks, or behavior is unexpected — delegate the reproduce/trace, keep the hypothesis on the main agent. |
+| `mma-review` | `POST /review` | Source-code review (pre-merge, post-implementation, security-focused). One worker per file, in parallel. |
+| `mma-audit` | `POST /audit` | Audit a prose document — spec, config, PR description — for correctness, security, or style. |
+| `mma-verify` | `POST /verify` | Check acceptance criteria against finished work *before* claiming done. One worker per checklist item. |
+### Plumbing skills
+| Skill | Target endpoint | Use when |
+|---|---|---|
+| `mma-context-blocks` | `POST/DELETE /context-blocks` | The same large doc (>~2 KB) will be referenced by 2+ subsequent mma-* calls — register once, pass the ID instead of re-uploading. |
+| `mma-clarifications` | `POST /clarifications/confirm` | A previous batch's terminal envelope returned a `proposedInterpretation` string — the service is paused waiting for you to confirm or correct its read. |
+| `mma-retry` | `POST /retry` | A previous batch came back partial — re-run only the failed indices without re-dispatching the whole batch. |
+The `multi-model-agent` skill (no `mma-` prefix) is a top-level overview your client reads first to pick which `mma-*` skill applies.
+### Two generic usage samples
+**Sample 1 — implement a feature from a plan**
+```
+You: "Execute tasks 3, 4, and 5 from docs/plans/auth-rewrite.md"
+↓
+Client picks mma-execute-plan (plan file on disk, multiple independent tasks)
+↓
+mmagent dispatches 3 workers in parallel on the standard agent (e.g. MiniMax-M2.7),
+each runs cross-agent review on the complex agent, returns a structured report.
+↓
+You see one consolidated headline: "$0.04 actual / $1.20 saved vs claude-opus-4-7 (30× ROI)"
+```
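The headline in Sample 1 is consistent with ROI being the saved amount divided by the actual spend ($1.20 / $0.04 = 30). That reading is inferred from the sample numbers, not from documented code. Here is a sketch of the arithmetic, with hypothetical `roiMultiple` and `formatHeadline` helpers.

```typescript
// Sketch: "ROI" read as saved / actual, inferred from $1.20 / $0.04 = 30.
function roiMultiple(actualUSD: number, savedUSD: number): number {
  if (!(actualUSD > 0)) throw new RangeError("actual cost must be positive");
  // Round to one decimal so floating-point noise never reaches the label.
  return Math.round((savedUSD / actualUSD) * 10) / 10;
}

// Rebuild the sample headline text; the format is copied from the sample only.
function formatHeadline(actualUSD: number, savedUSD: number, baseline: string): string {
  const roi = roiMultiple(actualUSD, savedUSD);
  return `$${actualUSD.toFixed(2)} actual / $${savedUSD.toFixed(2)} saved vs ${baseline} (${roi}× ROI)`;
}
```

The exact wording mmagent prints may differ. The point is that the multiple follows directly from the two dollar figures.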
+**Sample 2 — debug a failing test (multiple skills chained)**
+```
+You: "tests/auth/session.test.ts is failing intermittently after the token-refresh refactor — figure it out and fix it"
+↓
+Step 1 — mma-context-blocks
+  The failing test output + the refactor diff are ~8 KB and will be referenced by every
+  downstream call. Register once, get a contextBlockId, reuse it.
+↓
+Step 2 — mma-debug
+  Worker reproduces the failure, traces across session.ts + token-refresh.ts, returns a
+  root-cause hypothesis: "race between refresh-in-flight and session.invalidate()".
+  Main agent stays on the hypothesis, decides the fix shape.
+↓
+Step 3 — mma-delegate
+  Dispatch the actual code change as an ad-hoc task (no plan file). Worker writes the
+  fix, runs the failing test 20× to confirm the race is gone.
+↓
+Step 4 — mma-verify
+  One worker per acceptance criterion: (a) failing test now passes, (b) no other
+  auth tests regressed, (c) refresh path still emits the expected telemetry.
+↓
+Total cost: ~$0.08. Main-context tokens consumed: just the hypotheses and the verdicts.
+```
 ## Configuration reference
 ### Lookup order
@@ -199,24 +265,6 @@ mmagent telemetry reset-id                       # rotate the local Ed25519 iden
 mmagent telemetry dump-queue                     # print the locally-queued events as JSON
 ```
-## Shipped skills
-Skills are Markdown prompts that tell your AI client when and how to call each endpoint. `mmagent install-skill` inlines the shared auth/polling patterns at install time.
-| Skill | Target endpoint |
-|---|---|
-| `multi-model-agent` | Overview + skill map (read first to pick the right `mma-*` skill) |
-| `mma-delegate` | `POST /delegate` |
-| `mma-audit` | `POST /audit` |
-| `mma-review` | `POST /review` |
-| `mma-verify` | `POST /verify` |
-| `mma-debug` | `POST /debug` |
-| `mma-execute-plan` | `POST /execute-plan` |
-| `mma-retry` | `POST /retry` |
-| `mma-investigate` | `POST /investigate` |
-| `mma-context-blocks` | `POST/DELETE /context-blocks` |
-| `mma-clarifications` | `POST /clarifications/confirm` |
 ## Architecture
 `mmagent serve` runs a loopback HTTP server. Each tool call dispatches to a labor agent (standard or complex), runs a cross-agent review cycle, and returns a structured report. Tasks run in parallel; each has a cost ceiling and wall-clock timeout.
@@ -237,7 +285,7 @@ Full design rationale: [DIRECTION.md](https://github.com/zhixuan312/multi-model-
 ## What's new
-Latest: **3.8.0** — read-only reviewed lifecycle: all 5 read-only routes (audit, review, verify, investigate, debug) now run a single `quality_only` review with bounded rework, structured `findings[]` worker output, and forced cross-tier review (worker complex, reviewer standard). Verify worker tier upgraded to complex. `MMAGENT_READ_ONLY_REVIEW` kill switch for rollback. Full history: [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).
+Latest: **3.8.1** — read-only review becomes annotation, not gating. The 5 read-only routes (audit, review, verify, investigate, debug) now run a single reviewer pass that annotates each worker finding with `reviewerConfidence` (0-100) and an optional `reviewerSeverity` correction — no rework loop, restoring 3.7.0-comparable wall-clock. `Finding` schema simplified (drop `file`/`line`/`sourceQuote`; required `evidence`; rename `suggestedFix` → `suggestion`). Full history: [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).
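This sketch shows the new 3.8.1 `Finding` shape next to a client-side adapter for older output. The new fields come from the skill docs in this release: `id`, `severity`, `claim`, required `evidence`, optional `suggestion`, plus the reviewer annotations. The `LegacyFinding` type is reconstructed only from the field names this changelog line mentions. Folding `file`, `line`, and `sourceQuote` into `evidence` is an illustrative choice, not something the server does. Worker `severity` values are assumed to match the documented `reviewerSeverity` values.

```typescript
type Severity = "high" | "medium" | "low"; // assumed to match reviewerSeverity

// 3.8.1 finding, per the per-finding fields listed in the read-only skills.
interface Finding {
  id: string;
  severity: Severity;
  claim: string;
  evidence: string;            // required as of 3.8.1
  suggestion?: string;         // renamed from suggestedFix
  reviewerConfidence?: number; // integer 0-100, added by the annotation pass
  reviewerSeverity?: Severity; // only when the reviewer disagrees
}

// Hypothetical older shape, built only from the fields the changelog names.
interface LegacyFinding {
  id: string;
  severity: Severity;
  claim: string;
  evidence?: string;
  file?: string;
  line?: number;
  sourceQuote?: string;
  suggestedFix?: string;
}

// Illustrative adapter: fold the dropped location fields into `evidence`.
function toFinding(old: LegacyFinding): Finding {
  let location = old.file ?? "";
  if (location !== "" && old.line !== undefined) location += `:${old.line}`;
  const parts = [old.evidence, location, old.sourceQuote].filter(
    (p): p is string => typeof p === "string" && p !== "",
  );
  const finding: Finding = {
    id: old.id,
    severity: old.severity,
    claim: old.claim,
    evidence: parts.length > 0 ? parts.join(" | ") : "(no evidence in legacy finding)",
  };
  if (old.suggestedFix !== undefined) finding.suggestion = old.suggestedFix;
  return finding;
}
```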
 ## Full documentation

package/dist/skills/_shared/response-shape.md CHANGED

@@ -3,34 +3,46 @@
 ### POST /<tool>?cwd=<abs> — dispatch response (202)
 ```json
-{
-  "batchId": "<uuid>",
-  "state": "pending"
-}
+{ "batchId": "<uuid>", "statusUrl": "/batch/<uuid>" }
 ```
+Use `batchId` to poll. `statusUrl` is a convenience pointer.
 ### GET /batch/:id — polling response
+The HTTP status is the state discriminator:
+| Status | Meaning |
+|---|---|
+| `202 text/plain` | Still pending — body is the running headline string (e.g. `"1/2 running, 47s elapsed"`) |
+| `200 application/json` | Terminal — body is the uniform 7-field envelope below |
+| `404` / `401` / `5xx` | Error — see Error response below; stop polling |
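Because the HTTP status alone decides the state, a poller can branch on it before touching the body. This minimal sketch uses hypothetical `classifyPoll` and `pollUntilSettled` helpers. The network call and the sleep are injected so the loop does not assume a transport; auth headers and the real HTTP request are left to the caller. The 2 s default interval is an assumption, not a documented value.

```typescript
// Sketch: the HTTP status alone decides the polling state. Names are hypothetical.
type PollOutcome =
  | { kind: "pending"; headline: string }   // 202 text/plain: running headline
  | { kind: "terminal"; envelope: unknown } // 200 application/json: 7-field envelope
  | { kind: "error"; status: number };      // 404 / 401 / 5xx: stop polling

function classifyPoll(status: number, body: string): PollOutcome {
  if (status === 202) return { kind: "pending", headline: body };
  if (status === 200) return { kind: "terminal", envelope: JSON.parse(body) as unknown };
  return { kind: "error", status };
}

// Poll until the batch leaves the pending state. The caller supplies the HTTP
// call (including any auth header) and the sleep function.
async function pollUntilSettled(
  fetchOnce: () => Promise<{ status: number; body: string }>,
  sleep: (ms: number) => Promise<void>,
  intervalMs = 2000, // assumed default, not a documented value
): Promise<PollOutcome> {
  for (;;) {
    const { status, body } = await fetchOnce();
    const outcome = classifyPoll(status, body);
    if (outcome.kind !== "pending") return outcome;
    await sleep(intervalMs);
  }
}
```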
+The terminal JSON envelope always has these 7 fields. Each may be a real value or a `not_applicable` sentinel:
 ```json
 {
-  "batchId": "<uuid>",
-  "state": "pending | running | awaiting_clarification | complete | failed | expired",
-  "proposedInterpretation": "<string>",
-  "results": [ ... ],
   "headline": "<string>",
-  "batchTimings": { ... },
-  "costSummary": { ... }
+  "results": [ /* per-task result objects */ ],
+  "batchTimings": { /* timings */ },
+  "costSummary": { /* cost roll-up */ },
+  "structuredReport": { /* parsed sections */ },
+  "error": { "kind": "not_applicable", "reason": "batch succeeded" },
+  "proposedInterpretation": { "kind": "not_applicable", "reason": "batch not awaiting clarification" }
 }
 ```
-`proposedInterpretation` is only present when `state` is `awaiting_clarification`.
+Read the envelope by the shape of `error` and `proposedInterpretation`:
-`results`, `headline`, `batchTimings`, and `costSummary` are only present
-when `state` is `complete` or `failed`.
+| Shape | Meaning |
+|---|---|
+| `error` is a real object (with `code` / `message`) | Batch failed — read `error.code` + `error.message` |
+| `proposedInterpretation` is a string | Batch is awaiting clarification — invoke `mma-clarifications` |
+| Both are `{kind: "not_applicable", ...}` sentinels | Batch succeeded — read `results` |
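The shape table can be read as a small decision function. This sketch uses a hypothetical `readEnvelope` helper and checks `error` before `proposedInterpretation`. The table does not say which wins if both held real values, so that ordering is an assumption.

```typescript
// The sentinel used for envelope fields that do not apply.
interface NotApplicable {
  kind: "not_applicable";
  reason: string;
}

// The uniform 7-field terminal envelope; payload fields are left opaque here.
interface TerminalEnvelope {
  headline: unknown;
  results: unknown;
  batchTimings: unknown;
  costSummary: unknown;
  structuredReport: unknown;
  error: NotApplicable | { code: string; message: string };
  proposedInterpretation: NotApplicable | string;
}

type BatchOutcome = "failed" | "awaiting_clarification" | "succeeded";

function isNotApplicable(value: unknown): value is NotApplicable {
  return typeof value === "object" && value !== null && (value as { kind?: unknown }).kind === "not_applicable";
}

function readEnvelope(env: TerminalEnvelope): BatchOutcome {
  if (!isNotApplicable(env.error)) return "failed"; // read error.code + error.message
  if (typeof env.proposedInterpretation === "string") return "awaiting_clarification"; // invoke mma-clarifications
  return "succeeded"; // both sentinels: read results
}
```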
 ### GET /batch/:id?taskIndex=N — single task slice
-Returns the same shape but `results` contains only the task at index `N`.
+Same 7-field envelope. `results` contains exactly the task at index `N`. Returns `404 unknown_task_index` if `N` is out of range.
 ### Error response (4xx / 5xx)
@@ -38,9 +50,8 @@ Returns the same shape but `results` contains only the task at index `N`.
 {
   "error": "<code>",
   "message": "<human-readable>",
-  "details": { ... }
+  "details": { /* optional structured context, e.g. fieldErrors for 400 */ }
 }
 ```
-`details` is optional and present only when the server has structured
-additional context (e.g. `fieldErrors` for validation failures).
+`details` is optional and present only when the server has structured additional context.

package/dist/skills/mma-audit/SKILL.md CHANGED

@@ -8,7 +8,7 @@ when_to_use: >-
   User asks for a doc/spec/config audit OR a methodology skill
   (superpowers:dispatching-parallel-agents, /security-review) points at one AND
   mmagent is running. Audit on PROSE/SPEC docs — use mma-review for source code.
-version: 3.8.0
+version: 3.8.1
 ---
 # mma-audit
@@ -72,19 +72,26 @@ BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 @include _shared/response-shape.md
-## Reading the review verdicts
+## Reading the review verdicts (annotation model — 3.8.1+)
-The terminal envelope now includes:
+The terminal envelope includes:
 - `specReviewVerdict: 'not_applicable'` — read-only routes have no spec review stage.
-- `qualityReviewVerdict` — verdict from the cross-agent quality review.
-- `roundsUsed` — number of worker attempts (`1` = approved on first try; `2`+ = rework rounds; `0` = review topology disabled via env var).
+- `qualityReviewVerdict` — outcome of the single annotation pass.
+- `roundsUsed` — `1` when reviewer ran (annotated or errored), `0` when reviewer was skipped.
+There is no rework loop. The reviewer annotates each finding in place and exits — never gates, never causes the worker to re-run.
 Action per `qualityReviewVerdict`:
-- `'approved'` — findings are grounded; act on them.
-- `'changes_required'` — the worker reworked but couldn't fully satisfy the reviewer at the rework cap. Drill into individually flagged findings before acting.
-- `'concerns'` — non-blocking issues raised; proceed but read the per-finding feedback.
-- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW`) disabled review for this route. Treat output as today.
-- `'error'` — reviewer call failed (transport, rate-limit). No attestation; fall back to caution.
+- `'annotated'` — every finding in `findings[]` has `reviewerConfidence` (integer 0-100) and possibly `reviewerSeverity`. Sort or filter by confidence; treat low-confidence findings with skepticism.
+- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW=disabled` or per-route `MMAGENT_READ_ONLY_REVIEW_AUDIT=disabled`) bypassed the reviewer. Findings carry no reviewer fields; treat as raw worker output.
+- `'error'` — reviewer call or response parsing failed. Findings have no reviewer fields; fall back to caution.
+### Per-finding reviewer fields
+Every finding the worker emits has the standard fields (`id`, `severity`, `claim`, `evidence`, `suggestion?`). After a successful annotation pass, two more fields are added:
+- `reviewerConfidence` (integer 0-100): how confident the reviewer is that the finding is correct, on-brief, and grounded. Use as a filter (`>=70`) or a sort key for triage.
+- `reviewerSeverity?` (`'high' | 'medium' | 'low'`): only present when the reviewer disagrees with the worker's `severity`. Workers tend to inflate severity; use this to dial down. Trust `reviewerSeverity` over `severity` when present.
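Triage by `reviewerConfidence` can be sketched as follows. The helpers keep findings at or above a threshold (70, the example value above), sort the most confident first, and prefer `reviewerSeverity` over `severity` when present. `triage` and `effectiveSeverity` are hypothetical helpers. For `'skipped'` or `'error'` the raw list is returned untouched, since those verdicts carry no reviewer fields to filter on.

```typescript
type Severity = "high" | "medium" | "low";

interface AnnotatedFinding {
  id: string;
  severity: Severity;
  claim: string;
  evidence: string;
  suggestion?: string;
  reviewerConfidence?: number; // integer 0-100 after an 'annotated' pass
  reviewerSeverity?: Severity; // present only when the reviewer disagrees
}

// Trust the reviewer's severity over the worker's when it is present.
function effectiveSeverity(f: AnnotatedFinding): Severity {
  return f.reviewerSeverity ?? f.severity;
}

// Keep findings at or above the threshold, most confident first.
function triage(
  verdict: "annotated" | "skipped" | "error",
  findings: readonly AnnotatedFinding[],
  minConfidence = 70,
): AnnotatedFinding[] {
  // 'skipped' and 'error' carry no reviewer fields, so there is nothing to filter on.
  if (verdict !== "annotated") return findings.slice();
  return findings
    .filter((f) => (f.reviewerConfidence ?? 0) >= minConfidence)
    .sort((a, b) => (b.reviewerConfidence ?? 0) - (a.reviewerConfidence ?? 0));
}
```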
 ## Best practices

package/dist/skills/mma-clarifications/SKILL.md CHANGED

@@ -12,7 +12,7 @@ when_to_use: >-
   `proposedInterpretation` is a hard gate — the batch is paused, not
   informational. The batch will not complete until the caller responds. Treating
   it as advisory is the clarification-as-info anti-pattern (AP5).
-version: 3.8.0
+version: 3.8.1
 ---
 # mma-clarifications

package/dist/skills/mma-context-blocks/SKILL.md CHANGED

@@ -12,7 +12,7 @@ when_to_use: >-
   Register once here, then pass the ID via `contextBlockIds` on mma-delegate /
   mma-execute-plan / mma-audit / mma-review / mma-verify / mma-debug /
   mma-investigate. Cheaper and faster than inlining the same content N times.
-version: 3.8.0
+version: 3.8.1
 ---
 # mma-context-blocks
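The skills table in the README gives the rule of thumb for registering a block: the same doc is larger than roughly 2 KB and will be referenced by two or more later calls. This sketch of that test uses a hypothetical `shouldRegisterContextBlock` helper. Reading "~2 KB" as 2048 UTF-8 bytes is an assumption. Byte length is counted by hand so the example needs no Node or DOM typings.

```typescript
// Sketch of the registration rule of thumb. The threshold is an assumed reading of "~2 KB".
const REGISTER_THRESHOLD_BYTES = 2048;

// UTF-8 byte length, counted by code point.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const cp = s.codePointAt(i) ?? 0;
    if (cp < 0x80) bytes += 1;
    else if (cp < 0x800) bytes += 2;
    else if (cp < 0x10000) bytes += 3;
    else {
      bytes += 4;
      i++; // skip the low surrogate of this pair
    }
  }
  return bytes;
}

// Register once when the doc is large and reused; otherwise inline it.
function shouldRegisterContextBlock(content: string, expectedReferences: number): boolean {
  return utf8ByteLength(content) > REGISTER_THRESHOLD_BYTES && expectedReferences >= 2;
}
```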

package/dist/skills/mma-debug/SKILL.md CHANGED

@@ -10,7 +10,7 @@ when_to_use: >-
   read files, reproduce, trace — OR a methodology skill
   (superpowers:systematic-debugging) points at the investigation step. Delegate
   the read/reproduce/trace; the main agent stays on the hypothesis and the fix.
-version: 3.8.0
+version: 3.8.1
 ---
 # mma-debug
@@ -78,19 +78,26 @@ BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 @include _shared/response-shape.md
-## Reading the review verdicts
+## Reading the review verdicts (annotation model — 3.8.1+)
-The terminal envelope now includes:
+The terminal envelope includes:
 - `specReviewVerdict: 'not_applicable'` — read-only routes have no spec review stage.
-- `qualityReviewVerdict` — verdict from the cross-agent quality review.
-- `roundsUsed` — number of worker attempts (`1` = approved on first try; `2`+ = rework rounds; `0` = review topology disabled via env var).
+- `qualityReviewVerdict` — outcome of the single annotation pass.
+- `roundsUsed` — `1` when reviewer ran (annotated or errored), `0` when reviewer was skipped.
+There is no rework loop. The reviewer annotates each finding in place and exits — never gates, never causes the worker to re-run.
 Action per `qualityReviewVerdict`:
-- `'approved'` — findings are grounded; act on them.
-- `'changes_required'` — the worker reworked but couldn't fully satisfy the reviewer at the rework cap. Drill into individually flagged findings before acting.
-- `'concerns'` — non-blocking issues raised; proceed but read the per-finding feedback.
-- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW`) disabled review for this route. Treat output as today.
-- `'error'` — reviewer call failed (transport, rate-limit). No attestation; fall back to caution.
+- `'annotated'` — every finding in `findings[]` has `reviewerConfidence` (integer 0-100) and possibly `reviewerSeverity`. Sort or filter by confidence; treat low-confidence findings with skepticism.
+- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW=disabled` or per-route `MMAGENT_READ_ONLY_REVIEW_DEBUG=disabled`) bypassed the reviewer. Findings carry no reviewer fields; treat as raw worker output.
+- `'error'` — reviewer call or response parsing failed. Findings have no reviewer fields; fall back to caution.
+### Per-finding reviewer fields
+Every finding the worker emits has the standard fields (`id`, `severity`, `claim`, `evidence`, `suggestion?`). After a successful annotation pass, two more fields are added:
+- `reviewerConfidence` (integer 0-100): how confident the reviewer is that the finding is correct, on-brief, and grounded. Use as a filter (`>=70`) or a sort key for triage.
+- `reviewerSeverity?` (`'high' | 'medium' | 'low'`): only present when the reviewer disagrees with the worker's `severity`. Workers tend to inflate severity; use this to dial down. Trust `reviewerSeverity` over `severity` when present.
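The pairing between `qualityReviewVerdict` and `roundsUsed` is simple enough to check mechanically. This sketch uses hypothetical `checkReviewSummary` and `actionFor` helpers. It flags envelopes that break the documented rule and maps each verdict to the action the skill text recommends. Returning a list of problems instead of throwing is a design choice.

```typescript
type QualityVerdict = "annotated" | "skipped" | "error";

interface ReviewSummary {
  specReviewVerdict: "not_applicable"; // read-only routes have no spec review
  qualityReviewVerdict: QualityVerdict;
  roundsUsed: number;
}

// roundsUsed should be 1 when the reviewer ran (annotated or error) and 0 when skipped.
function checkReviewSummary(s: ReviewSummary): string[] {
  const expected = s.qualityReviewVerdict === "skipped" ? 0 : 1;
  return s.roundsUsed === expected
    ? []
    : [`roundsUsed=${s.roundsUsed}, but '${s.qualityReviewVerdict}' implies ${expected}`];
}

// The caller action recommended for each verdict.
function actionFor(verdict: QualityVerdict): string {
  switch (verdict) {
    case "annotated":
      return "sort or filter findings by reviewerConfidence";
    case "skipped":
      return "treat findings as raw worker output";
    case "error":
      return "no reviewer fields; fall back to caution";
  }
}
```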
 ## Best practices

package/dist/skills/mma-delegate/SKILL.md CHANGED

@@ -11,7 +11,7 @@ when_to_use: >-
   and keep main context free. If a plan file exists → use mma-execute-plan. If
   the task is audit / review / verify / debug / investigate → use the matching
   specialized skill.
-version: 3.8.0
+version: 3.8.1
 ---
 # mma-delegate
@@ -65,6 +65,7 @@ Dispatch one or more ad-hoc tasks to workers concurrently. Each task is an indep
 | `tasks[].filePaths` | string[] | no | Files the worker focuses on |
 | `tasks[].done` | string | no | Acceptance criteria |
 | `tasks[].contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
+| `tasks[].maxCostUSD` | number | no | Per-task cost cap in USD (positive finite). Default 10 when omitted. |
 | `tasks[].verifyCommand` | string[] | no | See verify-and-review snippet below |
 | `tasks[].reviewPolicy` | `"full"` / `"spec_only"` / `"diff_only"` / `"off"` | no | See verify-and-review snippet below. Default `"full"` |
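A client can reject bad `maxCostUSD` values before dispatch instead of waiting for a server error. This minimal sketch assumes tasks are plain objects. `validateTasks` and `effectiveMaxCost` are hypothetical names, and `10` mirrors the documented default.

```typescript
// Sketch: client-side check of tasks[].maxCostUSD before POST /delegate.
const DEFAULT_MAX_COST_USD = 10; // documented default when the field is omitted

function validateTasks(tasks: ReadonlyArray<Record<string, unknown>>): string[] {
  const errors: string[] = [];
  tasks.forEach((task, i) => {
    const cap = task["maxCostUSD"];
    if (cap === undefined) return; // omitted: the server applies the default
    if (typeof cap !== "number" || !Number.isFinite(cap) || cap <= 0) {
      errors.push(`tasks[${i}].maxCostUSD must be a positive finite number`);
    }
  });
  return errors;
}

// The cap a task will run under, given the documented default.
function effectiveMaxCost(cap: number | undefined): number {
  return cap ?? DEFAULT_MAX_COST_USD;
}
```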

package/dist/skills/mma-execute-plan/SKILL.md CHANGED

@@ -10,7 +10,7 @@ when_to_use: >-
   superpowers:subagent-driven-development / superpowers:executing-plans —
   workers are cheaper and don't pollute main context. Task descriptors must
   match plan headings verbatim.
-version: 3.8.0
+version: 3.8.1
 ---
 # mma-execute-plan
@@ -52,8 +52,7 @@ Dispatch named tasks from a plan file to workers. Each `tasks` string must match
     "/project/docs/plan.md",
     "/project/src/auth/login.ts"
   ],
-  "contextBlockIds": [],
-  "agentType": "standard"
+  "contextBlockIds": []
 }
 ```
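Each `tasks` string must match a plan heading verbatim, so a quick pre-flight check can catch typos before dispatch. This sketch assumes a descriptor is compared with the heading text after the leading `#` marks and surrounding whitespace are stripped. That normalization, and the helper names `planHeadings` and `unmatchedTasks`, are assumptions. The sketch also does not skip `#` lines inside fenced code blocks.

```typescript
// Collect the text of every ATX heading (1-6 '#' followed by whitespace).
function planHeadings(planMarkdown: string): Set<string> {
  const headings = new Set<string>();
  for (const line of planMarkdown.split(/\r?\n/)) {
    const text = /^#{1,6}\s+(.+?)\s*$/.exec(line)?.[1];
    if (text) headings.add(text);
  }
  return headings;
}

// Descriptors that do not equal any heading exactly.
function unmatchedTasks(planMarkdown: string, tasks: readonly string[]): string[] {
  const headings = planHeadings(planMarkdown);
  return tasks.filter((t) => !headings.has(t));
}
```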
@@ -63,12 +62,14 @@ Dispatch named tasks from a plan file to workers. Each `tasks` string must match
 | `context` | string | no | Short additional context not in the plan |
 | `filePaths` | string[] | no | Plan file + relevant source files. Required: the plan file itself. |
 | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
-| `agentType` | `"standard"` / `"complex"` | no | Default `"standard"`. Use `"complex"` for tasks too large for the standard tier — reads many files, produces many edits, or the last run came back with `filesWritten: 0`. |
+| `maxCostUSD` | number | no | Per-task cost cap in USD (positive finite). Default 10 when omitted. |
 | `verifyCommand` | string[] | no | See verify-and-review snippet below |
 | `tasks[].reviewPolicy` | `"full"` / `"spec_only"` / `"diff_only"` / `"off"` | no | See verify-and-review snippet below. Default `"full"`. |
 @include _shared/verify-and-review.md
+> **No `agentType` here.** Worker tier is set by the plan and per-route defaults. For ad-hoc work where you need direct tier control, use `mma-delegate`.
 If the batch reaches `awaiting_clarification`, use `mma-clarifications` to confirm or correct the proposed interpretation.
 ## Full example

package/dist/skills/mma-investigate/SKILL.md CHANGED

@@ -12,7 +12,7 @@ when_to_use: >-
   git-history queries. OR you are about to read 3+ files / run any grep in main
   context — that's the inline-labor-leakage anti-pattern (AP2); delegate to this
   skill instead.
-version: 3.8.0
+version: 3.8.1
 ---
 # mma-investigate
@@ -76,9 +76,10 @@ digraph when_to_use {
 | `question` | string | yes | Natural-language investigation question |
 | `filePaths` | string[] | no | Anchor paths the worker starts from. Worker may grep beyond. |
 | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` — enables follow-up / delta investigation |
-| `agentType` | `'standard' \| 'complex'` | no | Caller override of the route default (`'complex'`) |
 | `tools` | `'none' \| 'readonly'` | no | Default `'readonly'`. `'no-shell'` and `'full'` are rejected — investigation is read-only |
+> Worker tier for `mma-investigate` is hardcoded to `complex` and is not caller-configurable. Sending `agentType` is rejected with HTTP 400.
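The documented request constraints for `mma-investigate` can be mirrored on the client, so a body the server would reject is caught early. This sketch uses a hypothetical `checkInvestigateBody` helper. It checks only the rules stated in this skill: `question` is required, `tools` is limited to `'none'` or `'readonly'`, `filePaths` is a string array, and `agentType` is absent.

```typescript
// Sketch: mirror the documented POST /investigate constraints on the client.
function checkInvestigateBody(body: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const question = body["question"];
  if (typeof question !== "string" || question.trim() === "") {
    errors.push("question is required");
  }
  // Worker tier is fixed to complex; the server rejects agentType with HTTP 400.
  if ("agentType" in body) {
    errors.push("agentType is not accepted for investigate");
  }
  // Only 'none' and 'readonly' are allowed; 'no-shell' and 'full' are rejected.
  const tools = body["tools"];
  if (tools !== undefined && tools !== "none" && tools !== "readonly") {
    errors.push(`tools must be 'none' or 'readonly', got ${JSON.stringify(tools)}`);
  }
  const paths = body["filePaths"];
  if (paths !== undefined && !(Array.isArray(paths) && paths.every((p) => typeof p === "string"))) {
    errors.push("filePaths must be an array of strings");
  }
  return errors;
}
```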
 **Anchor narrow questions with `filePaths`:**
 ❌ `{ "question": "Where is parseConfig called?" }` — searches the whole repo
@@ -123,19 +124,26 @@ Each task carries an `investigation` field on its per-task report:
 `workerStatus` is one of `done`, `done_with_concerns`, `needs_context`, `blocked`. When `done_with_concerns`, the per-task report carries `incompleteReason` (`turn_cap`, `cost_cap`, `timeout`, or `missing_sections`). When `needs_context`, the worker flagged a `[needs_context]` bullet under `## Unresolved` — re-dispatch with extra context (anchor paths, a context block, or a clarification turn).
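A caller's branching on `workerStatus` might look like the following sketch. The actions for `done_with_concerns` and `needs_context` paraphrase the text. The action for `blocked` is an assumption, since the text gives none. The field nesting inside the per-task report is simplified, and `nextStep` is a hypothetical name.

```typescript
type WorkerStatus = "done" | "done_with_concerns" | "needs_context" | "blocked";
type IncompleteReason = "turn_cap" | "cost_cap" | "timeout" | "missing_sections";

// Simplified slice of a per-task report; the real nesting may differ.
interface WorkerOutcome {
  workerStatus: WorkerStatus;
  incompleteReason?: IncompleteReason; // set when workerStatus is done_with_concerns
}

function nextStep(r: WorkerOutcome): string {
  switch (r.workerStatus) {
    case "done":
      return "use the answer";
    case "done_with_concerns":
      return `partial answer (${r.incompleteReason ?? "reason not given"}); check the gaps before relying on it`;
    case "needs_context":
      return "re-dispatch with anchor paths, a context block, or a clarification turn";
    case "blocked":
      return "stop and report the blocker to the user"; // assumed; the text gives no action
  }
}
```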
-## Reading the review verdicts
+## Reading the review verdicts (annotation model — 3.8.1+)
-The terminal envelope now includes:
+The terminal envelope includes:
 - `specReviewVerdict: 'not_applicable'` — read-only routes have no spec review stage.
-- `qualityReviewVerdict` — verdict from the cross-agent quality review.
-- `roundsUsed` — number of worker attempts (`1` = approved on first try; `2`+ = rework rounds; `0` = review topology disabled via env var).
+- `qualityReviewVerdict` — outcome of the single annotation pass.
+- `roundsUsed` — `1` when reviewer ran (annotated or errored), `0` when reviewer was skipped.
+There is no rework loop. The reviewer annotates each finding in place and exits — never gates, never causes the worker to re-run.
 Action per `qualityReviewVerdict`:
-- `'approved'` — findings are grounded; act on them.
-- `'changes_required'` — the worker reworked but couldn't fully satisfy the reviewer at the rework cap. Drill into individually flagged findings before acting.
-- `'concerns'` — non-blocking issues raised; proceed but read the per-finding feedback.
-- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW`) disabled review for this route. Treat output as today.
-- `'error'` — reviewer call failed (transport, rate-limit). No attestation; fall back to caution.
+- `'annotated'` — every finding in `findings[]` has `reviewerConfidence` (integer 0-100) and possibly `reviewerSeverity`. Sort or filter by confidence; treat low-confidence findings with skepticism.
+- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW=disabled` or per-route `MMAGENT_READ_ONLY_REVIEW_INVESTIGATE=disabled`) bypassed the reviewer. Findings carry no reviewer fields; treat as raw worker output.
+- `'error'` — reviewer call or response parsing failed. Findings have no reviewer fields; fall back to caution.
+### Per-finding reviewer fields
+Every finding the worker emits has the standard fields (`id`, `severity`, `claim`, `evidence`, `suggestion?`). After a successful annotation pass, two more fields are added:
+- `reviewerConfidence` (integer 0-100): how confident the reviewer is that the finding is correct, on-brief, and grounded. Use as a filter (`>=70`) or a sort key for triage.
+- `reviewerSeverity?` (`'high' | 'medium' | 'low'`): only present when the reviewer disagrees with the worker's `severity`. Workers tend to inflate severity; use this to dial down. Trust `reviewerSeverity` over `severity` when present.
 ## Best practices

package/dist/skills/mma-retry/SKILL.md CHANGED

@@ -10,7 +10,7 @@ when_to_use: >-
   you want to re-try the failed indices only. Prefer this over re-dispatching
   the whole batch or inline-retrying — it's idempotent and preserves the
   original batch's diagnostics.
-version: 3.8.0
+version: 3.8.1
 ---
 # mma-retry
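Choosing which indices to hand to `mma-retry` is a filter over the terminal `results` array. This diff does not show the per-task failure signal, so the sketch takes it as a caller-supplied predicate. `failedIndices` is a hypothetical helper.

```typescript
// Sketch: indices of failed tasks, so a retry can target only those.
// The failure test is supplied by the caller because the per-task field is not shown here.
function failedIndices<R>(results: readonly R[], isFailed: (result: R) => boolean): number[] {
  const indices: number[] = [];
  results.forEach((result, index) => {
    if (isFailed(result)) indices.push(index);
  });
  return indices;
}
```

For example, with a made-up `ok` flag, `failedIndices([{ ok: true }, { ok: false }, { ok: false }], (r) => !r.ok)` yields `[1, 2]`.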

package/dist/skills/mma-review/SKILL.md CHANGED

@@ -10,7 +10,7 @@ when_to_use: >-
   AND mmagent is running. Delegate so each file reviews on its own worker; the
   main agent only decides what to merge. Review on SOURCE CODE — use mma-audit
   for prose specs / configs.
-version: 3.8.0
+version: 3.8.1
 ---
 # mma-review
@@ -75,19 +75,26 @@ BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 @include _shared/response-shape.md
-## Reading the review verdicts
+## Reading the review verdicts (annotation model — 3.8.1+)
-The terminal envelope now includes:
+The terminal envelope includes:
 - `specReviewVerdict: 'not_applicable'` — read-only routes have no spec review stage.
-- `qualityReviewVerdict` — verdict from the cross-agent quality review.
-- `roundsUsed` — number of worker attempts (`1` = approved on first try; `2`+ = rework rounds; `0` = review topology disabled via env var).
+- `qualityReviewVerdict` — outcome of the single annotation pass.
+- `roundsUsed` — `1` when reviewer ran (annotated or errored), `0` when reviewer was skipped.
+There is no rework loop. The reviewer annotates each finding in place and exits — never gates, never causes the worker to re-run.
 Action per `qualityReviewVerdict`:
-- `'approved'` — findings are grounded; act on them.
-- `'changes_required'` — the worker reworked but couldn't fully satisfy the reviewer at the rework cap. Drill into individually flagged findings before acting.
-- `'concerns'` — non-blocking issues raised; proceed but read the per-finding feedback.
-- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW`) disabled review for this route. Treat output as today.
-- `'error'` — reviewer call failed (transport, rate-limit). No attestation; fall back to caution.
+- `'annotated'` — every finding in `findings[]` has `reviewerConfidence` (integer 0-100) and possibly `reviewerSeverity`. Sort or filter by confidence; treat low-confidence findings with skepticism.
+- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW=disabled` or per-route `MMAGENT_READ_ONLY_REVIEW_REVIEW=disabled`) bypassed the reviewer. Findings carry no reviewer fields; treat as raw worker output.
+- `'error'` — reviewer call or response parsing failed. Findings have no reviewer fields; fall back to caution.
+### Per-finding reviewer fields
+Every finding the worker emits has the standard fields (`id`, `severity`, `claim`, `evidence`, `suggestion?`). After a successful annotation pass, two more fields are added:
+- `reviewerConfidence` (integer 0-100): how confident the reviewer is that the finding is correct, on-brief, and grounded. Use as a filter (`>=70`) or a sort key for triage.
+- `reviewerSeverity?` (`'high' | 'medium' | 'low'`): only present when the reviewer disagrees with the worker's `severity`. Workers tend to inflate severity; use this to dial down. Trust `reviewerSeverity` over `severity` when present.
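You can predict from the documented switches whether the reviewer will run for a given route. There is the global `MMAGENT_READ_ONLY_REVIEW=disabled`, or a per-route `MMAGENT_READ_ONLY_REVIEW_<ROUTE>=disabled`; across these skills the suffixes are `AUDIT`, `DEBUG`, `INVESTIGATE`, `REVIEW`, and `VERIFY`. This sketch uses a hypothetical `reviewerDisabled` helper and assumes an exact, case-sensitive match on `disabled`. The server's own parsing may differ.

```typescript
type ReadOnlyRoute = "audit" | "review" | "verify" | "investigate" | "debug";

// True when either the global or the per-route switch is set to "disabled".
function reviewerDisabled(route: ReadOnlyRoute, env: Record<string, string | undefined>): boolean {
  const perRoute = `MMAGENT_READ_ONLY_REVIEW_${route.toUpperCase()}`;
  return env["MMAGENT_READ_ONLY_REVIEW"] === "disabled" || env[perRoute] === "disabled";
}
```

In Node you would pass `process.env` as `env`. A `true` result corresponds to the `'skipped'` verdict with `roundsUsed: 0`.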
 ## Best practices

package/dist/skills/mma-verify/SKILL.md CHANGED

@@ -10,7 +10,7 @@ when_to_use: >-
   against implemented work BEFORE claiming success. Delegate so each checklist
   item gets independent evidence-gathering on a worker. Use this BEFORE saying
   "done" — never after.
-version: 3.8.0
+version: 3.8.1
 ---
 # mma-verify
@@ -76,19 +76,26 @@ BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 @include _shared/response-shape.md
-## Reading the review verdicts
+## Reading the review verdicts (annotation model — 3.8.1+)
-The terminal envelope now includes:
+The terminal envelope includes:
 - `specReviewVerdict: 'not_applicable'` — read-only routes have no spec review stage.
-- `qualityReviewVerdict` — verdict from the cross-agent quality review.
-- `roundsUsed` — number of worker attempts (`1` = approved on first try; `2`+ = rework rounds; `0` = review topology disabled via env var).
+- `qualityReviewVerdict` — outcome of the single annotation pass.
+- `roundsUsed` — `1` when reviewer ran (annotated or errored), `0` when reviewer was skipped.
+There is no rework loop. The reviewer annotates each finding in place and exits — never gates, never causes the worker to re-run.
 Action per `qualityReviewVerdict`:
-- `'approved'` — findings are grounded; act on them.
-- `'changes_required'` — the worker reworked but couldn't fully satisfy the reviewer at the rework cap. Drill into individually flagged findings before acting.
-- `'concerns'` — non-blocking issues raised; proceed but read the per-finding feedback.
-- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW`) disabled review for this route. Treat output as today.
-- `'error'` — reviewer call failed (transport, rate-limit). No attestation; fall back to caution.
+- `'annotated'` — every finding in `findings[]` has `reviewerConfidence` (integer 0-100) and possibly `reviewerSeverity`. Sort or filter by confidence; treat low-confidence findings with skepticism.
+- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW=disabled` or per-route `MMAGENT_READ_ONLY_REVIEW_VERIFY=disabled`) bypassed the reviewer. Findings carry no reviewer fields; treat as raw worker output.
+- `'error'` — reviewer call or response parsing failed. Findings have no reviewer fields; fall back to caution.
+### Per-finding reviewer fields
+Every finding the worker emits has the standard fields (`id`, `severity`, `claim`, `evidence`, `suggestion?`). After a successful annotation pass, two more fields are added:
+- `reviewerConfidence` (integer 0-100): how confident the reviewer is that the finding is correct, on-brief, and grounded. Use as a filter (`>=70`) or a sort key for triage.
+- `reviewerSeverity?` (`'high' | 'medium' | 'low'`): only present when the reviewer disagrees with the worker's `severity`. Workers tend to inflate severity; use this to dial down. Trust `reviewerSeverity` over `severity` when present.
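The per-finding rules above can also serve as a consistency check on a terminal envelope, for example in tests or before trusting confidence scores. This sketch uses a hypothetical `findingViolations` helper and covers only the documented invariants:

- After `'annotated'`, every finding has an integer `reviewerConfidence` in 0-100, and `reviewerSeverity` appears only when it differs from `severity`.
- After `'skipped'` or `'error'`, no reviewer fields are present.

It also treats "required `evidence`" as meaning non-empty, which is an assumption.

```typescript
type Severity = "high" | "medium" | "low";
type QualityVerdict = "annotated" | "skipped" | "error";

interface Finding {
  id: string;
  severity: Severity;
  claim: string;
  evidence: string;
  suggestion?: string;
  reviewerConfidence?: number;
  reviewerSeverity?: Severity;
}

// One message per broken invariant; an empty array means the findings are consistent.
function findingViolations(verdict: QualityVerdict, findings: readonly Finding[]): string[] {
  const out: string[] = [];
  for (const f of findings) {
    if (f.evidence.trim() === "") out.push(`${f.id}: evidence is empty`);
    const conf = f.reviewerConfidence;
    if (verdict !== "annotated") {
      // skipped or error: the reviewer added nothing.
      if (conf !== undefined || f.reviewerSeverity !== undefined) {
        out.push(`${f.id}: reviewer fields present after '${verdict}'`);
      }
      continue;
    }
    if (conf === undefined || !Number.isInteger(conf) || conf < 0 || conf > 100) {
      out.push(`${f.id}: reviewerConfidence must be an integer 0-100`);
    }
    if (f.reviewerSeverity !== undefined && f.reviewerSeverity === f.severity) {
      out.push(`${f.id}: reviewerSeverity should appear only when it differs from severity`);
    }
  }
  return out;
}
```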
 ## Best practices

package/dist/skills/multi-model-agent/SKILL.md CHANGED

@@ -11,7 +11,7 @@ when_to_use: >-
   tasks — AND mmagent is running. Read this once, pick the matching mma-* skill,
   and delegate there. Applies equally whether the user invoked a superpowers
   methodology skill or asked directly.
-version: 3.8.0
+version: 3.8.1
 ---
 # multi-model-agent (router)
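The routing rule stated in `mma-delegate`'s `when_to_use` can be sketched as a small function: a plan file on disk → `mma-execute-plan`; audit, review, verify, debug, or investigate → the matching skill; otherwise `mma-delegate`. Checking the specialized kinds before the plan file is an assumption about precedence. `pickSkill` and its `kind` labels are hypothetical. In practice the client does the routing from the skill descriptions.

```typescript
type SpecializedKind = "audit" | "review" | "verify" | "debug" | "investigate";

const SPECIALIZED_SKILL: Record<SpecializedKind, string> = {
  audit: "mma-audit",
  review: "mma-review",
  verify: "mma-verify",
  debug: "mma-debug",
  investigate: "mma-investigate",
};

function pickSkill(kind: SpecializedKind | "other", planFileExists: boolean): string {
  // Specialized work goes to its own skill (assumed to take precedence).
  if (kind !== "other") return SPECIALIZED_SKILL[kind];
  // Ad-hoc work: a plan file on disk routes to execute-plan, otherwise delegate.
  return planFileExists ? "mma-execute-plan" : "mma-delegate";
}
```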

package/package.json CHANGED

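In this file the core dependency moves from `^3.8.0` to `^3.8.1`, which raises the floor. Under npm's caret rule, `^3.8.1` accepts any version `>=3.8.1 <4.0.0`, so core 3.8.0 can no longer be resolved. This minimal sketch implements that rule for plain `X.Y.Z` versions with a non-zero major. Pre-release tags and `0.x` ranges, which npm treats differently, are out of scope. For real checks, the `semver` package's `satisfies` function is the usual tool.

```typescript
// Parse a plain X.Y.Z version into numeric parts.
function parseVersion(v: string): [number, number, number] {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`unsupported version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Caret semantics for a major >= 1: same major, and not older than the range floor.
function satisfiesCaret(version: string, range: string): boolean {
  if (!range.startsWith("^")) throw new Error("expected a caret range");
  const [maj, min, pat] = parseVersion(range.slice(1));
  if (maj === 0) throw new Error("0.x caret ranges are not handled in this sketch");
  const [vMaj, vMin, vPat] = parseVersion(version);
  if (vMaj !== maj) return false;      // must stay on the same major
  if (vMin !== min) return vMin > min; // any later minor is fine
  return vPat >= pat;                  // same minor: patch must not go backwards
}
```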
@@ -1,6 +1,6 @@
 {
   "name": "@zhixuan92/multi-model-agent",
-  "version": "3.8.0",
+  "version": "3.8.1",
   "type": "module",
   "license": "MIT",
   "description": "Standalone HTTP server for multi-model-agent. Routes tool-invocation work to Claude, Codex, or OpenAI-compatible sub-agents with async-polling REST dispatch and installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.",
@@ -52,7 +52,7 @@
   },
   "dependencies": {
     "@asteasolutions/zod-to-openapi": "^8.5.0",
-    "@zhixuan92/multi-model-agent-core": "^3.8.0",
+    "@zhixuan92/multi-model-agent-core": "^3.8.1",
     "gray-matter": "^4.0.3",
     "minimist": "^1.2.8",
     "proper-lockfile": "^4.1.2",