npm - @zhixuan92/multi-model-agent - Versions diffs - 3.3.0 → 3.4.0 - Mend

@zhixuan92/multi-model-agent 3.3.0 → 3.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

package/README.md +62 -33
package/dist/http/canonicalize-file-paths.d.ts +8 -0
package/dist/http/canonicalize-file-paths.d.ts.map +1 -0
package/dist/http/canonicalize-file-paths.js +43 -0
package/dist/http/canonicalize-file-paths.js.map +1 -0
package/dist/http/execution-context.d.ts.map +1 -1
package/dist/http/execution-context.js +0 -14
package/dist/http/execution-context.js.map +1 -1
package/dist/http/handlers/tools/investigate.d.ts +4 -0
package/dist/http/handlers/tools/investigate.d.ts.map +1 -0
package/dist/http/handlers/tools/investigate.js +81 -0
package/dist/http/handlers/tools/investigate.js.map +1 -0
package/dist/http/server.d.ts.map +1 -1
package/dist/http/server.js +5 -2
package/dist/http/server.js.map +1 -1
package/dist/install/discover.d.ts +1 -1
package/dist/install/discover.d.ts.map +1 -1
package/dist/install/discover.js +1 -0
package/dist/install/discover.js.map +1 -1
package/dist/openapi.d.ts.map +1 -1
package/dist/openapi.js +6 -0
package/dist/openapi.js.map +1 -1
package/dist/skills/_shared/verify-and-review.md +12 -0
package/dist/skills/mma-audit/SKILL.md +45 -18
package/dist/skills/mma-clarifications/SKILL.md +73 -29
package/dist/skills/mma-context-blocks/SKILL.md +56 -24
package/dist/skills/mma-debug/SKILL.md +54 -22
package/dist/skills/mma-delegate/SKILL.md +58 -26
package/dist/skills/mma-execute-plan/SKILL.md +55 -29
package/dist/skills/mma-investigate/SKILL.md +137 -0
package/dist/skills/mma-retry/SKILL.md +65 -22
package/dist/skills/mma-review/SKILL.md +49 -20
package/dist/skills/mma-verify/SKILL.md +49 -18
package/dist/skills/multi-model-agent/SKILL.md +84 -46
package/package.json +2 -2

package/dist/skills/mma-execute-plan/SKILL.md CHANGED

@@ -1,32 +1,45 @@
 ---
 name: mma-execute-plan
 description: >-
-  Execute tasks from a plan or spec file on disk via the local mmagent HTTP
-  service. Delegates to cheap sub-agents that don't consume your main-model
-  context window. Task descriptors match plan headings; tasks run in parallel.
+  Use when a plan or spec file exists on disk (any markdown with numbered task
+  headings — docs/superpowers/plans/*.md, a TODO list, a spec doc) and you need
+  to implement one or more tasks from it on cheap workers in parallel
 when_to_use: >-
-  A plan file exists on disk (any markdown with numbered task headings —
-  docs/superpowers/plans/*.md, a TODO list, a spec doc) AND you need to
-  implement one or more tasks from it. Prefer this over inline Agent dispatches
-  or superpowers:subagent-driven-development / superpowers:executing-plans when
-  mmagent is running — delegated workers are cheaper and don't pollute main
-  context. Task descriptors must match the plan headings verbatim.
-version: 3.3.0
+  A plan file exists on disk AND you need to implement one or more tasks from it
+  AND mmagent is running. Prefer this over inline Agent dispatches or
+  superpowers:subagent-driven-development / superpowers:executing-plans —
+  workers are cheaper and don't pollute main context. Task descriptors must
+  match plan headings verbatim.
+version: 3.4.0
 ---
-## mma-execute-plan
+# mma-execute-plan
-Dispatch named tasks from a plan file to sub-agents. Task descriptors must
-match plan headings (e.g. `"1. Setup database schema"`). All tasks run in
-parallel and duplicate descriptors are rejected.
+## Overview
-### Endpoint
+Dispatch named tasks from a plan file to workers. Each `tasks` string must match a heading in the plan verbatim (e.g. `"1. Setup database schema"`). All tasks run in parallel; duplicate descriptors are rejected.
+**Core principle:** The plan IS the prompt. Workers re-read the plan file in-process and find their named task — you don't need to inline the task body.
+## When to Use
+**Use when:**
+- A plan/spec markdown exists with numbered task headings
+- You want to dispatch a subset (or all) of those tasks
+- Tasks are mostly independent (parallel-safe)
+**Don't use when:**
+- No plan file → `mma-delegate` (pass the prompt directly)
+- Tasks form a hard linear sequence (later tasks depend on earlier ones' outputs) → dispatch in order, one batch each
+- The "plan" is in conversation only, not on disk → write it to disk first, or use `mma-delegate`
+## Endpoint
 `POST /execute-plan?cwd=<abs-path>`
 @include _shared/auth.md
-### Request body
+## Request body
 ```json
 {
@@ -46,22 +59,19 @@ parallel and duplicate descriptors are rejected.
 | Field | Type | Required | Notes |
 |---|---|---|---|
-| `tasks` | string[] | yes | At least one; must be unique; match plan headings |
+| `tasks` | string[] \| `{task, reviewPolicy}[]` | yes | At least one; must be unique; each string matches a plan heading |
 | `context` | string | no | Short additional context not in the plan |
-| `filePaths` | string[] | no | Plan file + relevant source files |
+| `filePaths` | string[] | no | Plan file + relevant source files. Always include the plan file itself. |
 | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
-| `agentType` | `"standard"` / `"complex"` | no | Worker tier. Default `"standard"` (cheap). Switch to `"complex"` for tasks too large for a standard-tier model to finish in the turn budget (reads many files, produces many edits, or the last run came back with `filesWritten: 0`). |
-| `verifyCommand` | string[] | no | Commands to run after each plan task completion to verify the work |
-| `tasks[].reviewPolicy` | `"full"` / `"spec_only"` / `"diff_only"` / `"off"` | no | Per-task review lifecycle policy when a task is passed as `{ "task": "...", "reviewPolicy": "..." }`. Default `"full"` |
-Set `verifyCommand` when the worker can run a deterministic local check after editing, such as `npm test`, `npm run lint`, or a focused package test. Commands run in order after task completion; each string must be non-empty after trimming. Omit it when no reliable command exists.
+| `agentType` | `"standard"` / `"complex"` | no | Default `"standard"`. Use `"complex"` for tasks too large for the standard tier — reads many files, produces many edits, or the last run came back with `filesWritten: 0`. |
+| `verifyCommand` | string[] | no | See verify-and-review snippet below |
+| `tasks[].reviewPolicy` | `"full"` / `"spec_only"` / `"diff_only"` / `"off"` | no | See verify-and-review snippet below. Default `"full"`. |
-Set `reviewPolicy: 'diff_only'` when you want a cheaper single-pass review of the produced diff without spec-review rework loops. Use `reviewPolicy: 'full'` for default spec + quality review, `reviewPolicy: 'spec_only'` when quality review is not needed, and `reviewPolicy: 'off'` only for trusted low-risk tasks where verification is enough.
+@include _shared/verify-and-review.md
-If the batch reaches `awaiting_clarification`, use `mma-clarifications`
-to confirm or correct the proposed interpretation.
+If the batch reaches `awaiting_clarification`, use `mma-clarifications` to confirm or correct the proposed interpretation.
-### Full example
+## Full example
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
@@ -72,10 +82,26 @@ BATCH=$(curl -f --show-error -s -X POST \
 BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 ```
-Then poll until complete:
 @include _shared/polling.md
 @include _shared/response-shape.md
+## Common pitfalls
+❌ **Task descriptor doesn't match plan heading verbatim**
+> tasks: ["Migrate db schema"]    ← plan heading is "3. Migrate database schema"
+Worker rejects with "no matching task" or matches the wrong one. **Fix:** copy the heading from the plan, including the leading number.
+❌ **Forgetting the plan file in `filePaths`**
+> filePaths: ["/project/src/db/schema.sql"]    ← no plan file
+Worker can't read the task body. **Fix:** always include the plan path: `filePaths: ["/project/docs/plan.md", "/project/src/db/schema.sql"]`.
+❌ **Dispatching dependent tasks in one batch**
+Task 5 depends on Task 4's output → workers race; Task 5 might run before Task 4 finishes. **Fix:** dispatch Task 4, wait for terminal, then dispatch Task 5.
+❌ **Skipping `verifyCommand` when one exists**
+A passing local check is the cheapest signal you're going to get. **Fix:** wire `["npm test"]` or the focused package test.
 @include _shared/error-handling.md
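
The pitfalls above hinge on copying plan headings verbatim. The following is a minimal POSIX shell sketch, not shipped with the package, that lists candidate descriptors from a plan and checks one before dispatch. It assumes task headings look like `## 3. Migrate database schema` (one to six `#`, then a number and a dot); the helper names `extract_task_headings` and `task_in_plan` are hypothetical, and the worker's own matching rule may be stricter or looser.

```bash
# Hypothetical helpers; assume task headings of the form "## 3. Title".

# Print numbered task headings from a markdown plan with the leading '#'
# markers removed, so each line can be used verbatim as a `tasks` entry.
extract_task_headings() {
  grep -E '^#{1,6}[[:space:]]+[0-9]+\.[[:space:]]' "$1" |
    sed -E 's/^#{1,6}[[:space:]]+//'
}

# Succeed only if "$2" exactly matches one of the headings in plan "$1".
task_in_plan() {
  extract_task_headings "$1" | grep -qxF -- "$2"
}
```

For example, `task_in_plan /project/docs/plan.md "3. Migrate database schema"` succeeds, while the shortened `"Migrate db schema"` from the first pitfall fails.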

package/dist/skills/mma-investigate/SKILL.md ADDED

@@ -0,0 +1,137 @@
+---
+name: mma-investigate
+description: >-
+  Use when you need to answer a question about the codebase ("how does X work",
+  "where is Y called", "what does this directory do") and reading + grepping the
+  codebase yourself would consume main-context tokens
+when_to_use: >-
+  A question about THIS codebase has surfaced — from the user, from a
+  methodology skill, or from your own next-step planning — AND mmagent is
+  running. Delegate the read/grep/synthesis to a worker so the main context
+  stays on judgment. Codebase only — does not perform web research or
+  git-history queries.
+version: 3.4.0
+---
+# mma-investigate
+## Overview
+Answer a codebase question via a read-only mmagent worker. The worker greps and reads on its cheap budget; you read its synthesis on yours.
+**Core principle:** Investigation is labor (read, grep, synthesize). Delegate it. The main agent stays on judgment — deciding what the answer means and what to do with it.
+## When to Use
+```dot
+digraph when_to_use {
+    "Question about codebase?" [shape=diamond];
+    "About web / git history?" [shape=diamond];
+    "Already have the file in context?" [shape=diamond];
+    "mma-investigate" [shape=box];
+    "Read inline (1–2 reads)" [shape=box];
+    "WebSearch / git log" [shape=box];
+    "Question about codebase?" -> "About web / git history?";
+    "About web / git history?" -> "WebSearch / git log" [label="yes"];
+    "About web / git history?" -> "Already have the file in context?" [label="no"];
+    "Already have the file in context?" -> "Read inline (1–2 reads)" [label="yes"];
+    "Already have the file in context?" -> "mma-investigate" [label="no"];
+}
+```
+**Use when:**
+- "How does X work in this codebase?"
+- "Where is Y called from?"
+- "What does this directory do?"
+- The answer requires reading 3+ files or grepping
+- Cross-cutting investigations (auth flow across modules, data lineage)
+**Don't use when:**
+- The answer is in 1–2 files you already have in context → just `Read`
+- It's about web docs / external APIs → `WebSearch` / `WebFetch`
+- It's about git history → `git log` / `git blame`
+- You need to MODIFY code based on the finding → `mma-delegate` (research + edit)
+## Endpoint
+`POST /investigate?cwd=<abs-path>`
+@include _shared/auth.md
+## Request body
+```json
+{
+  "question": "How does the auth middleware handle token refresh?",
+  "filePaths": ["/project/src/auth/"],
+  "contextBlockIds": []
+}
+```
+| Field | Type | Required | Notes |
+|---|---|---|---|
+| `question` | string | yes | Natural-language investigation question |
+| `filePaths` | string[] | no | Anchor paths the worker starts from. Worker may grep beyond. |
+| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` — enables follow-up / delta investigation |
+| `agentType` | `'standard' \| 'complex'` | no | Caller override of the route default (`'complex'`) |
+| `tools` | `'none' \| 'readonly'` | no | Default `'readonly'`. `'no-shell'` and `'full'` are rejected — investigation is read-only |
+**Anchor narrow questions with `filePaths`:**
+❌ `{ "question": "Where is parseConfig called?" }` — searches the whole repo
+✅ `{ "question": "Where is parseConfig called?", "filePaths": ["src/"] }` — bounded
+**Why:** the worker greps and reads under its cost ceiling. Without anchors, broad questions exhaust the budget before they finish.
+## Full example
+```bash
+BATCH=$(curl -f --show-error -s -X POST \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"question":"How does the auth middleware handle token refresh?"}' \
+  "http://localhost:$PORT/investigate?cwd=/project")
+BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
+```
+@include _shared/polling.md
+@include _shared/response-shape.md
+## Per-task report shape
+Each task carries an `investigation` field on its per-task report:
+```json
+{
+  "investigation": {
+    "citations": [
+      { "file": "src/auth/refresh.ts", "lines": "45-72", "claim": "Refresh handler reads bearer." }
+    ],
+    "confidence": { "level": "high", "rationale": "All claims directly cited." },
+    "diagnostics": {
+      "malformedCitationLines": 0,
+      "missingRequiredSections": [],
+      "invalidRequiredSections": []
+    }
+  }
+}
+```
+`workerStatus` is one of `done`, `done_with_concerns`, `needs_context`, `blocked`. When `done_with_concerns`, the per-task report carries `incompleteReason` (`turn_cap`, `cost_cap`, `timeout`, or `missing_sections`). When `needs_context`, the worker flagged a `[needs_context]` bullet under `## Unresolved` — re-dispatch with extra context (anchor paths, a context block, or a clarification turn).
+## Common pitfalls
+❌ **Asking for a fix instead of an answer**
+> question: "Refactor the auth middleware to use JWT"
+The investigator can't write — `tools: 'readonly'`. **Fix:** use `mma-delegate` for research-then-edit, or split: investigate first, then dispatch the edit.
+❌ **Treating `done_with_concerns` as failure**
+The worker still produced citations and a confidence level. Read them — partial coverage with `incompleteReason: 'turn_cap'` often answers the question well enough. Re-dispatch with a tighter scope only if the citations are unusable.
+❌ **Inline-reading instead of delegating**
+About to `Read` 3+ files just to answer one question? That's the wrong tradeoff — the worker reads on its cheap budget; you read its synthesis on yours.
+@include _shared/error-handling.md
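
The status guidance above can be sketched as a small POSIX shell dispatcher on a task's `workerStatus`, plus a local guard for the `tools` field. Both function names are hypothetical. The `blocked` branch is an assumption, because the skill does not say what to do in that case.

```bash
# Hypothetical helper: suggest a next step for an /investigate task based on
# its workerStatus, following the guidance in the skill text.
investigate_next_step() {
  case "$1" in
    done)               echo "read-synthesis" ;;
    done_with_concerns) echo "read-citations-before-redispatch" ;;  # partial coverage is often enough
    needs_context)      echo "redispatch-with-context" ;;           # anchors, a context block, or a clarification
    blocked)            echo "inspect-report" ;;                    # assumption: not covered by the skill
    *)                  echo "unknown-status: $1" >&2; return 1 ;;
  esac
}

# Hypothetical guard: /investigate accepts only 'none' or 'readonly' for tools.
investigate_tools_ok() {
  case "$1" in
    none|readonly) return 0 ;;
    *)             return 1 ;;  # 'no-shell' and 'full' are rejected
  esac
}
```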

package/dist/skills/mma-retry/SKILL.md CHANGED

@@ -1,30 +1,62 @@
 ---
 name: mma-retry
 description: >-
-  Re-run specific failed or incomplete tasks from a previous mmagent batch by
-  index. Preserves the original task specs and only re-executes the named
-  indices.
+  Use when a previous mma-* batch returned partial results (some tasks failed or
+  came back incomplete) and you want to re-run JUST the failed indices without
+  re-dispatching the whole batch
 when_to_use: >-
-  A previous mma-delegate / mma-execute-plan batch returned partial results and
-  you want to re-try the failed indices only. Prefer this over redispatching the
-  whole batch or inline-retrying — it's idempotent and keeps the original
-  batch's diagnostics intact.
-version: 3.3.0
+  A previous mma-delegate / mma-execute-plan / mma-audit / mma-review /
+  mma-verify / mma-debug / mma-investigate batch returned partial results AND
+  you want to re-try the failed indices only. Prefer this over re-dispatching
+  the whole batch or inline-retrying — it's idempotent and preserves the
+  original batch's diagnostics.
+version: 3.4.0
 ---
-## mma-retry
+# mma-retry
-Re-run selected tasks from a completed or failed batch. Specify the original
-`batchId` and the zero-based indices of the tasks to re-run. The retry runs
-those tasks fresh with the same configuration as the original batch.
+## Overview
-### Endpoint
+Re-run selected tasks from a completed or failed batch. Specify the original `batchId` and the zero-based indices of the tasks to re-run. The retry runs those tasks fresh with the same configuration as the original batch and produces a new `batchId`.
+**Core principle:** A batch is the unit of dispatch, but a TASK is the unit of failure. Retry at the task level so successful tasks aren't re-charged.
+## When to Use
+```dot
+digraph when_to_use {
+    "Batch returned terminal?" [shape=diamond];
+    "Some tasks failed/incomplete?" [shape=diamond];
+    "All tasks failed?" [shape=diamond];
+    "mma-retry (selected indices)" [shape=box];
+    "Re-dispatch the whole batch" [shape=box];
+    "Investigate first (mma-debug)" [shape=box];
+    "Batch returned terminal?" -> "Some tasks failed/incomplete?";
+    "Some tasks failed/incomplete?" -> "All tasks failed?" [label="yes"];
+    "Some tasks failed/incomplete?" -> "Done — read results" [label="no"];
+    "All tasks failed?" -> "Investigate first (mma-debug)" [label="yes"];
+    "All tasks failed?" -> "mma-retry (selected indices)" [label="no — partial"];
+}
+```
+**Use when:**
+- A previous batch's terminal envelope shows mixed `done` / `done_with_concerns` / `failed`
+- 1–N tasks (but not all) need a re-run with the same config
+- You want to keep the original batch's diagnostics intact for comparison
+**Don't use when:**
+- All tasks failed → investigate the systemic cause first (`mma-debug`); retrying won't help
+- The original batch is `expired` (TTL elapsed) → re-dispatch fresh
+- You want to change the prompt → re-dispatch with the new prompt; retry preserves the original
+## Endpoint
 `POST /retry?cwd=<abs-path>`
 @include _shared/auth.md
-### Request body
+## Request body
 ```json
 {
@@ -35,13 +67,12 @@ those tasks fresh with the same configuration as the original batch.
 | Field | Type | Required | Notes |
 |---|---|---|---|
-| `batchId` | string (UUID) | yes | Batch ID from a previous dispatch |
-| `taskIndices` | number[] | yes | Zero-based indices to re-run |
+| `batchId` | string (UUID) | yes | Batch ID from a previous dispatch (not yet expired) |
+| `taskIndices` | number[] | yes | Zero-based indices to re-run; must be non-negative integers |
-`taskIndices` must be non-negative integers. To re-run all tasks, pass all
-indices from `0` to `tasks.length - 1`.
+To re-run all tasks: pass `[0, 1, ..., tasks.length - 1]`. (But consider: if all failed, debug instead of retrying.)
-### Full example
+## Full example
 ```bash
 # Original batch had 4 tasks; re-run tasks at index 1 and 3
@@ -50,13 +81,25 @@ BATCH=$(curl -f --show-error -s -X POST \
   -H "Content-Type: application/json" \
   -d '{"batchId":"550e8400-e29b-41d4-a716-446655440000","taskIndices":[1,3]}' \
   "http://localhost:$PORT/retry?cwd=/project")
-BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
+BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')   # NEW batchId — not the original
 ```
-The retry produces a new `batchId`. Poll the new ID until complete:
 @include _shared/polling.md
 @include _shared/response-shape.md
+## Common pitfalls
+❌ **Retrying after the batch expired**
+TTL elapsed → original task specs are gone, so the retry endpoint returns 404. **Fix:** re-dispatch fresh.
+❌ **Retrying without addressing the root cause**
+A flaky task that failed once will likely fail again. **Fix:** investigate (`mma-debug` or read the original `result.error.message`), then retry — or escalate `agentType` to `complex` by re-dispatching.
+❌ **Confusing the new and original `batchId`**
+Retry produces a NEW batchId; polling the original returns the old terminal state. **Fix:** save the retry's `batchId` and poll that one.
+❌ **Using retry to change task config**
+Retry preserves the ORIGINAL config (prompt, agentType, filePaths, reviewPolicy). **Fix:** if you want different config, re-dispatch with `mma-delegate` / `mma-execute-plan`.
 @include _shared/error-handling.md
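
Choosing `taskIndices` from a terminal envelope is mechanical. Below is a minimal POSIX shell sketch that assumes you already have each task's status in index order (extracting them from the response is not shown). Every status other than `done` becomes a retry candidate, and if every task is `failed` the helper refuses, mirroring the advice to debug first. The function name and this status policy are assumptions, not part of the package.

```bash
# Hypothetical helper: build the taskIndices JSON array for POST /retry from
# per-task statuses given in index order (index 0 first).
# Exit 1: nothing to retry. Exit 2: every task failed, so debug before retrying.
retry_indices() {
  ri_idx=0
  ri_failed=0
  ri_out=""
  for ri_status in "$@"; do
    if [ "$ri_status" = "failed" ]; then
      ri_failed=$((ri_failed + 1))
    fi
    if [ "$ri_status" != "done" ]; then
      ri_out="${ri_out:+$ri_out,}$ri_idx"   # assumption: anything not "done" is retried
    fi
    ri_idx=$((ri_idx + 1))
  done
  if [ "$#" -gt 0 ] && [ "$ri_failed" -eq "$#" ]; then
    return 2
  fi
  [ -n "$ri_out" ] || return 1
  printf '[%s]\n' "$ri_out"
}
```

Called as `retry_indices done failed done done_with_concerns`, it prints `[1,3]`, the same indices as the full example above; the retry response then carries a new `batchId` to poll.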

package/dist/skills/mma-review/SKILL.md CHANGED

@@ -1,29 +1,46 @@
 ---
 name: mma-review
 description: >-
-  Review code for quality, security, performance, or correctness via the local
-  mmagent HTTP service. Sub-agents run in parallel per file, independent
-  context.
+  Use when source code needs a quality / security / correctness pass — pre-merge
+  review, post-implementation sanity check, or focused look at a small file set
+  — and the review can run in parallel per file
 when_to_use: >-
-  The user asks for a code review, pre-merge check, or quality pass over one or
-  more files OR a methodology skill (superpowers:requesting-code-review,
-  /review, /security-review) points at a review task. Delegate the reviewer pass
-  to mmagent workers — your main context stays free to decide what to merge.
-version: 3.3.0
+  User asks for a code review or pre-merge check, OR a methodology skill
+  (superpowers:requesting-code-review, /review, /security-review) points at one,
+  AND mmagent is running. Delegate so each file reviews on its own worker; the
+  main agent only decides what to merge. Review on SOURCE CODE — use mma-audit
+  for prose specs / configs.
+version: 3.4.0
 ---
-## mma-review
+# mma-review
-Send code or files to sub-agents for structured review. Each file is reviewed
-independently in parallel; results are index-aligned with `filePaths`.
+## Overview
-### Endpoint
+Send code files to workers for structured review. Each file is reviewed independently in parallel; results are index-aligned with `filePaths`.
+**Core principle:** Reviewer is a different model from the implementer — different training, different blind spots. Cross-model review catches what self-review misses.
+## When to Use
+**Use when:**
+- 1+ source code files just changed (post-implementation review)
+- Pre-merge sanity check on a focused diff
+- Security-sensitive code path (`focus: ["security"]`)
+- A specialized review pass (e.g. `focus: ["performance"]` on hot-path code)
+**Don't use when:**
+- The thing being reviewed is prose / spec / config → `mma-audit` (better-suited prompt template)
+- You want to know whether a complete branch is mergeable → run `/ultrareview` (multi-model branch review) instead
+- The diff is one-line / one-character → reading inline is faster than dispatch
+## Endpoint
 `POST /review?cwd=<abs-path>`
 @include _shared/auth.md
-### Request body
+## Request body
 ```json
 {
@@ -36,14 +53,14 @@ independently in parallel; results are index-aligned with `filePaths`.
 | Field | Type | Required | Notes |
 |---|---|---|---|
-| `code` | string | no | Inline code to review |
-| `focus` | string[] | no | Any of `security`, `performance`, `correctness`, `style` |
-| `filePaths` | string[] | no | Files to review (parallel) |
-| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
+| `code` | string | no | Inline code snippet to review |
+| `focus` | string[] | no | Any of `security`, `performance`, `correctness`, `style`. Omit for general review. |
+| `filePaths` | string[] | no | Files to review (one worker per file, parallel) |
+| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` — useful for design docs the reviewer should validate against |
 Either `code` or `filePaths` (or both) must be provided.
-### Full example
+## Full example
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
@@ -54,10 +71,22 @@ BATCH=$(curl -f --show-error -s -X POST \
 BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 ```
-Then poll until complete:
 @include _shared/polling.md
 @include _shared/response-shape.md
+## Common pitfalls
+❌ **Reviewing a plan/spec markdown with `mma-review`**
+The reviewer is tuned for code constructs (types, call sites, test coverage). On prose it produces vague nits. **Fix:** use `mma-audit` for docs/specs, `mma-review` for source.
+❌ **Omitting `focus` and getting watery findings**
+A general review surfaces low-signal style nits alongside real bugs. **Fix:** specify `focus: ["correctness"]` or `["security"]` to bias the reviewer toward the dimension you care about.
+❌ **Inlining the spec the reviewer should validate against**
+If the reviewer needs to check the diff against a design doc, register the doc once via `mma-context-blocks` and pass the `contextBlockIds`. Inlining N times wastes tokens.
+❌ **Skipping review because "I already read it"**
+Self-review and cross-model review are not the same thing. The whole reason to delegate is the different blind spots. Read the findings; merge what you agree with.
 @include _shared/error-handling.md
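
Because `focus` accepts only four values, a misspelled entry is easy to catch before dispatch. This is a minimal POSIX shell sketch, with a hypothetical helper name, that validates focus values locally and emits the JSON array for the request body.

```bash
# Hypothetical helper: emit the `focus` array for POST /review, rejecting any
# value outside the four the skill lists. With no arguments it prints [],
# though omitting `focus` entirely is the documented way to get a general review.
review_focus_json() {
  rf_out=""
  for rf_item in "$@"; do
    case "$rf_item" in
      security|performance|correctness|style) ;;
      *) echo "unknown focus: $rf_item" >&2; return 1 ;;
    esac
    rf_out="${rf_out:+$rf_out,}\"$rf_item\""
  done
  printf '[%s]\n' "$rf_out"
}
```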

package/dist/skills/mma-verify/SKILL.md CHANGED

@@ -1,28 +1,45 @@
 ---
 name: mma-verify
 description: >-
-  Verify work against a checklist via the local mmagent HTTP service. Sub-agents
-  check each item independently.
+  Use when work is "complete" and you need to confirm acceptance criteria are
+  actually met before claiming so to the user — each checklist item verified
+  independently against the work
 when_to_use: >-
   The user (or a methodology skill like
-  superpowers:verification-before-completion) wants acceptance-criteria checked
-  against implemented work. Delegate the evidence-gathering to mmagent workers —
-  each checklist item is verified independently and in parallel.
-version: 3.3.0
+  superpowers:verification-before-completion) needs acceptance-criteria checked
+  against implemented work BEFORE claiming success. Delegate so each checklist
+  item gets independent evidence-gathering on a worker. Use this BEFORE saying
+  "done" — never after.
+version: 3.4.0
 ---
-## mma-verify
+# mma-verify
-Submit work product and a checklist to sub-agents for independent verification.
-Each checklist item is verified in parallel; results are index-aligned.
+## Overview
-### Endpoint
+Submit work product and a checklist to workers for independent verification. Each checklist item is verified in parallel; results are index-aligned with the input.
+**Core principle:** Self-verification ("I read the files; they look correct") has no external validation. Workers check independently and return evidence (or absence of it) per item.
+## When to Use
+**Use when:**
+- You're about to claim a task is "done" and need evidence per acceptance item
+- A methodology skill (superpowers:verification-before-completion) routed here
+- The user gave a checklist and asked you to confirm each item
+**Don't use when:**
+- The "checklist" is one item — read inline, faster than dispatch
+- You don't have explicit acceptance criteria — write them first, then dispatch
+- The work hasn't been done yet — verification is a post-condition, not a pre-condition
+## Endpoint
 `POST /verify?cwd=<abs-path>`
 @include _shared/auth.md
-### Request body
+## Request body
 ```json
 {
@@ -39,12 +56,12 @@ Each checklist item is verified in parallel; results are index-aligned.
 | Field | Type | Required | Notes |
 |---|---|---|---|
-| `work` | string | no | Inline work product description |
-| `checklist` | string[] | yes | At least one item |
-| `filePaths` | string[] | no | Files to verify against (parallel) |
-| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
+| `work` | string | no | Inline work-product description (e.g. summary of what changed) |
+| `checklist` | string[] | yes | At least one item — each item verified by its own worker |
+| `filePaths` | string[] | no | Files to verify against (workers can read them) |
+| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` (e.g. the spec the work was supposed to satisfy) |
-### Full example
+## Full example
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
@@ -55,10 +72,24 @@ BATCH=$(curl -f --show-error -s -X POST \
 BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 ```
-Then poll until complete:
 @include _shared/polling.md
 @include _shared/response-shape.md
+## Common pitfalls
+❌ **Vague checklist items**
+> "Code is good"
+The worker can't gather evidence for "good". **Fix:** specific, falsifiable criteria — `"Function parseConfig has at least 3 unit tests covering: missing field, malformed JSON, empty file"`.
+❌ **Verifying without `filePaths`**
+Worker has nothing to read; verdict is speculative. **Fix:** always pass the file(s) the work landed in.
+❌ **Treating verify as the implementation step**
+Verify CHECKS work; it doesn't DO work. If a checklist item fails, dispatch `mma-delegate` to fix it, then re-verify.
+❌ **Skipping verify because "tests pass"**
+Tests verify the test cases that exist. Verify checks the acceptance criteria — which often include things tests don't (docs updated, no debug-print left, etc.).
 @include _shared/error-handling.md
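
Specific, falsifiable checklist items are easiest to maintain in a file, one per line. Below is a minimal POSIX shell sketch, with a hypothetical helper name, that converts such a file into the `checklist` array. It skips blank lines, escapes backslashes and double quotes, and refuses an empty list, since at least one item is required. It assumes items contain no tabs or other control characters, which would need further JSON escaping.

```bash
# Hypothetical helper: turn a newline-separated checklist file into the
# `checklist` JSON array for POST /verify.
# Assumes items contain no control characters (tabs, etc.).
checklist_json() {
  cj_items=$(grep -v '^[[:space:]]*$' "$1" |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/^/"/' -e 's/$/"/' |
    paste -sd, -)
  if [ -z "$cj_items" ]; then
    echo "checklist needs at least one item" >&2
    return 1
  fi
  printf '[%s]\n' "$cj_items"
}
```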