npm - @zhixuan92/multi-model-agent - Versions diffs - 3.2.0 → 3.4.0 - Mend

@zhixuan92/multi-model-agent 3.2.0 → 3.4.0

Files changed (65)

package/README.md +62 -33
package/dist/http/canonicalize-file-paths.d.ts +8 -0
package/dist/http/canonicalize-file-paths.d.ts.map +1 -0
package/dist/http/canonicalize-file-paths.js +43 -0
package/dist/http/canonicalize-file-paths.js.map +1 -0
package/dist/http/execution-context.d.ts.map +1 -1
package/dist/http/execution-context.js +0 -14
package/dist/http/execution-context.js.map +1 -1
package/dist/http/handlers/control/batch-slice.d.ts +4 -0
package/dist/http/handlers/control/batch-slice.d.ts.map +1 -0
package/dist/http/handlers/control/batch-slice.js +40 -0
package/dist/http/handlers/control/batch-slice.js.map +1 -0
package/dist/http/handlers/control/retry.d.ts +4 -0
package/dist/http/handlers/control/retry.d.ts.map +1 -0
package/dist/http/handlers/control/retry.js +60 -0
package/dist/http/handlers/control/retry.js.map +1 -0
package/dist/http/handlers/tools/audit.d.ts.map +1 -1
package/dist/http/handlers/tools/audit.js +2 -0
package/dist/http/handlers/tools/audit.js.map +1 -1
package/dist/http/handlers/tools/debug.d.ts.map +1 -1
package/dist/http/handlers/tools/debug.js +2 -0
package/dist/http/handlers/tools/debug.js.map +1 -1
package/dist/http/handlers/tools/delegate.d.ts.map +1 -1
package/dist/http/handlers/tools/delegate.js +2 -0
package/dist/http/handlers/tools/delegate.js.map +1 -1
package/dist/http/handlers/tools/execute-plan.d.ts.map +1 -1
package/dist/http/handlers/tools/execute-plan.js +2 -0
package/dist/http/handlers/tools/execute-plan.js.map +1 -1
package/dist/http/handlers/tools/investigate.d.ts +4 -0
package/dist/http/handlers/tools/investigate.d.ts.map +1 -0
package/dist/http/handlers/tools/investigate.js +81 -0
package/dist/http/handlers/tools/investigate.js.map +1 -0
package/dist/http/handlers/tools/review.d.ts.map +1 -1
package/dist/http/handlers/tools/review.js +2 -0
package/dist/http/handlers/tools/review.js.map +1 -1
package/dist/http/handlers/tools/verify.d.ts.map +1 -1
package/dist/http/handlers/tools/verify.js +2 -0
package/dist/http/handlers/tools/verify.js.map +1 -1
package/dist/http/request-observability.d.ts +9 -0
package/dist/http/request-observability.d.ts.map +1 -0
package/dist/http/request-observability.js +36 -0
package/dist/http/request-observability.js.map +1 -0
package/dist/http/server.d.ts.map +1 -1
package/dist/http/server.js +52 -11
package/dist/http/server.js.map +1 -1
package/dist/install/discover.d.ts +1 -1
package/dist/install/discover.d.ts.map +1 -1
package/dist/install/discover.js +1 -0
package/dist/install/discover.js.map +1 -1
package/dist/openapi.d.ts.map +1 -1
package/dist/openapi.js +6 -0
package/dist/openapi.js.map +1 -1
package/dist/skills/_shared/verify-and-review.md +12 -0
package/dist/skills/mma-audit/SKILL.md +45 -18
package/dist/skills/mma-clarifications/SKILL.md +73 -29
package/dist/skills/mma-context-blocks/SKILL.md +56 -24
package/dist/skills/mma-debug/SKILL.md +54 -22
package/dist/skills/mma-delegate/SKILL.md +59 -21
package/dist/skills/mma-execute-plan/SKILL.md +56 -24
package/dist/skills/mma-investigate/SKILL.md +137 -0
package/dist/skills/mma-retry/SKILL.md +65 -22
package/dist/skills/mma-review/SKILL.md +49 -20
package/dist/skills/mma-verify/SKILL.md +49 -18
package/dist/skills/multi-model-agent/SKILL.md +84 -46
package/package.json +2 -2

package/dist/skills/mma-clarifications/SKILL.md CHANGED

@@ -1,33 +1,65 @@
 ---
 name: mma-clarifications
 description: >-
-  Confirm or correct mmagent's proposed interpretation when a batch is awaiting
-  clarification before it can proceed. Paired skill to every mma-* task
-  dispatcher.
+  Use when a previous mma-* batch's terminal envelope has
+  `proposedInterpretation` as a string (not the `not_applicable` sentinel) — the
+  service paused waiting for you to confirm or correct its read of the task
 when_to_use: >-
-  A previous mma-delegate / mma-audit / mma-review / mma-execute-plan / etc.
-  terminal envelope has `proposedInterpretation` as a string (not a
-  NotApplicable sentinel). Read the proposal and call this skill to accept or
-  correct it. The batch resumes after the POST returns.
-version: 3.2.0
+  A previous mma-delegate / mma-audit / mma-review / mma-execute-plan /
+  mma-debug / mma-investigate terminal envelope has `proposedInterpretation` as
+  a string. Read the proposal, decide whether to accept or correct it, then call
+  this skill. The batch resumes immediately after the POST returns.
+version: 3.4.0
 ---
-## mma-clarifications
+# mma-clarifications
-When a batch pauses with `state: 'awaiting_clarification'`, the service has
-proposed an interpretation of the task and is waiting for your decision.
-Read the proposal, then call `POST /clarifications/confirm` to either accept
-or correct it. The batch resumes immediately after confirmation.
+## Overview
-### Endpoint
+When a batch pauses with `state: 'awaiting_clarification'`, the service has proposed an interpretation of an ambiguous task and is waiting for your decision. Read the proposal, then `POST /clarifications/confirm` with either the proposal verbatim (accept) or a corrected version (override). The batch resumes immediately.
+**Core principle:** Clarification is a quality gate, not an error. Ambiguous tasks would silently produce the wrong work — the pause forces a deliberate choice.
+## When to Use
+```dot
+digraph when_to_use {
+    "Polling a batch?" [shape=diamond];
+    "state == awaiting_clarification?" [shape=diamond];
+    "proposedInterpretation is a string?" [shape=diamond];
+    "Read proposal" [shape=box];
+    "Accept or correct" [shape=diamond];
+    "POST proposal verbatim" [shape=box];
+    "POST corrected text" [shape=box];
+    "Polling a batch?" -> "state == awaiting_clarification?";
+    "state == awaiting_clarification?" -> "proposedInterpretation is a string?" [label="yes"];
+    "state == awaiting_clarification?" -> "Continue polling" [label="no"];
+    "proposedInterpretation is a string?" -> "Read proposal" [label="yes"];
+    "Read proposal" -> "Accept or correct";
+    "Accept or correct" -> "POST proposal verbatim" [label="proposal is right"];
+    "Accept or correct" -> "POST corrected text" [label="proposal is wrong"];
+}
+```
+**Use when:**
+- Polling a batch and the terminal envelope has `proposedInterpretation` as a string
+- The mma-* skill that dispatched explicitly references this skill in its "if awaiting_clarification" line
+**Don't use when:**
+- `proposedInterpretation` is `{ kind: 'not_applicable', ... }` → batch isn't waiting; just read `results`
+- The batch failed (`error` is a real object) → don't confirm; debug or re-dispatch
+- You don't yet have a `batchId` → this skill resumes existing batches, not new ones
+## Endpoint
 `POST /clarifications/confirm`
-Auth required. Not cwd-gated (operates on a `batchId`).
+Auth required. NOT cwd-gated — operates on a `batchId`.
 @include _shared/auth.md
-### Request body
+## Request body
 ```json
 {
@@ -39,38 +71,50 @@ Auth required. Not cwd-gated (operates on a `batchId`).
 | Field | Type | Required | Notes |
 |---|---|---|---|
 | `batchId` | string (UUID) | yes | Batch in `awaiting_clarification` state |
-| `interpretation` | string | yes | Accept proposal verbatim or provide a corrected version |
+| `interpretation` | string | yes | Accept proposal verbatim, OR provide corrected text the worker should follow instead |
-### Response (200)
+## Response (200)
 ```json
 { "batchId": "...", "state": "pending" }
 ```
-`state` is usually `pending` (batch resumes). It may be `complete` if the
-executor was already waiting and finishes immediately.
+`state` is usually `pending` (batch resumes). May be `complete` if the executor was already waiting and finishes immediately.
-### Full flow
+## Full flow
 ```bash
-# 1. Poll until awaiting_clarification
-STATE=$(curl -f --show-error -s -H "Authorization: Bearer $TOKEN" \
-  "http://localhost:$PORT/batch/$BATCH_ID" | jq -r '.state')
+# 1. Poll until terminal
+RESP=$(curl -f --show-error -s -H "Authorization: Bearer $TOKEN" \
+  "http://localhost:$PORT/batch/$BATCH_ID")
-# 2. Read the proposal
-PROPOSAL=$(curl -f --show-error -s -H "Authorization: Bearer $TOKEN" \
-  "http://localhost:$PORT/batch/$BATCH_ID" | jq -r '.proposedInterpretation')
+# 2. Check for a string proposal (not the not_applicable sentinel)
+PROPOSAL=$(echo "$RESP" | jq -r 'select(.proposedInterpretation | type == "string") | .proposedInterpretation')
-# 3. Confirm (accept proposal or supply corrected text)
+# 3. Confirm — accept proposal verbatim, or supply corrected text
 curl -f --show-error -s -X POST \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: application/json" \
   -d "{\"batchId\":\"$BATCH_ID\",\"interpretation\":\"$PROPOSAL\"}" \
   "http://localhost:$PORT/clarifications/confirm"
-# 4. Resume polling
+# 4. Resume polling for terminal
 ```
 @include _shared/polling.md
+## Common pitfalls
+❌ **Confirming a wrong proposal verbatim because "the service knows best"**
+The service is GUESSING from limited context. If the proposal would do the wrong thing, supply corrected `interpretation` text. **Why:** post-confirmation work is hard to undo.
+❌ **Treating the pause as an error**
+`awaiting_clarification` is a SUCCESS path — it caught ambiguity before producing wrong work. Read, decide, confirm.
+❌ **Forgetting the `batchId` is the original, not a new one**
+This endpoint mutates the existing batch — it does not create a new one. **Fix:** poll the SAME `batchId` after confirming.
+❌ **Polling without checking `proposedInterpretation`'s shape**
+The field is either a `string` (paused) or `{ kind: 'not_applicable' }` (terminal). **Fix:** check the JSON type before treating it as text.
 @include _shared/error-handling.md
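
A note on step 3 of the full flow above: it splices `$PROPOSAL` directly into a JSON string, so a proposal containing double quotes, backslashes, or newlines would produce an invalid request body. Below is a minimal sketch of one way around that, assuming `jq` is available (the flow already depends on it). `confirm_body` is a hypothetical helper name, not part of the package.

```bash
#!/usr/bin/env bash
# confirm_body is a hypothetical helper (not part of mmagent): it lets jq do
# the JSON escaping instead of interpolating shell variables into a string.
confirm_body() {
  local batch_id=$1 interpretation=$2
  jq -c -n --arg id "$batch_id" --arg interp "$interpretation" \
    '{batchId: $id, interpretation: $interp}'
}

# Sample values; in the real flow these come from the polled envelope.
confirm_body "b-1" 'Rename "foo" to bar'
# prints {"batchId":"b-1","interpretation":"Rename \"foo\" to bar"}
```

The result could then be passed to the existing request as `-d "$(confirm_body "$BATCH_ID" "$PROPOSAL")"`.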

package/dist/skills/mma-context-blocks/SKILL.md CHANGED

@@ -1,23 +1,40 @@
 ---
 name: mma-context-blocks
 description: >-
-  Register large reused documents (spec, plan, codebase summary) as a context
-  block the mmagent service caches, then reference it by ID across multiple
-  mma-* calls. Avoids re-uploading the same content on every task.
+  Use when a document larger than ~2 KB will be referenced by 2+ subsequent
+  mma-* calls — register once, pass the returned ID to each call instead of
+  re-uploading the same content
 when_to_use: >-
-  A document larger than ~2 KB will be referenced by two or more mma-* calls in
-  a row. Register once here, then pass the returned ID via the contextBlockIds
-  field on mma-delegate / mma-execute-plan / mma-audit / mma-review / mma-verify
-  / mma-debug. Cheaper and faster than inlining the same content in every
-  request body.
-version: 3.2.0
+  A document (spec, plan, codebase summary, prior round's findings, error log)
+  larger than ~2 KB will be referenced by two or more mma-* calls in a row.
+  Register once here, then pass the ID via `contextBlockIds` on mma-delegate /
+  mma-execute-plan / mma-audit / mma-review / mma-verify / mma-debug /
+  mma-investigate. Cheaper and faster than inlining the same content N times.
+version: 3.4.0
 ---
-## mma-context-blocks
+# mma-context-blocks
-Store large documents once; reference them by ID in subsequent `mma-*` calls
-via `contextBlockIds`. The service prepends the block content to each task
-prompt that references it.
+## Overview
+Store large documents once; reference them by ID in subsequent `mma-*` calls via `contextBlockIds`. The service prepends the block content to each task prompt that references the ID — content is transmitted ONCE to the daemon, then reused server-side.
+**Core principle:** Without context blocks, the same document is sent N times for N tasks. Blocks transmit once. The savings compound on shared specs, prior-round findings, and codebase summaries.
+## When to Use
+**Use when:**
+- A doc >2 KB will be referenced by ≥2 mma-* calls
+- You're running iterative audit/review rounds (round 2 references round 1's findings)
+- A spec or design doc is the shared input across N parallel tasks
+- A long error log is the context for debug + delegate calls
+**Don't use when:**
+- The doc is <2 KB and used once → just inline it (registration overhead exceeds savings)
+- The doc changes between calls → context blocks are immutable; register a new one
+- Single task that doesn't reference any large shared content → no benefit
+## Endpoints
 ### Register a context block
@@ -37,7 +54,7 @@ prompt that references it.
 | Field | Type | Required | Notes |
 |---|---|---|---|
 | `content` | string | yes | Document content (min 1 char) |
-| `ttlMs` | number | no | Time-to-live in ms; omit for session-scoped |
+| `ttlMs` | number | no | Time-to-live in ms; omit for session-scoped (default 1h) |
 #### Response (201)
@@ -45,34 +62,49 @@ prompt that references it.
 { "id": "cb_abc123" }
 ```
-Use this `id` as a `contextBlockIds` entry in `mma-delegate`, `mma-audit`,
-`mma-review`, `mma-verify`, `mma-debug`, or `mma-execute-plan`.
+Use this `id` as a `contextBlockIds` entry in any `mma-*` skill that supports it.
 ### Delete a context block
 `DELETE /context-blocks/:id?cwd=<abs-path>`
-Returns `200 { ok: true }` on success.
-Returns `409 pinned` if the block is held by one or more active batches —
-wait for those batches to complete before deleting.
+Returns `200 { ok: true }` on success. Returns `409 pinned` if the block is held by one or more active batches — wait for those batches to complete before deleting.
-### Example
+## Full example
 ```bash
-# Register spec document
+# Register spec document once
 ID=$(curl -f --show-error -s -X POST \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: application/json" \
   -d "{\"content\":$(jq -Rs . < /project/docs/spec.md)}" \
   "http://localhost:$PORT/context-blocks?cwd=/project" | jq -r '.id')
-# Use in a delegate call
+# Reference from N delegate tasks
 curl -f --show-error -s -X POST \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: application/json" \
-  -d "{\"tasks\":[{\"prompt\":\"Implement per spec\",\"contextBlockIds\":[\"$ID\"]}]}" \
+  -d "{\"tasks\":[
+    {\"prompt\":\"Implement section 3 per spec\",\"contextBlockIds\":[\"$ID\"]},
+    {\"prompt\":\"Implement section 4 per spec\",\"contextBlockIds\":[\"$ID\"]}
+  ]}" \
   "http://localhost:$PORT/delegate?cwd=/project"
 ```
+## Common pitfalls
+❌ **Inlining the same 50KB spec into every task prompt**
+> tasks: [{prompt: "Implement section 3:\n[50KB spec]"}, {prompt: "Implement section 4:\n[50KB spec]"}]
+N×50KB transmissions; main context burns through tokens. **Fix:** register the spec once, pass `contextBlockIds: ["cb_xxx"]` to each task.
+❌ **Forgetting to delete short-TTL blocks**
+Blocks count against the project's context-block quota. **Fix:** explicitly `DELETE` after the dependent batches finish — or set a short `ttlMs` so they self-evict.
+❌ **Trying to update a block's content**
+Blocks are immutable. **Fix:** register a new block with the new content; switch the `contextBlockIds` to the new ID.
+❌ **Deleting a block while a batch still references it**
+Returns `409 pinned`. **Fix:** poll the dependent batches to terminal first, then delete.
 @include _shared/error-handling.md
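
The "When to Use" rule in this skill (a document larger than ~2 KB, referenced by two or more calls) can be sketched as a pre-flight check. The 2048-byte cutoff is an assumed reading of "~2 KB", and `should_register_block` is a hypothetical helper, not part of the package.

```bash
#!/usr/bin/env bash
# Assumed cutoff for "~2 KB"; the skill text does not give an exact byte count.
MIN_BLOCK_BYTES=2048

# Hypothetical helper: succeeds (exit 0) when registering a context block
# looks worthwhile for the given file and number of planned mma-* calls.
should_register_block() {
  local file=$1 planned_calls=$2
  local size
  size=$(( $(wc -c < "$file") ))   # arithmetic context strips any padding from wc
  [ "$size" -gt "$MIN_BLOCK_BYTES" ] && [ "$planned_calls" -ge 2 ]
}

# Demo with a throwaway 3000-byte file referenced by two planned calls.
demo=$(mktemp)
head -c 3000 /dev/zero > "$demo"
if should_register_block "$demo" 2; then
  echo "register once, pass the ID"
else
  echo "inline it"
fi
rm -f "$demo"
```

The check only covers size and reuse; a document that changes between calls still needs a fresh block each time, since blocks are immutable.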

package/dist/skills/mma-debug/SKILL.md CHANGED

@@ -1,31 +1,46 @@
 ---
 name: mma-debug
 description: >-
-  Debug a failure using a structured hypothesis via the local mmagent HTTP
-  service. All provided files are investigated together in a single task on a
-  worker.
+  Use when a test fails, a build breaks, or behavior is unexpected AND narrowing
+  the root cause requires reading files, reproducing the failure, or tracing
+  across multiple modules — the worker investigates so the main agent stays on
+  the hypothesis
 when_to_use: >-
-  A test fails, a build breaks, or behavior is unexpected AND you need to read
-  files, reproduce the failure, or narrow root cause OR a methodology skill
+  A failure has surfaced (test/build/runtime) AND you need investigation work —
+  read files, reproduce, trace — OR a methodology skill
   (superpowers:systematic-debugging) points at the investigation step. Delegate
-  the read/reproduce/trace work to a mmagent worker so your main context stays
-  focused on the hypothesis and the fix.
-version: 3.2.0
+  the read/reproduce/trace; the main agent stays on the hypothesis and the fix.
+version: 3.4.0
 ---
-## mma-debug
+# mma-debug
-Submit a problem, context, and hypothesis to a sub-agent for focused
-debugging. Unlike other tools, all `filePaths` are investigated together
-in a single task (not parallelised per file).
+## Overview
-### Endpoint
+Submit a problem, context, and hypothesis to a worker for focused debugging. Unlike `mma-audit` and `mma-review`, all `filePaths` are investigated TOGETHER in a single task (not parallelized per file) — debugging needs cross-file reasoning.
+**Core principle:** The hypothesis is judgment (your job). Reading files and reproducing the failure is labor (the worker's job). Pass the hypothesis as input; receive structured findings.
+## When to Use
+**Use when:**
+- A test fails / build breaks / runtime behavior is unexpected
+- The root cause likely spans 2+ files
+- You have a hypothesis to test (or want the worker to suggest one)
+- A methodology skill (`superpowers:systematic-debugging`) routed here
+**Don't use when:**
+- The error message points at one file you can read in 30 seconds → just `Read`
+- You don't know what's broken yet → use `mma-investigate` first to map the area
+- You already know the fix → skip debug, dispatch `mma-delegate` with the fix
+## Endpoint
 `POST /debug?cwd=<abs-path>`
 @include _shared/auth.md
-### Request body
+## Request body
 ```json
 {
@@ -42,13 +57,13 @@ in a single task (not parallelised per file).
 | Field | Type | Required | Notes |
 |---|---|---|---|
-| `problem` | string | yes | What is broken |
-| `context` | string | no | Background information |
-| `hypothesis` | string | no | Initial theory to test |
-| `filePaths` | string[] | no | All files investigated together |
-| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
+| `problem` | string | yes | What is broken (one sentence; concrete symptom) |
+| `context` | string | no | Background — what changed recently, what works, what doesn't |
+| `hypothesis` | string | no | Your initial theory; worker tests it first, then explores |
+| `filePaths` | string[] | no | All files investigated together (cross-file reasoning) |
+| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` (e.g. error logs, traces) |
-### Full example
+## Full example
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
@@ -59,10 +74,27 @@ BATCH=$(curl -f --show-error -s -X POST \
 BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 ```
-Then poll until complete:
 @include _shared/polling.md
 @include _shared/response-shape.md
+## Common pitfalls
+❌ **Vague `problem`**
+> "The login is broken"
+Worker has no symptom to chase. **Fix:** specific reproducer — `"POST /login with body {user:'a@b.c', pass:'café'} returns 500 with 'invalid character' in stderr"`.
+❌ **No `hypothesis`**
+The worker explores blindly, often investigates the wrong area first. **Fix:** even a weak hypothesis ("might be encoding-related") narrows the search space.
+❌ **Splitting one bug across multiple `mma-debug` calls**
+Debug intentionally bundles `filePaths` for cross-file reasoning. Splitting defeats this. **Fix:** one call with all suspect files; if you really have N independent failures, use `mma-delegate` with N tasks.
+❌ **Treating `mma-debug` as the fix step**
+Debug investigates and proposes; it doesn't necessarily write the fix. If the worker identifies a fix, dispatch `mma-delegate` to implement it (or write it inline if you understand it).
+❌ **Skipping when an error message looks self-explanatory**
+Often the obvious cause isn't the real one. A 30-second debug pass costs less than a wrong fix that breaks something else.
 @include _shared/error-handling.md

package/dist/skills/mma-delegate/SKILL.md CHANGED

@@ -1,32 +1,47 @@
 ---
 name: mma-delegate
 description: >-
-  Fan out ad-hoc implementation or research tasks to sub-agents in parallel via
-  the local mmagent HTTP service. Tasks run on cheap workers that don't consume
-  your main-model context window.
+  Use when you have one or more ad-hoc implementation or research tasks WITHOUT
+  a plan file on disk and you want them to run on cheap workers in parallel
+  instead of consuming main-context tokens
 when_to_use: >-
-  You have one or more ad-hoc implementation or research tasks WITHOUT a plan
-  file on disk AND mmagent is running. Prefer this over inline Agent dispatches
-  or superpowers:dispatching-parallel-agents — delegated workers are cheaper,
-  parallel-safe, and keep main context free. If a plan file exists, use
-  mma-execute-plan; if the task is an audit/review/verify/debug, prefer the
-  matching mma-* skill instead.
-version: 3.2.0
+  You have ad-hoc implementation or research tasks (no plan file on disk) AND
+  mmagent is running. Prefer this over inline Agent dispatches or
+  superpowers:dispatching-parallel-agents — workers are cheaper, parallel-safe,
+  and keep main context free. If a plan file exists → use mma-execute-plan. If
+  the task is audit / review / verify / debug / investigate → use the matching
+  specialized skill.
+version: 3.4.0
 ---
-## mma-delegate
+# mma-delegate
-Dispatch one or more tasks to sub-agents concurrently. Each task is an
-independent instruction with optional file scope, acceptance criteria, and
-context block references.
+## Overview
-### Endpoint
+Dispatch one or more ad-hoc tasks to sub-agents concurrently. Each task is an independent instruction with optional file scope, acceptance criteria, and context blocks.
+**Core principle:** Workers run on cheap providers; the main agent consumes only the structured per-task report. Parallelize freely as long as tasks don't write the same files.
+## When to Use
+**Use when:**
+- 2+ unrelated implementation tasks (parallel speedup)
+- A research task you'd otherwise spend tokens reading and grepping
+- A focused refactor that fits in one prompt
+- The task does NOT match audit / review / verify / debug / investigate (those have specialized skills)
+**Don't use when:**
+- A plan file exists on disk → `mma-execute-plan` (descriptors auto-match plan headings)
+- Two tasks write the same file → dispatch sequentially, not in one batch (workers race)
+- The work needs to read across many files for synthesis only → `mma-investigate` is cheaper (read-only)
+## Endpoint
 `POST /delegate?cwd=<abs-path>`
 @include _shared/auth.md
-### Request body
+## Request body
 ```json
 {
@@ -46,12 +61,16 @@ context block references.
 |---|---|---|---|
 | `tasks` | array | yes | At least one task |
 | `tasks[].prompt` | string | yes | The task instruction |
-| `tasks[].agentType` | `"standard"` / `"complex"` | no | Worker tier. Default `"standard"` (cheap). Pick `"complex"` when the task is ambiguous, touches many files, is security-sensitive, or a prior standard run came back with `filesWritten: 0` / ran out of turns. Complex workers cost more but finish bigger jobs. |
-| `tasks[].filePaths` | string[] | no | Files the sub-agent focuses on |
+| `tasks[].agentType` | `"standard"` / `"complex"` | no | Worker tier. Default `"standard"`. Pick `"complex"` when the task is ambiguous, security-sensitive, touches many files, or a prior standard run came back with `filesWritten: 0` / hit `incompleteReason: "turn_cap"`. |
+| `tasks[].filePaths` | string[] | no | Files the worker focuses on |
 | `tasks[].done` | string | no | Acceptance criteria |
 | `tasks[].contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
+| `tasks[].verifyCommand` | string[] | no | See verify-and-review snippet below |
+| `tasks[].reviewPolicy` | `"full"` / `"spec_only"` / `"diff_only"` / `"off"` | no | See verify-and-review snippet below. Default `"full"` |
+@include _shared/verify-and-review.md
-### Full example
+## Full example
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
@@ -62,10 +81,29 @@ BATCH=$(curl -f --show-error -s -X POST \
 BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 ```
-Then poll until complete:
 @include _shared/polling.md
 @include _shared/response-shape.md
+## Common pitfalls
+❌ **Two tasks writing the same file in one batch**
+> tasks: [{prompt:"add JWT to login.ts"}, {prompt:"add logging to login.ts"}]
+Workers run concurrently and race on the file. **Fix:** dispatch sequentially, or merge into one prompt.
+❌ **Vague `prompt`, no `done` criterion**
+> "improve the auth module"
+Worker has no completion signal — likely returns `done_with_concerns`. **Fix:** specific verb + acceptance: `"Add input validation to login.ts so all string fields reject empty/whitespace; tests pass"`.
+❌ **Defaulting to `agentType: "complex"` for everything**
+Standard tier is 5–10× cheaper and finishes most edits. Escalate only when standard returns `filesWritten: 0` or `incompleteReason: "turn_cap"`.
+❌ **Inlining a 50KB doc into every prompt**
+N tasks × 50KB = N transmissions. **Fix:** register the doc once via `mma-context-blocks`, pass the `contextBlockIds` to each task.
+❌ **Reading the worker's diff inline before review**
+The reviewer sees the full diff with the original prompt as context. Reading inline burns main-context tokens for no quality gain.
 @include _shared/error-handling.md
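
The first pitfall above (two tasks writing the same file in one batch) can be screened for before dispatch. This is a rough sketch, assuming each task's `filePaths` approximates the files it will write, which the skill text does not guarantee. `shared_paths` is a hypothetical helper that reads one path per line.

```bash
#!/usr/bin/env bash
# Hypothetical pre-dispatch check: prints every path that appears more than
# once in the input (one path per line, collected across all tasks).
# Assumes no single task lists the same path twice.
shared_paths() {
  sort | uniq -d
}

# Paths from two sample tasks; with a request body on disk they could come
# from: jq -r '.tasks[].filePaths[]?' request.json
printf '%s\n' \
  /project/src/login.ts /project/src/session.ts \
  /project/src/login.ts /project/src/logger.ts | shared_paths
# prints /project/src/login.ts
```

Any output suggests splitting the batch or merging the overlapping tasks into one prompt.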

package/dist/skills/mma-execute-plan/SKILL.md CHANGED

@@ -1,32 +1,45 @@
 ---
 name: mma-execute-plan
 description: >-
-  Execute tasks from a plan or spec file on disk via the local mmagent HTTP
-  service. Delegates to cheap sub-agents that don't consume your main-model
-  context window. Task descriptors match plan headings; tasks run in parallel.
+  Use when a plan or spec file exists on disk (any markdown with numbered task
+  headings — docs/superpowers/plans/*.md, a TODO list, a spec doc) and you need
+  to implement one or more tasks from it on cheap workers in parallel
 when_to_use: >-
-  A plan file exists on disk (any markdown with numbered task headings —
-  docs/superpowers/plans/*.md, a TODO list, a spec doc) AND you need to
-  implement one or more tasks from it. Prefer this over inline Agent dispatches
-  or superpowers:subagent-driven-development / superpowers:executing-plans when
-  mmagent is running — delegated workers are cheaper and don't pollute main
-  context. Task descriptors must match the plan headings verbatim.
-version: 3.2.0
+  A plan file exists on disk AND you need to implement one or more tasks from it
+  AND mmagent is running. Prefer this over inline Agent dispatches or
+  superpowers:subagent-driven-development / superpowers:executing-plans —
+  workers are cheaper and don't pollute main context. Task descriptors must
+  match plan headings verbatim.
+version: 3.4.0
 ---
-## mma-execute-plan
+# mma-execute-plan
-Dispatch named tasks from a plan file to sub-agents. Task descriptors must
-match plan headings (e.g. `"1. Setup database schema"`). All tasks run in
-parallel and duplicate descriptors are rejected.
+## Overview
-### Endpoint
+Dispatch named tasks from a plan file to workers. Each `tasks` string must match a heading in the plan verbatim (e.g. `"1. Setup database schema"`). All tasks run in parallel; duplicate descriptors are rejected.
+**Core principle:** The plan IS the prompt. Workers re-read the plan file in-process and find their named task — you don't need to inline the task body.
+## When to Use
+**Use when:**
+- A plan/spec markdown exists with numbered task headings
+- You want to dispatch a subset (or all) of those tasks
+- Tasks are mostly independent (parallel-safe)
+**Don't use when:**
+- No plan file → `mma-delegate` (pass the prompt directly)
+- Tasks form a hard linear sequence (later tasks depend on earlier ones' outputs) → dispatch in order, one batch each
+- The "plan" is in conversation only, not on disk → write it to disk first, or use `mma-delegate`
+## Endpoint
 `POST /execute-plan?cwd=<abs-path>`
 @include _shared/auth.md
-### Request body
+## Request body
 ```json
 {
@@ -46,16 +59,19 @@ parallel and duplicate descriptors are rejected.
 | Field | Type | Required | Notes |
 |---|---|---|---|
-| `tasks` | string[] | yes | At least one; must be unique; match plan headings |
+| `tasks` | string[] \| `{task, reviewPolicy}[]` | yes | At least one; must be unique; each string matches a plan heading |
 | `context` | string | no | Short additional context not in the plan |
-| `filePaths` | string[] | no | Plan file + relevant source files |
+| `filePaths` | string[] | no | Plan file + relevant source files. Required: the plan file itself. |
 | `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |
-| `agentType` | `"standard"` / `"complex"` | no | Worker tier. Default `"standard"` (cheap). Switch to `"complex"` for tasks too large for a standard-tier model to finish in the turn budget (reads many files, produces many edits, or the last run came back with `filesWritten: 0`). |
+| `agentType` | `"standard"` / `"complex"` | no | Default `"standard"`. Use `"complex"` for tasks too large for the standard tier — reads many files, produces many edits, or the last run came back with `filesWritten: 0`. |
+| `verifyCommand` | string[] | no | See verify-and-review snippet below |
+| `tasks[].reviewPolicy` | `"full"` / `"spec_only"` / `"diff_only"` / `"off"` | no | See verify-and-review snippet below. Default `"full"`. |
+@include _shared/verify-and-review.md
-If the batch reaches `awaiting_clarification`, use `mma-clarifications`
-to confirm or correct the proposed interpretation.
+If the batch reaches `awaiting_clarification`, use `mma-clarifications` to confirm or correct the proposed interpretation.
-### Full example
+## Full example
 ```bash
 BATCH=$(curl -f --show-error -s -X POST \
@@ -66,10 +82,26 @@ BATCH=$(curl -f --show-error -s -X POST \
 BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 ```
-Then poll until complete:
 @include _shared/polling.md
 @include _shared/response-shape.md
+## Common pitfalls
+❌ **Task descriptor doesn't match plan heading verbatim**
+> tasks: ["Migrate db schema"]    ← plan heading is "3. Migrate database schema"
+Worker rejects with "no matching task" or matches the wrong one. **Fix:** copy the heading from the plan, including the leading number.
+❌ **Forgetting the plan file in `filePaths`**
+> filePaths: ["/project/src/db/schema.sql"]    ← no plan file
+Worker can't read the task body. **Fix:** always include the plan path: `filePaths: ["/project/docs/plan.md", "/project/src/db/schema.sql"]`.
+❌ **Dispatching dependent tasks in one batch**
+Task 5 depends on Task 4's output → workers race; Task 5 might run before Task 4 finishes. **Fix:** dispatch Task 4, wait for terminal, then dispatch Task 5.
+❌ **Skipping `verifyCommand` when one exists**
+A passing local check is the cheapest signal you're going to get. **Fix:** wire `["npm test"]` or the focused package test.
 @include _shared/error-handling.md
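
Because descriptors must match plan headings verbatim, a local check before dispatch can catch the first pitfall above. This is a minimal sketch, assuming task headings are Markdown `#`-style headings whose text equals the descriptor. How mmagent itself matches headings is not shown in this diff, and `heading_exists` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when some heading line in the plan file, with
# its leading '#' markers and spaces stripped, equals the descriptor exactly.
heading_exists() {
  local plan=$1 descriptor=$2
  sed -n 's/^#\{1,6\} *//p' "$plan" | grep -Fxq -- "$descriptor"
}

# Demo against a throwaway plan file.
plan=$(mktemp)
printf '%s\n' '# Plan' '## 3. Migrate database schema' 'Body text.' > "$plan"
for d in "3. Migrate database schema" "Migrate db schema"; do
  if heading_exists "$plan" "$d"; then echo "ok: $d"; else echo "no match: $d"; fi
done
rm -f "$plan"
```

`grep -Fx` compares whole lines as fixed strings, so a descriptor missing the leading number, or abbreviating a word, fails the check rather than matching loosely.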