@zhixuan92/multi-model-agent 4.9.1 → 5.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +4 -3
- package/bin/mmagent.mjs +47 -0
- package/package.json +24 -43
- package/postinstall.mjs +8 -0
- package/dist/cli/index.d.ts +0 -62
- package/dist/cli/index.d.ts.map +0 -1
- package/dist/cli/index.js +0 -345
- package/dist/cli/index.js.map +0 -1
- package/dist/cli/info.d.ts +0 -22
- package/dist/cli/info.d.ts.map +0 -1
- package/dist/cli/info.js +0 -100
- package/dist/cli/info.js.map +0 -1
- package/dist/cli/logs.d.ts +0 -15
- package/dist/cli/logs.d.ts.map +0 -1
- package/dist/cli/logs.js +0 -102
- package/dist/cli/logs.js.map +0 -1
- package/dist/cli/print-token.d.ts +0 -18
- package/dist/cli/print-token.d.ts.map +0 -1
- package/dist/cli/print-token.js +0 -60
- package/dist/cli/print-token.js.map +0 -1
- package/dist/cli/serve.d.ts +0 -28
- package/dist/cli/serve.d.ts.map +0 -1
- package/dist/cli/serve.js +0 -405
- package/dist/cli/serve.js.map +0 -1
- package/dist/cli/status.d.ts +0 -49
- package/dist/cli/status.d.ts.map +0 -1
- package/dist/cli/status.js +0 -155
- package/dist/cli/status.js.map +0 -1
- package/dist/cli/sync-skills.d.ts +0 -58
- package/dist/cli/sync-skills.d.ts.map +0 -1
- package/dist/cli/sync-skills.js +0 -266
- package/dist/cli/sync-skills.js.map +0 -1
- package/dist/cli/telemetry.d.ts +0 -10
- package/dist/cli/telemetry.d.ts.map +0 -1
- package/dist/cli/telemetry.js +0 -161
- package/dist/cli/telemetry.js.map +0 -1
- package/dist/cli/toggle.d.ts +0 -26
- package/dist/cli/toggle.d.ts.map +0 -1
- package/dist/cli/toggle.js +0 -185
- package/dist/cli/toggle.js.map +0 -1
- package/dist/http/async-dispatch.d.ts +0 -44
- package/dist/http/async-dispatch.d.ts.map +0 -1
- package/dist/http/async-dispatch.js +0 -175
- package/dist/http/async-dispatch.js.map +0 -1
- package/dist/http/auth.d.ts +0 -20
- package/dist/http/auth.d.ts.map +0 -1
- package/dist/http/auth.js +0 -56
- package/dist/http/auth.js.map +0 -1
- package/dist/http/canonicalize-file-paths.d.ts +0 -8
- package/dist/http/canonicalize-file-paths.d.ts.map +0 -1
- package/dist/http/canonicalize-file-paths.js +0 -43
- package/dist/http/canonicalize-file-paths.js.map +0 -1
- package/dist/http/cwd-validator.d.ts +0 -11
- package/dist/http/cwd-validator.d.ts.map +0 -1
- package/dist/http/cwd-validator.js +0 -130
- package/dist/http/cwd-validator.js.map +0 -1
- package/dist/http/errors.d.ts +0 -4
- package/dist/http/errors.d.ts.map +0 -1
- package/dist/http/errors.js +0 -9
- package/dist/http/errors.js.map +0 -1
- package/dist/http/execution-context.d.ts +0 -18
- package/dist/http/execution-context.d.ts.map +0 -1
- package/dist/http/execution-context.js +0 -61
- package/dist/http/execution-context.js.map +0 -1
- package/dist/http/handler-deps.d.ts +0 -19
- package/dist/http/handler-deps.d.ts.map +0 -1
- package/dist/http/handler-deps.js +0 -2
- package/dist/http/handler-deps.js.map +0 -1
- package/dist/http/handlers/control/batch-slice.d.ts +0 -4
- package/dist/http/handlers/control/batch-slice.d.ts.map +0 -1
- package/dist/http/handlers/control/batch-slice.js +0 -40
- package/dist/http/handlers/control/batch-slice.js.map +0 -1
- package/dist/http/handlers/control/batch.d.ts +0 -23
- package/dist/http/handlers/control/batch.d.ts.map +0 -1
- package/dist/http/handlers/control/batch.js +0 -332
- package/dist/http/handlers/control/batch.js.map +0 -1
- package/dist/http/handlers/control/context-blocks.d.ts +0 -22
- package/dist/http/handlers/control/context-blocks.d.ts.map +0 -1
- package/dist/http/handlers/control/context-blocks.js +0 -111
- package/dist/http/handlers/control/context-blocks.js.map +0 -1
- package/dist/http/handlers/introspection/health.d.ts +0 -20
- package/dist/http/handlers/introspection/health.d.ts.map +0 -1
- package/dist/http/handlers/introspection/health.js +0 -18
- package/dist/http/handlers/introspection/health.js.map +0 -1
- package/dist/http/handlers/introspection/status.d.ts +0 -26
- package/dist/http/handlers/introspection/status.d.ts.map +0 -1
- package/dist/http/handlers/introspection/status.js +0 -136
- package/dist/http/handlers/introspection/status.js.map +0 -1
- package/dist/http/handlers/tools/audit.d.ts +0 -4
- package/dist/http/handlers/tools/audit.d.ts.map +0 -1
- package/dist/http/handlers/tools/audit.js +0 -43
- package/dist/http/handlers/tools/audit.js.map +0 -1
- package/dist/http/handlers/tools/debug.d.ts +0 -4
- package/dist/http/handlers/tools/debug.d.ts.map +0 -1
- package/dist/http/handlers/tools/debug.js +0 -43
- package/dist/http/handlers/tools/debug.js.map +0 -1
- package/dist/http/handlers/tools/delegate.d.ts +0 -4
- package/dist/http/handlers/tools/delegate.d.ts.map +0 -1
- package/dist/http/handlers/tools/delegate.js +0 -43
- package/dist/http/handlers/tools/delegate.js.map +0 -1
- package/dist/http/handlers/tools/execute-plan.d.ts +0 -4
- package/dist/http/handlers/tools/execute-plan.d.ts.map +0 -1
- package/dist/http/handlers/tools/execute-plan.js +0 -45
- package/dist/http/handlers/tools/execute-plan.js.map +0 -1
- package/dist/http/handlers/tools/investigate.d.ts +0 -4
- package/dist/http/handlers/tools/investigate.d.ts.map +0 -1
- package/dist/http/handlers/tools/investigate.js +0 -64
- package/dist/http/handlers/tools/investigate.js.map +0 -1
- package/dist/http/handlers/tools/journal-recall.d.ts +0 -4
- package/dist/http/handlers/tools/journal-recall.d.ts.map +0 -1
- package/dist/http/handlers/tools/journal-recall.js +0 -40
- package/dist/http/handlers/tools/journal-recall.js.map +0 -1
- package/dist/http/handlers/tools/journal-record.d.ts +0 -4
- package/dist/http/handlers/tools/journal-record.d.ts.map +0 -1
- package/dist/http/handlers/tools/journal-record.js +0 -35
- package/dist/http/handlers/tools/journal-record.js.map +0 -1
- package/dist/http/handlers/tools/research.d.ts +0 -4
- package/dist/http/handlers/tools/research.d.ts.map +0 -1
- package/dist/http/handlers/tools/research.js +0 -64
- package/dist/http/handlers/tools/research.js.map +0 -1
- package/dist/http/handlers/tools/retry.d.ts +0 -4
- package/dist/http/handlers/tools/retry.d.ts.map +0 -1
- package/dist/http/handlers/tools/retry.js +0 -73
- package/dist/http/handlers/tools/retry.js.map +0 -1
- package/dist/http/handlers/tools/review.d.ts +0 -4
- package/dist/http/handlers/tools/review.d.ts.map +0 -1
- package/dist/http/handlers/tools/review.js +0 -43
- package/dist/http/handlers/tools/review.js.map +0 -1
- package/dist/http/middleware/body-reader.d.ts +0 -16
- package/dist/http/middleware/body-reader.d.ts.map +0 -1
- package/dist/http/middleware/body-reader.js +0 -44
- package/dist/http/middleware/body-reader.js.map +0 -1
- package/dist/http/middleware/caller-identity.d.ts +0 -16
- package/dist/http/middleware/caller-identity.d.ts.map +0 -1
- package/dist/http/middleware/caller-identity.js +0 -16
- package/dist/http/middleware/caller-identity.js.map +0 -1
- package/dist/http/middleware/decompress.d.ts +0 -14
- package/dist/http/middleware/decompress.d.ts.map +0 -1
- package/dist/http/middleware/decompress.js +0 -51
- package/dist/http/middleware/decompress.js.map +0 -1
- package/dist/http/project-registry.d.ts +0 -54
- package/dist/http/project-registry.d.ts.map +0 -1
- package/dist/http/project-registry.js +0 -130
- package/dist/http/project-registry.js.map +0 -1
- package/dist/http/request-observability.d.ts +0 -8
- package/dist/http/request-observability.d.ts.map +0 -1
- package/dist/http/request-observability.js +0 -20
- package/dist/http/request-observability.js.map +0 -1
- package/dist/http/request-pipeline.d.ts +0 -16
- package/dist/http/request-pipeline.d.ts.map +0 -1
- package/dist/http/request-pipeline.js +0 -144
- package/dist/http/request-pipeline.js.map +0 -1
- package/dist/http/server.d.ts +0 -17
- package/dist/http/server.d.ts.map +0 -1
- package/dist/http/server.js +0 -300
- package/dist/http/server.js.map +0 -1
- package/dist/http/types.d.ts +0 -20
- package/dist/http/types.d.ts.map +0 -1
- package/dist/http/types.js +0 -2
- package/dist/http/types.js.map +0 -1
- package/dist/skill-install/disabled-state.d.ts +0 -35
- package/dist/skill-install/disabled-state.d.ts.map +0 -1
- package/dist/skill-install/disabled-state.js +0 -96
- package/dist/skill-install/disabled-state.js.map +0 -1
- package/dist/skill-install/discover.d.ts +0 -29
- package/dist/skill-install/discover.d.ts.map +0 -1
- package/dist/skill-install/discover.js +0 -104
- package/dist/skill-install/discover.js.map +0 -1
- package/dist/skill-install/include-utils.d.ts +0 -27
- package/dist/skill-install/include-utils.d.ts.map +0 -1
- package/dist/skill-install/include-utils.js +0 -90
- package/dist/skill-install/include-utils.js.map +0 -1
- package/dist/skill-install/manifest.d.ts +0 -82
- package/dist/skill-install/manifest.d.ts.map +0 -1
- package/dist/skill-install/manifest.js +0 -215
- package/dist/skill-install/manifest.js.map +0 -1
- package/dist/skill-install/skill-installer-common.d.ts +0 -26
- package/dist/skill-install/skill-installer-common.d.ts.map +0 -1
- package/dist/skill-install/skill-installer-common.js +0 -139
- package/dist/skill-install/skill-installer-common.js.map +0 -1
- package/dist/skill-install/skill-installers/claude-code.d.ts +0 -43
- package/dist/skill-install/skill-installers/claude-code.d.ts.map +0 -1
- package/dist/skill-install/skill-installers/claude-code.js +0 -65
- package/dist/skill-install/skill-installers/claude-code.js.map +0 -1
- package/dist/skill-install/skill-installers/codex-cli.d.ts +0 -27
- package/dist/skill-install/skill-installers/codex-cli.d.ts.map +0 -1
- package/dist/skill-install/skill-installers/codex-cli.js +0 -84
- package/dist/skill-install/skill-installers/codex-cli.js.map +0 -1
- package/dist/skill-install/skill-installers/cursor.d.ts +0 -72
- package/dist/skill-install/skill-installers/cursor.d.ts.map +0 -1
- package/dist/skill-install/skill-installers/cursor.js +0 -81
- package/dist/skill-install/skill-installers/cursor.js.map +0 -1
- package/dist/skill-install/skill-installers/gemini-cli.d.ts +0 -50
- package/dist/skill-install/skill-installers/gemini-cli.d.ts.map +0 -1
- package/dist/skill-install/skill-installers/gemini-cli.js +0 -72
- package/dist/skill-install/skill-installers/gemini-cli.js.map +0 -1
- package/dist/skill-install/skill-manifest-sync.d.ts +0 -11
- package/dist/skill-install/skill-manifest-sync.d.ts.map +0 -1
- package/dist/skill-install/skill-manifest-sync.js +0 -65
- package/dist/skill-install/skill-manifest-sync.js.map +0 -1
- package/dist/skills/_shared/auth.md +0 -41
- package/dist/skills/_shared/error-handling.md +0 -31
- package/dist/skills/_shared/polling.md +0 -88
- package/dist/skills/_shared/response-shape.md +0 -55
- package/dist/skills/_shared/review-policy.md +0 -15
- package/dist/skills/mma-audit/SKILL.md +0 -270
- package/dist/skills/mma-context-blocks/SKILL.md +0 -148
- package/dist/skills/mma-debug/SKILL.md +0 -208
- package/dist/skills/mma-delegate/SKILL.md +0 -216
- package/dist/skills/mma-execute-plan/SKILL.md +0 -214
- package/dist/skills/mma-explore/SKILL.md +0 -190
- package/dist/skills/mma-investigate/SKILL.md +0 -258
- package/dist/skills/mma-journal-recall/SKILL.md +0 -242
- package/dist/skills/mma-journal-record/SKILL.md +0 -189
- package/dist/skills/mma-research/SKILL.md +0 -223
- package/dist/skills/mma-retry/SKILL.md +0 -221
- package/dist/skills/mma-review/SKILL.md +0 -209
- package/dist/skills/multi-model-agent/SKILL.md +0 -206
- package/dist/telemetry/consent.d.ts +0 -4
- package/dist/telemetry/consent.d.ts.map +0 -1
- package/dist/telemetry/consent.js +0 -40
- package/dist/telemetry/consent.js.map +0 -1
- package/dist/telemetry/flusher.d.ts +0 -19
- package/dist/telemetry/flusher.d.ts.map +0 -1
- package/dist/telemetry/flusher.js +0 -277
- package/dist/telemetry/flusher.js.map +0 -1
- package/dist/telemetry/generation.d.ts +0 -9
- package/dist/telemetry/generation.d.ts.map +0 -1
- package/dist/telemetry/generation.js +0 -33
- package/dist/telemetry/generation.js.map +0 -1
- package/dist/telemetry/identity.d.ts +0 -9
- package/dist/telemetry/identity.d.ts.map +0 -1
- package/dist/telemetry/identity.js +0 -35
- package/dist/telemetry/identity.js.map +0 -1
- package/dist/telemetry/install-id.d.ts +0 -13
- package/dist/telemetry/install-id.d.ts.map +0 -1
- package/dist/telemetry/install-id.js +0 -49
- package/dist/telemetry/install-id.js.map +0 -1
- package/dist/telemetry/install-meta.d.ts +0 -10
- package/dist/telemetry/install-meta.d.ts.map +0 -1
- package/dist/telemetry/install-meta.js +0 -15
- package/dist/telemetry/install-meta.js.map +0 -1
- package/dist/telemetry/queue.d.ts +0 -35
- package/dist/telemetry/queue.d.ts.map +0 -1
- package/dist/telemetry/queue.js +0 -287
- package/dist/telemetry/queue.js.map +0 -1
- package/dist/telemetry/recorder.d.ts +0 -39
- package/dist/telemetry/recorder.d.ts.map +0 -1
- package/dist/telemetry/recorder.js +0 -173
- package/dist/telemetry/recorder.js.map +0 -1
- package/scripts/postinstall.js +0 -36

@@ -1,270 +0,0 @@
---
name: mma-audit
description: >-
  Use when the user asks to audit a spec / plan / design doc / skill file. The
  `subtype` field picks the criteria set. `default` (prose-coherence) is the
  general doc auditor. `plan` verifies a code-execution plan against the actual
  codebase — run this before any `mma-execute-plan` dispatch. `spec` audits
  requirement prose for testability and decision-trace. `skill` audits a
  SKILL.md against reader-effectiveness criteria.
when_to_use: >-
  User asks for a doc / spec / plan / skill audit OR a methodology skill
  (superpowers:dispatching-parallel-agents, /security-review) points at one AND
  mmagent is running. Audit on PROSE/SPEC docs — use mma-review for source code.
  Audit a CODE-EXECUTION PLAN against the codebase — use subtype=plan.
version: 4.9.1
---

# mma-audit

## Overview

`mma-audit` sends a prose artifact to workers for structured auditing. The `subtype` field picks WHICH criteria set the workers apply — every subtype runs through the same sequential-criteria read-only lifecycle, but each one carries its own criteria list, semantics, and prompt scaffolding.

**Four subtypes — picked by the kind of artifact, not by the lens you want:**

| You're auditing… | Use… | What it checks |
|---|---|---|
| A general prose artifact (design doc, recommendation, post-mortem, README) | `subtype: 'default'` | Comprehensive prose-coherence — would a literal-following worker produce the right outcome from this prose alone? Catches ambiguity, contradictions, missing branches, drift, scope-creep. **Does NOT verify against any codebase.** |
| A **code-execution PLAN** (`docs/superpowers/plans/*.md` or similar) before running it via `mma-execute-plan` | `subtype: 'plan'` | Plan-vs-codebase coherence — for every method / type / file path / signature / import / verify command the plan names, the codebase actually contains it as described. Catches the bug class the prose-coherence audit cannot see (e.g. plan says `registerBlock` but actual interface is `register`). |
| A **requirement spec** (what we want, why; success criteria) | `subtype: 'spec'` | Requirement-prose executability across 9 criteria — testability, scope explicitness AND decomposability, acceptance-criteria coverage, non-functional capture, requirement conflicts, decision-trace, assumption exposure, placeholder scan, and design-decomposition presence (architecture / components / data flow / error handling / testing). |
| A **SKILL.md** for an `mma-*` skill or comparable agent-facing playbook | `subtype: 'skill'` | Skill-file reader-effectiveness — when-to-use specificity, endpoint contract integrity, example correctness, anti-pattern coverage, link integrity. |

If you want to bias workers toward a narrow lens (security only, performance only, accessibility only), put that in the free-text `background` portion of the prompt — `subtype` is criteria machinery, not a lens selector.

## When to Use

- `subtype: 'default'` — a general prose artifact needs a critical read for internal executability (the artifact will be acted on by a worker reading the prose alone).
- `subtype: 'plan'` — you have a written code-execution plan on disk and you're about to dispatch tasks from it via `mma-execute-plan`. This is the ONLY subtype that grounds findings against real source files.
- `subtype: 'spec'` — you have a requirement / brainstorming-output spec and want to verify every requirement is testable, traceable, and unambiguous BEFORE writing the plan. Typical predecessor to `writing-plans`.
- `subtype: 'skill'` — you're authoring or revising an `mma-*` skill or comparable SKILL.md and want to know whether agents will actually read it the right way.

**Don't use mma-audit when:** the thing being audited is source code (→ `mma-review`); a 30-second `Read` would answer it; or you want to verify a plan that hasn't been written yet (write the plan first).
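
The subtype table and the "Don't use" rule above amount to a small routing decision. A minimal TypeScript sketch of it, assuming the caller has already classified the artifact; `ArtifactKind`, `AuditRoute`, and `routeAudit` are hypothetical names, not part of the mmagent API:

```typescript
// Hypothetical labels for what the caller is about to audit.
type ArtifactKind =
  | "source-code"
  | "general-prose"
  | "code-execution-plan"
  | "requirement-spec"
  | "skill-file";

type AuditSubtype = "default" | "plan" | "spec" | "skill";

type AuditRoute =
  | { skill: "mma-review" }
  | { skill: "mma-audit"; subtype: AuditSubtype };

// Subtype is chosen by the kind of artifact, not by the lens you want.
const SUBTYPE_BY_KIND: Record<Exclude<ArtifactKind, "source-code">, AuditSubtype> = {
  "general-prose": "default",
  "code-execution-plan": "plan",
  "requirement-spec": "spec",
  "skill-file": "skill",
};

function routeAudit(kind: ArtifactKind): AuditRoute {
  // Source code goes to mma-review; mma-audit is for prose artifacts.
  if (kind === "source-code") {
    return { skill: "mma-review" };
  }
  return { skill: "mma-audit", subtype: SUBTYPE_BY_KIND[kind] };
}
```

Narrow lenses such as security-only or performance-only deliberately have no entry here: they belong in free-text instructions, not in `subtype`.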

## Endpoint

`POST /audit?cwd=<abs-path>`

@include _shared/auth.md

## Request body

```json
{
  "document": "inline content to audit (optional if filePaths given)",
  "subtype": "default",
  "filePaths": ["/project/docs/spec.md"],
  "contextBlockIds": []
}
```

| Field | Type | Required | Notes |
|---|---|---|---|
| `document` | string | no | Inline document content |
| `subtype` | `'default' \| 'plan' \| 'spec' \| 'skill'` | no (defaults to `'default'`) | See "Picking subtype" below. |
| `filePaths` | string[] | no | Files to audit (one worker per file, parallel) |
| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` |

Either `document` or `filePaths` (or both) must be provided.

> Worker tier for `mma-audit` is hardcoded to `complex` and is not caller-configurable. Sending `agentType` is rejected with HTTP 400.

### Picking subtype

| Value | When to use |
|---|---|
| `default` (or omit the field) | **General prose — design doc, recommendation, post-mortem, README, brief.** Comprehensive prose-coherence audit. Does NOT verify against any codebase. |
| `plan` | **Code-execution plans being audited against a real codebase.** Single-file input (the plan markdown). Workers grep / read source files under `cwd` to verify every named symbol / path / signature / import / verify command. Use this BEFORE every `mma-execute-plan` dispatch. |
| `spec` | **Requirement spec / brainstorming-output / what-we-want prose.** 9 criteria target testability, scope explicitness + decomposability, acceptance-criteria coverage, non-functional capture, requirement conflicts, decision-trace, assumption exposure, placeholder scan, and design-decomposition presence. |
| `skill` | **`SKILL.md` or comparable agent-facing playbook.** Criteria target when-to-use specificity, endpoint contract integrity, example correctness, anti-pattern coverage, link integrity. |

You can run two audits on the same plan: first `spec` or `default` (prose quality), then `plan` (does the plan match the codebase?). They cover orthogonal failure modes.

The legacy `auditType` field and its `correctness` / `style` / `general` / `security` / `performance` values no longer exist. Sending `auditType` returns `400 invalid_request`. Sending unknown `subtype` values returns `400 invalid_request` with the allowed enum.
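
The request-body rules in this section can be checked client-side before anything is sent: at least one of `document` / `filePaths`, a `subtype` from the four-value enum, and no `agentType` or legacy `auditType`. A minimal TypeScript sketch; `validateAuditBody` and its messages are hypothetical, and the server's own `400` responses stay authoritative:

```typescript
type AuditSubtype = "default" | "plan" | "spec" | "skill";

const AUDIT_SUBTYPES: readonly AuditSubtype[] = ["default", "plan", "spec", "skill"];

// Returns human-readable problems; an empty array means the body looks sendable.
// The server performs its own validation and remains the authority.
function validateAuditBody(body: Record<string, unknown>): string[] {
  const problems: string[] = [];

  // Fields the server rejects with HTTP 400.
  if ("auditType" in body) problems.push("auditType is retired; use subtype");
  if ("agentType" in body) problems.push("agentType is not caller-configurable for mma-audit");

  // subtype is optional and defaults to 'default'.
  const subtype = body["subtype"] ?? "default";
  if (typeof subtype !== "string" || (AUDIT_SUBTYPES as readonly string[]).indexOf(subtype) === -1) {
    problems.push(`subtype must be one of: ${AUDIT_SUBTYPES.join(", ")}`);
  }

  // At least one of document / filePaths must carry content.
  const doc = body["document"];
  const paths = body["filePaths"];
  const hasDocument = typeof doc === "string" && doc.length > 0;
  const hasFilePaths = Array.isArray(paths) && paths.length > 0;
  if (!hasDocument && !hasFilePaths) problems.push("provide document, filePaths, or both");

  return problems;
}
```

Plan-mode rules (exactly one file, no inline content) are stricter and are covered separately under "Plan-audit specifics".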

### Plan-audit specifics

When `subtype: 'plan'`:

- `filePaths` MUST contain exactly **one entry** — the plan markdown. Sending zero or 2+ entries → `400 invalid_request` with the message: *"Plan audit takes exactly one filePath (the plan markdown). The worker discovers and verifies source files itself via its tool surface — do not pre-list source files."*
- `document` (inline content) is not used in plan mode — the plan must be on disk so workers can reference it by `?cwd=`-relative path.
- The worker runs the sequential-criteria loop with the plan-audit criteria set across 12 perspectives in three groups:
  - **EXTERNAL CODEBASE COHERENCE**: 1 PATH EXISTENCE, 2 SYMBOL EXISTENCE, 3 SIGNATURE MATCH, 4 IMPORT GRAPH, 5 TEST HARNESS AVAILABILITY, 6 STEP SEQUENCE WITHIN TASK, 7 CROSS-TASK DEPENDENCIES, 8 VERIFICATION COMMAND VALIDITY
  - **INTRA-PLAN STRUCTURE**: 9 TASK GRANULARITY, 11 PLACEHOLDER LANGUAGE, 12 PLAN SKELETON
  - **SPEC ALIGNMENT**: 10 SPEC COVERAGE
- To enable perspective 10 (SPEC COVERAGE), register the upstream spec as a context block via `mma-context-blocks` and pass its `blockId` in `contextBlockIds`. Without a spec in context, perspective 10 emits "No findings for this criterion." and the other 11 still run.
- Read the findings list. Fix the plan and re-audit if any `critical` or `high` plan-audit findings remain.
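
The plan-mode preconditions and the stop rule above can be expressed as two small checks. A minimal TypeScript sketch; `checkPlanAuditBody`, `needsReaudit`, and the `PlanAuditBody` shape are illustrative assumptions rather than the server's implementation. A missing spec block is reported only as a warning, because the audit still runs the other 11 perspectives:

```typescript
type Severity = "critical" | "high" | "medium" | "low";

// Hypothetical shape of a plan-mode request body.
interface PlanAuditBody {
  subtype: "plan";
  filePaths: string[];
  document?: string;
  contextBlockIds?: string[];
}

function checkPlanAuditBody(body: PlanAuditBody): { errors: string[]; warnings: string[] } {
  const errors: string[] = [];
  const warnings: string[] = [];
  // Zero or 2+ entries are answered with 400 invalid_request.
  if (body.filePaths.length !== 1) {
    errors.push("plan audit takes exactly one filePath (the plan markdown)");
  }
  // Inline content is not used in plan mode; the plan must be on disk.
  if (body.document !== undefined) {
    warnings.push("document is not used in plan mode");
  }
  // Without a registered spec, perspective 10 (SPEC COVERAGE) reports no findings.
  if (!body.contextBlockIds || body.contextBlockIds.length === 0) {
    warnings.push("no spec in context: SPEC COVERAGE will report no findings");
  }
  return { errors, warnings };
}

// Fix the plan and re-audit while any critical or high finding remains.
function needsReaudit(findings: { severity: Severity }[]): boolean {
  return findings.some((f) => f.severity === "critical" || f.severity === "high");
}
```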

## Full example

### Default audit (general prose)

```bash
BATCH=$(curl -f --show-error -s -X POST \
  -H "X-MMA-Client: $MMA_CLIENT" \
  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"subtype":"default","filePaths":["/project/docs/api-spec.md"]}' \
  "http://localhost:$PORT/audit?cwd=/project")
BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
```

### Spec audit (requirement prose)

```bash
BATCH=$(curl -f --show-error -s -X POST \
  -H "X-MMA-Client: $MMA_CLIENT" \
  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"subtype":"spec","filePaths":["/project/docs/superpowers/specs/2026-05-12-feature-design.md"]}' \
  "http://localhost:$PORT/audit?cwd=/project")
BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
```

### Skill audit (SKILL.md)

```bash
BATCH=$(curl -f --show-error -s -X POST \
  -H "X-MMA-Client: $MMA_CLIENT" \
  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"subtype":"skill","filePaths":["/project/packages/server/src/skills/mma-audit/SKILL.md"]}' \
  "http://localhost:$PORT/audit?cwd=/project")
BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
```

### Plan audit (verify a code-execution plan against the codebase)

```bash
BATCH=$(curl -f --show-error -s -X POST \
  -H "X-MMA-Client: $MMA_CLIENT" \
  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"subtype":"plan","filePaths":["/project/docs/superpowers/plans/2026-05-10-feature.md"]}' \
  "http://localhost:$PORT/audit?cwd=/project")
BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
```
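
The four curl calls differ only in their JSON body. A minimal TypeScript sketch that builds the same request (method, URL, headers, body) so it can be handed to any HTTP client; `buildAuditRequest`, `CallerConfig`, and `AuditRequestSpec` are hypothetical names, the header names are copied from the curl examples, and percent-encoding `cwd` assumes the daemon decodes query parameters in the usual way:

```typescript
type AuditSubtype = "default" | "plan" | "spec" | "skill";

// Hypothetical bundle of the values the curl examples read from the environment.
interface CallerConfig {
  port: number;
  token: string; // $TOKEN
  client: string; // $MMA_CLIENT
  mainModel: string; // $MMA_MAIN_MODEL
}

interface AuditRequestSpec {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildAuditRequest(
  cfg: CallerConfig,
  cwd: string, // absolute project path, as in ?cwd=/project
  payload: { subtype: AuditSubtype; filePaths: string[]; contextBlockIds?: string[] },
): AuditRequestSpec {
  return {
    method: "POST",
    // Percent-encode cwd so spaces or '&' in the path survive the query string.
    url: `http://localhost:${cfg.port}/audit?cwd=${encodeURIComponent(cwd)}`,
    headers: {
      "X-MMA-Client": cfg.client,
      "X-MMA-Main-Model": cfg.mainModel,
      Authorization: `Bearer ${cfg.token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}
```

On a runtime with a global `fetch` (Node 18 or later, browsers), `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` sends it; the JSON response carries the `batchId` that polling needs, just as in the curl examples.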

@include _shared/polling.md

@include _shared/response-shape.md

## Reading the findings

The main agent reads `completed` + `message` + `findings` — the findings are the answer. For read-only routes, `filesChanged` is always `[]` and `commitSha` is always `null`.

```json
{
  "completed": true,
  "message": "Plan audit complete; 1 finding.",
  "findings": [
    { "id": "F1", "severity": "high", "category": "path-existence",
      "claim": "Step 3 names `src/utils/foo.ts` which does not exist.",
      "evidence": "Worker grepped for the file under cwd — no match found.",
      "suggestion": "Use `src/utils/bar.ts` instead.",
      "source": "implementer" }
  ],
  "filesChanged": [],
  "commitSha": null,
  "summary": "...",
  "telemetry": { ... }
}
```

### Finding shape

Every finding has this shape:

| Field | Type | Notes |
|---|---|---|
| `id` | string | Worker-assigned, e.g. `F1`, `F2`. Stable across chain. |
| `severity` | `'critical' \| 'high' \| 'medium' \| 'low'` | 4-tier. |
| `category` | string | Topical bucket, e.g. `path-existence`, `prose-coherence`. |
| `claim` | string | One-sentence summary. |
| `evidence` | string ≥20 chars | Verbatim from source when grounded. |
| `suggestion?` | string | Optional fix recommendation. |
| `source` | `'implementer' \| 'reviewer'` | Who produced the finding. |

`annotatorConfidence` and `evidenceGrounded` are retired — they were v4 fields with no producers.
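
The finding table translates directly into a type plus a runtime guard, which is useful when the main agent parses results before rendering them. A minimal TypeScript sketch; `isFinding` is a hypothetical helper that enforces only the constraints stated in the table, including the 20-character floor on `evidence`:

```typescript
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  id: string;
  severity: Severity;
  category: string;
  claim: string;
  evidence: string;
  suggestion?: string;
  source: "implementer" | "reviewer";
}

const SEVERITIES: readonly string[] = ["critical", "high", "medium", "low"];

function isFinding(value: unknown): value is Finding {
  if (typeof value !== "object" || value === null) return false;
  // Destructure into locals so each typeof check narrows cleanly.
  const { id, severity, category, claim, evidence, suggestion, source } =
    value as Record<string, unknown>;
  return (
    typeof id === "string" &&
    typeof severity === "string" && SEVERITIES.indexOf(severity) !== -1 &&
    typeof category === "string" &&
    typeof claim === "string" &&
    typeof evidence === "string" && evidence.length >= 20 &&
    (suggestion === undefined || typeof suggestion === "string") &&
    (source === "implementer" || source === "reviewer")
  );
}
```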

### Recommended rendering by the main agent

1. Show ALL findings — never silently drop. Severity and grounding are soft signals, not gates.
2. Default sort: severity (critical → low), then `id` ascending.
3. `severity` is the authoritative value — use it directly.
4. Mark findings with `evidence` shorter than 30 chars as "low-evidence" (lighter color or `(low evidence)` annotation). User decides what to do.
5. Severity-tier counts feed the dashboard.
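
Rules 1, 2, 4 and 5 combine into one pass over the findings. A minimal TypeScript sketch; `prepareForRendering` and the `lowEvidence` flag are hypothetical names, the 30-character threshold comes from rule 4, and comparing ids numerically (so `F2` sorts before `F10`) is one reading of "ascending":

```typescript
type Severity = "critical" | "high" | "medium" | "low";

// Only the fields the rendering rules need; other fields ride along via the spread.
interface Finding {
  id: string;
  severity: Severity;
  evidence: string;
}

const SEVERITY_RANK: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

function prepareForRendering(findings: Finding[]) {
  // Rule 1: work on a copy and never drop a finding.
  // Rule 2: severity first (critical -> low), then id ascending.
  const rows = [...findings]
    .sort(
      (a, b) =>
        SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
        a.id.localeCompare(b.id, undefined, { numeric: true }),
    )
    // Rule 4: flag short evidence instead of filtering it out.
    .map((f) => ({ ...f, lowEvidence: f.evidence.length < 30 }));

  // Rule 5: per-tier counts for the dashboard.
  const counts: Record<Severity, number> = { critical: 0, high: 0, medium: 0, low: 0 };
  for (const f of findings) counts[f.severity] += 1;

  return { rows, counts };
}
```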

## Best practices

This skill is one step in the larger flow described in `multi-model-agent` → "Best practices". Recipes that involve `mma-audit`:

- **Recipe A — Audit-iterate-clean.** `mma-audit` → fix → `mma-audit` again. Sequential rounds. Register the doc via `mma-context-blocks` before round 1 and reuse the same ID across all rounds — avoids re-inlining the same content into every audit call.

- **Recipe E — Plan-validate-execute.** Before any `mma-execute-plan` batch, run `mma-audit` with `subtype: 'plan'` on the plan file. Read the findings. If any `critical` / `high` finding survives, fix the plan and re-audit. This catches the bug class where the plan's named methods/files don't actually exist in the codebase — symbols a prose-coherence audit cannot see.

- **Recipe F — Spec-then-plan-then-execute (the canonical flow).** When working from a brainstorming spec: `mma-audit` (`subtype: 'spec'`) → fix → `writing-plans` → register the spec as a context block via `mma-context-blocks` → `mma-audit` (`subtype: 'plan'`, `contextBlockIds: [specBlockId]`) → fix → `mma-execute-plan`. Spec audit covers requirement-prose executability; plan audit covers BOTH plan-vs-codebase coherence AND plan-vs-spec coverage (perspective 10 fires only when the spec is in context, which is why the context-block step is load-bearing in this recipe).

Anti-pattern alert: **`parallel-rounds-same-target`** (AP1). Three parallel audits on the same document re-flag the same issues without seeing each other's fixes. Run rounds sequentially with a fix between each.
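
Recipes A and E, together with the AP1 warning, describe one loop: audit, fix, re-audit, one round at a time. A minimal TypeScript sketch of that loop; `runAudit` and `applyFixes` are hypothetical callbacks standing in for an `mma-audit` call (submit, then poll) and for whatever edits the document, "blocking" uses the critical/high bar from Recipe E, and `maxRounds` is an assumed safety cap the skill does not specify:

```typescript
type Severity = "critical" | "high" | "medium" | "low";

interface RoundFinding {
  id: string;
  severity: Severity;
}

// Hypothetical per-round result: findings plus the auto-registered terminal block id.
interface RoundResult {
  findings: RoundFinding[];
  contextBlockId: string | null;
}

async function auditUntilClean(
  runAudit: (priorBlockIds: string[]) => Promise<RoundResult>,
  applyFixes: (findings: RoundFinding[]) => Promise<void>,
  maxRounds = 5, // assumed safety cap; not specified by the skill
): Promise<{ rounds: number; passed: boolean }> {
  const priorBlockIds: string[] = [];
  for (let round = 1; round <= maxRounds; round++) {
    // Strictly sequential: parallel rounds re-flag issues without seeing fixes (AP1).
    const result = await runAudit([...priorBlockIds]);
    const blocking = result.findings.filter(
      (f) => f.severity === "critical" || f.severity === "high",
    );
    if (blocking.length === 0) return { rounds: round, passed: true };
    // Give the next round this round's report as delta context.
    if (result.contextBlockId !== null) priorBlockIds.push(result.contextBlockId);
    await applyFixes(result.findings);
  }
  return { rounds: maxRounds, passed: false };
}
```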

## Common pitfalls

❌ **Auditing source code with `mma-audit`**
The auditor lacks codebase context (no type info, no call-site lookup, no test awareness). Findings are speculative. **Fix:** use `mma-review` — it pulls in surrounding source context and validates against the actual types.

❌ **Single huge `document` string instead of `filePaths`**
Inline docs lose the file boundary, so the per-file parallel split degenerates to one worker. **Fix:** save to disk first, pass `filePaths`.

❌ **Sending the legacy `auditType` field**
The field was renamed to `subtype` and the value set was narrowed. **Fix:** use `subtype` with one of `default` / `plan` / `spec` / `skill`. For "security only" / "performance only" lenses, put the bias in the free-text prompt — there is no narrow-lens subtype.

❌ **Re-auditing the same files round after round without delta context**
The round 2 worker has no idea what round 1 found. **Fix:** pass round 1's findings to round 2 via `contextBlockIds`, either by reusing the `contextBlockId` that each round 1 result already carries (see "Terminal context block" below) or by registering the findings yourself with `mma-context-blocks`.

## Terminal context block

Every completed **read-route** task (audit / review / debug / investigate / research) auto-registers a reusable terminal context block containing its report (headline + findings). The block id is returned on each per-task result as **`contextBlockId`**. Write routes (delegate / execute-plan / retry) return `contextBlockId: null` — their record is the commit, not a block. This block is immutable, lives for the session duration, and counts against the project's `maxEntries` quota (default 500).

Use it for delta follow-ups — feed prior results' block ids into a later call's `contextBlockIds`, filtering out nulls:

`contextBlockIds: priorResults.map(r => r.contextBlockId).filter((id) => id !== null)`

**Use cases:**
- Pass round-N audit findings to round N+1 via `contextBlockIds`
- Feed audit results into a downstream `mma-delegate` fix step
- Accumulate findings across iterative audit rounds

The block is registered server-side at task completion; no caller action is needed to create it. Delete it explicitly via `DELETE /context-blocks/:id` when no longer needed, or let it expire on session teardown.
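
In TypeScript, the one-liner above needs a type predicate on the filter callback for the result to be typed `string[]` rather than `(string | null)[]` (TypeScript 5.5 and later can infer this predicate; earlier versions cannot). A minimal sketch; `TaskResult` is a hypothetical, trimmed-down result shape and `priorBlockIds` a hypothetical helper:

```typescript
interface TaskResult {
  // Read routes (audit / review / debug / investigate / research) return a block id;
  // write routes (delegate / execute-plan / retry) return null.
  contextBlockId: string | null;
}

function priorBlockIds(priorResults: TaskResult[]): string[] {
  // The explicit `id is string` predicate narrows the element type on any TypeScript version.
  return priorResults
    .map((r) => r.contextBlockId)
    .filter((id): id is string => id !== null);
}
```

The returned array goes straight into the next request body as `contextBlockIds`.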

## Outcome semantics

Every task result carries outcome fields that describe the audit's conclusion status:

| Field | Type | Meaning |
|---|---|---|
| `findingsOutcome` | `'found' \| 'clean' \| 'not_applicable'` | Did the audit uncover issues? |
| `findingsOutcomeReason` | `string \| null` | When `findingsOutcome` is set, explains why (e.g. "3 critical findings: broken paths, missing symbols, mismatched signatures" or "Document is clean across all audit criteria"). |
| `outcomeInferred` | `boolean` | `true` if the system inferred the outcome from the findings count; `false` if the auditor stated it explicitly. |
| `outcomeMalformed` | `boolean` | `true` if the outcome line was malformed and had to be repaired; `false` otherwise. |

### Enum values

- **`found`**: the audit surfaced one or more issues (findings) in the artifact across one or more criteria. The artifact needs rework before downstream use.
- **`clean`**: the audit completed and found zero issues. The artifact passes all audit criteria and is ready for downstream use.
- **`not_applicable`**: the audit could not proceed (e.g., wrong input type, missing preconditions, or a system error). This is rare; most audits resolve to `found` or `clean`.

### Empty findings ≠ failure

A crucial semantic: **an empty `findings` array does NOT mean `completed: false` or a failed task.** Finding nothing wrong is a successful audit outcome; it means the document passed the bar. An audit with zero findings is `completed: true` with `findingsOutcome: 'clean'`.

### Per-route legal outcomes

The legal outcomes for this route are `['found', 'clean']`:

- **`found`**: one or more issues were detected across the audit criteria.
- **`clean`**: zero issues were detected; the artifact is ready for downstream use.

`not_applicable` is not a legal outcome for a completed `mma-audit`, because an audit always produces a verdict: either issues found or clean. It should surface only when an actual precondition fails.
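
One way a caller might sanity-check an outcome before acting on it, sketched in TypeScript. The field names come from the table above and the legal sets from the per-route rules of `mma-audit` (`found`/`clean`) and `mma-debug` (`found`/`not_applicable`). The `audit`/`debug` keys, `LEGAL_OUTCOMES`, and `checkOutcome` are illustrative, not part of the service.

```typescript
type FindingsOutcome = "found" | "clean" | "not_applicable";

interface OutcomeFields {
  completed: boolean;
  findingsOutcome: FindingsOutcome;
  findingsOutcomeReason: string | null;
  outcomeInferred: boolean;
  outcomeMalformed: boolean;
}

type Route = "audit" | "debug"; // hypothetical keys for mma-audit / mma-debug

const LEGAL_OUTCOMES: Record<Route, readonly FindingsOutcome[]> = {
  audit: ["found", "clean"],
  debug: ["found", "not_applicable"],
};

// Returns warnings worth surfacing; an empty array means nothing unusual.
// Zero findings with "clean" on an audit is a success, so it yields no warning.
function checkOutcome(route: Route, r: OutcomeFields): string[] {
  const warnings: string[] = [];
  if (!r.completed) warnings.push("task did not complete");
  if (LEGAL_OUTCOMES[route].indexOf(r.findingsOutcome) === -1) {
    warnings.push(`'${r.findingsOutcome}' is not a legal outcome for ${route}`);
  }
  if (r.outcomeMalformed) warnings.push("outcome line was malformed and repaired");
  if (r.outcomeInferred) warnings.push("outcome was inferred from the findings count");
  return warnings;
}

const cleanAudit: OutcomeFields = {
  completed: true,
  findingsOutcome: "clean",
  findingsOutcomeReason: "Document is clean across all audit criteria",
  outcomeInferred: false,
  outcomeMalformed: false,
};
const auditWarnings = checkOutcome("audit", cleanAudit); // returns []
const debugWarnings = checkOutcome("debug", cleanAudit); // returns one "not a legal outcome" warning
```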

@include _shared/error-handling.md
---
name: mma-context-blocks
description: >-
  Use when a document larger than ~2 KB will be referenced by 2+ subsequent
  mma-* calls: register once, pass the returned ID to each call instead of
  re-uploading the same content. OR a spec / plan / error log was already
  inlined into one task and is about to be inlined into a second: register on
  the second reference, never the third.
when_to_use: >-
  A document (spec, plan, codebase summary, prior round's findings, error log)
  larger than ~2 KB will be referenced by two or more mma-* calls in a row.
  Register once here, then pass the ID via `contextBlockIds` on mma-delegate /
  mma-execute-plan / mma-audit / mma-review / mma-debug / mma-investigate.
  Cheaper and faster than inlining the same content N times.
version: 4.9.1
---

# mma-context-blocks

## Overview

Store large documents once; reference them by ID in subsequent `mma-*` calls via `contextBlockIds`. The service prepends the block content to each task prompt that references the ID. Content is transmitted ONCE to the daemon, then reused server-side.

**Core principle:** Without context blocks, the same document is sent N times for N tasks. Blocks transmit once. The savings compound on shared specs, prior-round findings, and codebase summaries.

## When to Use

**Use when:**
- A doc >2 KB will be referenced by ≥2 `mma-*` calls
- You're running iterative audit/review rounds (round 2 references round 1's findings)
- A spec or design doc is the shared input across N parallel tasks
- A long error log is the context for debug + delegate calls

**Don't use when:**
- The doc is <2 KB or used only once → just inline it (registration overhead exceeds savings)
- The doc changes between calls → context blocks are immutable; register a new one
- A single task doesn't reference any large shared content → no benefit
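
The size-and-reuse rule of thumb above, as a small TypeScript predicate. Reading "~2 KB" as exactly 2048 UTF-8 bytes is an assumption, and `shouldRegisterContextBlock` / `utf8ByteLength` are hypothetical helpers, not part of the service.

```typescript
const TWO_KB = 2048; // "~2 KB" read as 2048 bytes (assumption)

// UTF-8 byte length without relying on TextEncoder/Buffer typings.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // surrogate pair: one 4-byte code point
        i++;
      } else {
        bytes += 3; // lone surrogate, encoded as U+FFFD (3 bytes)
      }
    } else bytes += 3;
  }
  return bytes;
}

// Register only when the doc is big enough AND reused by 2+ mma-* calls.
function shouldRegisterContextBlock(content: string, referenceCount: number): boolean {
  return utf8ByteLength(content) > TWO_KB && referenceCount >= 2;
}

const shared = shouldRegisterContextBlock("x".repeat(50 * 1024), 5); // true: 50 KB spec, 5 tasks
const tiny = shouldRegisterContextBlock("x".repeat(1000), 5); // false: small enough to inline
```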

## Endpoints

### Register a context block

`POST /context-blocks?cwd=<abs-path>`

@include _shared/auth.md

#### Request body

```json
{
  "content": "# Project spec\n...",
  "ttlMs": 3600000
}
```

| Field | Type | Required | Notes |
|---|---|---|---|
| `content` | string | yes | Document content (min 1 char, max 50 MiB) |
| `ttlMs` | number | no | Time-to-live in ms. Omit it for idle expiry (default: 24 h idle): a block not referenced by any active batch for 24 h becomes eligible for eviction. |

#### Response (201)

```json
{ "id": "cb_abc123" }
```

Use this `id` as a `contextBlockIds` entry in any `mma-*` skill that supports it.
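
A sketch of building and sending the register call from TypeScript, using the same headers as the curl example in this skill. `HttpFn` accepts any `fetch`-like function. Node 18+'s global `fetch` should fit that shape. `MmaAuth`, `buildRegisterRequest`, and `registerContextBlock` are hypothetical names, and percent-encoding `cwd` assumes the daemon decodes query parameters normally.

```typescript
interface MmaAuth {
  baseUrl: string; // e.g. "http://localhost:" + port
  token: string; // bearer token
  client: string; // X-MMA-Client value
  mainModel: string; // X-MMA-Main-Model value
}

interface RequestInitLike {
  method: string;
  headers: Record<string, string>;
  body?: string;
}

// Any fetch-like function.
type HttpFn = (
  url: string,
  init: RequestInitLike,
) => Promise<{ status: number; json(): Promise<unknown> }>;

function buildRegisterRequest(auth: MmaAuth, cwd: string, content: string, ttlMs?: number) {
  if (content.length === 0) throw new Error("content must be at least 1 char");
  // Omitting ttlMs selects idle expiry (24 h without an active batch).
  const body = ttlMs === undefined ? { content } : { content, ttlMs };
  const init: RequestInitLike = {
    method: "POST",
    headers: {
      "X-MMA-Client": auth.client,
      "X-MMA-Main-Model": auth.mainModel,
      Authorization: `Bearer ${auth.token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
  return { url: `${auth.baseUrl}/context-blocks?cwd=${encodeURIComponent(cwd)}`, init };
}

// POST the block and return its ID (e.g. "cb_abc123") on 201.
async function registerContextBlock(
  http: HttpFn,
  auth: MmaAuth,
  cwd: string,
  content: string,
  ttlMs?: number,
): Promise<string> {
  const { url, init } = buildRegisterRequest(auth, cwd, content, ttlMs);
  const res = await http(url, init);
  if (res.status !== 201) throw new Error(`register failed: HTTP ${res.status}`);
  const data = (await res.json()) as { id: string };
  return data.id;
}
```

Keeping request construction separate from sending makes the headers and body easy to inspect or log. Because `HttpFn` is injected, a stub can stand in for the daemon in tests.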

### Delete a context block

`DELETE /context-blocks/:id?cwd=<abs-path>`

Returns `200 { ok: true }` on success. Returns `409 pinned` if one or more active batches still hold the block; wait for those batches to complete before deleting.
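
A fallback sketch for the `409 pinned` case: retry the delete a bounded number of times with a fixed delay, assuming the pin clears once the holding batches finish. The documented approach is still to poll those batches to completion before deleting. The attempt count, the delay, and the `deleteContextBlock` name are arbitrary choices for illustration.

```typescript
type HttpFn = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ status: number }>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Returns true once the block is deleted (200), false if it is still pinned
// (409) after every attempt; any other status is treated as an error.
async function deleteContextBlock(
  http: HttpFn,
  baseUrl: string,
  headers: Record<string, string>, // same auth headers as the register call
  id: string,
  cwd: string,
  attempts = 5,
  delayMs = 2000,
): Promise<boolean> {
  const url = `${baseUrl}/context-blocks/${encodeURIComponent(id)}?cwd=${encodeURIComponent(cwd)}`;
  for (let i = 0; i < attempts; i++) {
    const res = await http(url, { method: "DELETE", headers });
    if (res.status === 200) return true;
    if (res.status !== 409) throw new Error(`delete failed: HTTP ${res.status}`);
    if (i < attempts - 1) await sleep(delayMs); // pinned by an active batch; wait
  }
  return false;
}
```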

## Full example

```bash
# Register spec document once. jq builds the JSON body and curl reads it from
# stdin (--data-binary @-): passing a large document as a single -d argument
# can exceed the OS per-argument size limit (128 KiB on typical Linux systems).
ID=$(jq -Rs '{content: .}' < /project/docs/spec.md \
  | curl -f --show-error -s -X POST \
      -H "X-MMA-Client: $MMA_CLIENT" \
      -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      --data-binary @- \
      "http://localhost:$PORT/context-blocks?cwd=/project" \
  | jq -r '.id')

# Reference from N delegate tasks
curl -f --show-error -s -X POST \
  -H "X-MMA-Client: $MMA_CLIENT" \
  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"tasks\":[
    {\"prompt\":\"Implement section 3 per spec\",\"contextBlockIds\":[\"$ID\"]},
    {\"prompt\":\"Implement section 4 per spec\",\"contextBlockIds\":[\"$ID\"]}
  ]}" \
  "http://localhost:$PORT/delegate?cwd=/project"
```

## v5 wire shape (register-context-block route)

Every task result is a `ComposePayload`. For the `register-context-block` route, the envelope populates one field beyond the standard seven:

```json
{
  "completed": true,
  "message": "Context block cb_abc123 registered (12345 bytes)",
  "findings": [],
  "summary": "",
  "filesChanged": [],
  "commitSha": null,
  "blockId": "cb_abc123",
  "telemetry": { ... }
}
```

`blockId` is **non-null only for the `register-context-block` route**. For every other route (`delegate`, `execute-plan`, `investigate`, etc.), `blockId` is `null`. It is the only signal that distinguishes a register-context-block result from any other route: there is no route-keyed discriminated union, just one extra nullable field on the shared shape.

The terminal context block (per-task, auto-registered, returned as `contextBlockId`) uses a different ID format and is separate from the `blockId` in the wire envelope.
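
Because `blockId` is the only discriminator, a TypeScript type guard on that one field is enough to narrow a result. This is a sketch. The field types are inferred from the example envelope, and `findings` and `telemetry` are deliberately left as `unknown`. `RegisterBlockPayload` and `isRegisterBlockPayload` are hypothetical names.

```typescript
interface ComposePayload {
  completed: boolean;
  message: string;
  findings: unknown[];
  summary: string;
  filesChanged: string[];
  commitSha: string | null;
  blockId: string | null; // non-null only on register-context-block
  telemetry: unknown;
}

// Narrowed view of a register-context-block result.
interface RegisterBlockPayload extends ComposePayload {
  blockId: string;
}

function isRegisterBlockPayload(p: ComposePayload): p is RegisterBlockPayload {
  return p.blockId !== null;
}

const payload: ComposePayload = {
  completed: true,
  message: "Context block cb_abc123 registered (12345 bytes)",
  findings: [],
  summary: "",
  filesChanged: [],
  commitSha: null,
  blockId: "cb_abc123",
  telemetry: {},
};

if (isRegisterBlockPayload(payload)) {
  // payload.blockId is typed as string here, ready for contextBlockIds.
  const contextBlockIds: string[] = [payload.blockId];
  console.log(contextBlockIds);
}
```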

## Best practices

This skill is the cross-cutting state mechanism described in `multi-model-agent` → "Best practices". Recipes that use context blocks:

- **Recipe A: Audit-iterate-clean.** Register the doc once before round 1; pass round N's findings block ID into round N+1.
- **Recipe B: Debug-fix-review.** Register the failing test output / reproduction log before the debug call; reuse it on the review (verify) call.
- **Recipe C: Investigate-plan-execute.** Register the plan file before `mma-execute-plan`.
- **Recipe D: Plan-execute-retry.** No new registration needed; `mma-retry` inherits the original batch's `contextBlockIds`.

Anti-pattern alert: **`re-inlined-shared-content`** (AP3). Pasting the same spec into 5 task prompts costs 5× the tokens. Register once; pass `contextBlockIds`.

## Common pitfalls

❌ **Inlining the same 50KB spec into every task prompt**
> tasks: [{prompt: "Implement section 3:\n[50KB spec]"}, {prompt: "Implement section 4:\n[50KB spec]"}]

N×50KB transmissions; main context burns through tokens. **Fix:** register the spec once, pass `contextBlockIds: ["cb_xxx"]` to each task.

❌ **Forgetting to delete unused blocks**
Blocks count against the project's context-block quota (`maxEntries`, default 500). **Fix:** explicitly `DELETE` them after the dependent batches finish, or let idle expiry (24 h) evict them.

❌ **Trying to update a block's content**
Blocks are immutable. **Fix:** register a new block with the new content; switch the `contextBlockIds` to the new ID.

❌ **Deleting a block while a batch still references it**
Returns `409 pinned`. **Fix:** poll the dependent batches until they reach a terminal state, then delete.

@include _shared/error-handling.md
---
name: mma-debug
description: >-
  Use when a test fails, a build breaks, or behavior is unexpected AND narrowing
  the root cause requires reading files, reproducing the failure, or tracing
  across multiple modules; the worker investigates so the main agent stays on
  the hypothesis.
when_to_use: >-
  A failure has surfaced (test/build/runtime) AND you need investigation work
  (read files, reproduce, trace) OR a methodology skill
  (superpowers:systematic-debugging) points at the investigation step. Delegate
  the read/reproduce/trace; the main agent stays on the hypothesis and the fix.
version: 4.9.1
---

# mma-debug

## Overview

Submit a problem, context, and hypothesis to a worker for focused debugging. Unlike `mma-audit` and `mma-review`, all `filePaths` are investigated TOGETHER in a single task (not parallelized per file), because debugging needs cross-file reasoning.

**Core principle:** The hypothesis is judgment (your job). Reading files and reproducing the failure is labor (the worker's job). Pass the hypothesis as input; receive structured findings.

## When to Use

**Use when:**
- A test fails, a build breaks, or runtime behavior is unexpected
- The root cause likely spans 2+ files
- You have a hypothesis to test (or want the worker to suggest one)
- A methodology skill (`superpowers:systematic-debugging`) routed here

**Don't use when:**
- The error message points at one file you can read in 30 seconds → just `Read` it
- You don't know what's broken yet → use `mma-investigate` first to map the area
- You already know the fix → skip debug and dispatch `mma-delegate` with the fix
36
|
-
|
|
37
|
-
## Endpoint
|
|
38
|
-
|
|
39
|
-
`POST /debug?cwd=<abs-path>`
|
|
40
|
-
|
|
41
|
-
@include _shared/auth.md
|
|
42
|
-
|
|
43
|
-
## Request body
|
|
44
|
-
|
|
45
|
-
```json
|
|
46
|
-
{
|
|
47
|
-
"problem": "POST /login returns 500 when password contains special characters",
|
|
48
|
-
"context": "Regression introduced in commit abc123; only affects production config",
|
|
49
|
-
"hypothesis": "The bcrypt binding fails on non-ASCII input in the Docker image",
|
|
50
|
-
"subtype": "default",
|
|
51
|
-
"filePaths": [
|
|
52
|
-
"/project/src/auth/login.ts",
|
|
53
|
-
"/project/src/auth/password.ts"
|
|
54
|
-
],
|
|
55
|
-
"contextBlockIds": []
|
|
56
|
-
}
|
|
57
|
-
```
|
|
58
|
-
|
|
59
|
-
| Field | Type | Required | Notes |
|
|
60
|
-
|---|---|---|---|
|
|
61
|
-
| `problem` | string | yes | What is broken (one sentence; concrete symptom) |
|
|
62
|
-
| `context` | string | no | Background — what changed recently, what works, what doesn't |
|
|
63
|
-
| `hypothesis` | string | no | Your initial theory; worker tests it first, then explores |
|
|
64
|
-
| `subtype` | `'default'` | no (defaults to `'default'`) | Reserved for future criteria sets; only `default` is wired today. |
|
|
65
|
-
| `filePaths` | string[] | no | All files investigated together (cross-file reasoning) |
|
|
66
|
-
| `contextBlockIds` | string[] | no | IDs from `mma-context-blocks` (e.g. error logs, traces) |
|
|
67
|
-
|
|
68
|
-
> Worker tier for `mma-debug` is hardcoded to `complex` and is not caller-configurable. Sending `agentType` is rejected with HTTP 400.
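
A client-side pre-flight check for a `/debug` body, sketched in TypeScript from the field table and the `agentType` note above. The error messages and the `validateDebugBody` name are hypothetical. The daemon remains the authority on what it accepts.

```typescript
// Returns a list of problems; an empty list means the body looks sendable.
function validateDebugBody(body: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const problem = body["problem"];
  if (typeof problem !== "string" || problem.trim() === "") {
    errors.push("problem is required (one concrete symptom)");
  }
  if ("agentType" in body) {
    errors.push("agentType is rejected with HTTP 400; the worker tier is fixed to complex");
  }
  const subtype = body["subtype"];
  if (subtype !== undefined && subtype !== "default") {
    errors.push("subtype must be 'default' (the only wired value)");
  }
  for (const key of ["filePaths", "contextBlockIds"]) {
    const v = body[key];
    if (v !== undefined && !(Array.isArray(v) && v.every((x) => typeof x === "string"))) {
      errors.push(`${key} must be an array of strings`);
    }
  }
  return errors;
}

const example = {
  problem: "Tests fail on CI only",
  hypothesis: "Missing env var",
  filePaths: ["/project/src/config.ts"],
};
const problems = validateDebugBody(example); // returns []
```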

## Full example

```bash
BATCH=$(curl -f --show-error -s -X POST \
  -H "X-MMA-Client: $MMA_CLIENT" \
  -H "X-MMA-Main-Model: $MMA_MAIN_MODEL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"problem":"Tests fail on CI only","hypothesis":"Missing env var","filePaths":["/project/src/config.ts"]}' \
  "http://localhost:$PORT/debug?cwd=/project")
BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
```

@include _shared/polling.md

@include _shared/response-shape.md

## Reading the findings

The main agent reads `completed` + `message` + `findings`; the findings are the answer. For read-only routes, `filesChanged` is always `[]` and `commitSha` is always `null`.

```json
{
  "completed": true,
  "message": "Investigation complete; 1 finding.",
  "findings": [
    { "id": "F1", "severity": "high", "category": "root-cause",
      "claim": "bcrypt binding fails on non-ASCII input in the Docker image.",
      "evidence": "Worker reproduced the failure with `pass='café'`; strace shows EINVAL on encode call.",
      "suggestion": "Normalize input to NFC form before calling bcrypt.",
      "source": "implementer" }
  ],
  "filesChanged": [],
  "commitSha": null,
  "summary": "...",
  "telemetry": { ... }
}
```

### Finding shape

Every finding has this shape:

| Field | Type | Notes |
|---|---|---|
| `id` | string | Worker-assigned, e.g. `F1`, `F2`. Stable across the chain. |
| `severity` | `'critical' \| 'high' \| 'medium' \| 'low'` | 4-tier. |
| `category` | string | Topical bucket, e.g. `root-cause`, `reproduction`. |
| `claim` | string | One-sentence summary. |
| `evidence` | string (≥20 chars) | Verbatim from source when grounded. |
| `suggestion?` | string | Optional fix recommendation. |
| `source` | `'implementer' \| 'reviewer'` | Who produced the finding. |

`annotatorConfidence` and `evidenceGrounded` are retired; they were v4 fields with no producers.

### Recommended rendering by the main agent

1. Show ALL findings; never silently drop any. Severity and grounding are soft signals, not gates.
2. Default sort: severity (critical → low), then `id` ascending.
3. `severity` is the authoritative value; use it directly.
4. Mark findings whose `evidence` is shorter than 30 chars as "low-evidence" (lighter color or a `(low evidence)` annotation). The user decides what to do with them.
5. Severity-tier counts feed the dashboard.
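
The rendering rules above, sketched in TypeScript. Sorting `F2` before `F10` with a numeric-aware `id` comparison is an assumption about what "`id` ascending" intends. The `Finding` type follows the finding-shape table, and the sample findings are made up.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  id: string;
  severity: Severity;
  category: string;
  claim: string;
  evidence: string;
  suggestion?: string;
  source: "implementer" | "reviewer";
}

const SEVERITY_RANK: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };
const LOW_EVIDENCE_CHARS = 30;

// Severity (critical → low), then id ascending with numeric awareness.
function sortFindings(findings: Finding[]): Finding[] {
  return findings
    .slice()
    .sort(
      (a, b) =>
        SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
        a.id.localeCompare(b.id, undefined, { numeric: true }),
    );
}

// One line per finding; nothing is dropped, short evidence is only flagged.
function renderFindings(findings: Finding[]): string[] {
  return sortFindings(findings).map((f) => {
    const flag = f.evidence.length < LOW_EVIDENCE_CHARS ? " (low evidence)" : "";
    return `[${f.severity}] ${f.id} ${f.category}: ${f.claim}${flag}`;
  });
}

// Per-tier counts for a dashboard.
function severityCounts(findings: Finding[]): Record<Severity, number> {
  const counts: Record<Severity, number> = { critical: 0, high: 0, medium: 0, low: 0 };
  findings.forEach((f) => counts[f.severity]++);
  return counts;
}

const sample: Finding[] = [
  { id: "F1", severity: "low", category: "reproduction", claim: "Only fails on CI.",
    evidence: "Seen in 2 of 10 CI runs.", source: "implementer" },
  { id: "F2", severity: "high", category: "root-cause",
    claim: "bcrypt binding fails on non-ASCII input in the Docker image.",
    evidence: "Worker reproduced the failure with pass='café'; strace shows EINVAL.",
    source: "implementer" },
];
renderFindings(sample).forEach((line) => console.log(line));
```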

## Best practices

This skill is one step in the larger flow described in `multi-model-agent` → "Best practices". Recipes that involve `mma-debug`:

- **Recipe B: Debug-fix-review.** `mma-debug` → `mma-delegate` (apply fix) → `mma-review` with the acceptance criteria in the brief. Strict order. Register the failing test output / reproduction log as a context block before the debug call; reuse it on the review call.

Anti-pattern alert: **`inline-labor-leakage`** (AP2). If you're about to read 3+ files in main context to "understand the bug," that's the labor we delegate; call `mma-debug` with the hypothesis instead.

## Common pitfalls

❌ **Vague `problem`**
> "The login is broken"

The worker has no symptom to chase. **Fix:** give a specific reproducer, e.g. `"POST /login with body {user:'a@b.c', pass:'café'} returns 500 with 'invalid character' in stderr"`.

❌ **No `hypothesis`**
The worker explores blindly and often investigates the wrong area first. **Fix:** even a weak hypothesis ("might be encoding-related") narrows the search space.

❌ **Splitting one bug across multiple `mma-debug` calls**
Debug intentionally bundles `filePaths` for cross-file reasoning; splitting defeats this. **Fix:** make one call with all suspect files. If you really have N independent failures, use `mma-delegate` with N tasks.

❌ **Treating `mma-debug` as the fix step**
Debug investigates and proposes; it doesn't necessarily write the fix. **Fix:** if the worker identifies a fix, dispatch `mma-delegate` to implement it (or write it inline if you understand it).

❌ **Skipping when an error message looks self-explanatory**
Often the obvious cause isn't the real one. **Fix:** a 30-second debug pass costs less than a wrong fix that breaks something else.

## Terminal context block

Every completed **read-route** task (audit / review / debug / investigate / research) auto-registers a reusable terminal context block containing its report (headline + findings). The block id is returned on each per-task result as **`contextBlockId`**. Write routes (delegate / execute-plan / retry) return `contextBlockId: null`; their record is the commit, not a block. This block is immutable, lives for the session duration, and counts against the project's `maxEntries` quota (default 500).

Use it for delta follow-ups: feed prior results' block ids into a later call's `contextBlockIds`, filtering out nulls:

    contextBlockIds: priorResults.map(r => r.contextBlockId).filter((id) => id !== null)

**Use cases:**
- Pass debug findings to a downstream `mma-delegate` fix step
- Feed the root-cause analysis into a follow-up `mma-review` with acceptance criteria in the brief
- Carry debug context forward through the debug → fix → review chain

The block is registered server-side at task completion; no caller action is needed to create it. Delete it explicitly via `DELETE /context-blocks/:id` when no longer needed, or let it expire on session teardown.

## Outcome semantics

Every task result carries outcome fields that describe the debugging investigation's conclusion status:

| Field | Type | Meaning |
|---|---|---|
| `findingsOutcome` | `'found' \| 'clean' \| 'not_applicable'` | Did the investigation identify a root cause? |
| `findingsOutcomeReason` | `string \| null` | When `findingsOutcome` is set, explains why (e.g. "Root cause identified with high confidence: bcrypt binding fails on non-ASCII input" or "No evidence supports the hypothesis; root cause remains unknown"). |
| `outcomeInferred` | `boolean` | `true` if the system inferred the outcome from the findings count; `false` if the investigator stated it explicitly. |
| `outcomeMalformed` | `boolean` | `true` if the outcome line was malformed and had to be repaired; `false` otherwise. |

### Enum values

- **`found`**: the investigation identified one or more root-cause hypotheses (findings) with supporting evidence. The problem has a diagnosed cause.
- **`clean`**: the investigation completed but found zero root causes; the failure remains unexplained despite thorough investigation. For `mma-debug` this value is outside the legal set (see the per-route rules below).
- **`not_applicable`**: the investigation could not proceed (e.g., the failure could not be reproduced, context was missing, or the problem was out of scope). This is the "unable to diagnose" state.

### Empty findings ≠ failure

A crucial semantic: **an empty `findings` array does NOT mean `completed: false` or a failed debug session.** An investigation that proceeds thoroughly and produces zero root-cause candidates is a valid `completed: true` result; it means "I looked hard and found nothing." Because `clean` is not legal on this route, such a result surfaces as `not_applicable` (the root cause is elsewhere), but the task itself still succeeded.

### Per-route legal outcomes

The legal outcomes for this route are `['found', 'not_applicable']`:

- **`found`**: one or more root-cause hypotheses were identified across the investigation criteria.
- **`not_applicable`**: the failure could not be diagnosed (reproduction failed, wrong area, or scope issue).

The outcome `clean` (zero findings + success) is not legal for `mma-debug`, because a debug session always either identifies a root cause or cannot proceed.

@include _shared/error-handling.md