@zhixuan92/multi-model-agent 3.10.3 → 3.10.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +2 -2
- package/dist/skills/mma-audit/SKILL.md +37 -21
- package/dist/skills/mma-clarifications/SKILL.md +1 -1
- package/dist/skills/mma-context-blocks/SKILL.md +1 -1
- package/dist/skills/mma-debug/SKILL.md +37 -21
- package/dist/skills/mma-delegate/SKILL.md +1 -1
- package/dist/skills/mma-execute-plan/SKILL.md +1 -1
- package/dist/skills/mma-investigate/SKILL.md +37 -21
- package/dist/skills/mma-retry/SKILL.md +1 -1
- package/dist/skills/mma-review/SKILL.md +37 -21
- package/dist/skills/mma-verify/SKILL.md +37 -21
- package/dist/skills/multi-model-agent/SKILL.md +1 -1
- package/dist/telemetry/recorder.js +28 -13
- package/dist/telemetry/recorder.js.map +1 -1
- package/package.json +2 -2
package/README.md
CHANGED
@@ -84,7 +84,7 @@ Two ways — pick one:
 
 ```bash
 mmagent serve # 127.0.0.1:7337 by default
-curl -s http://localhost:7337/health # → {"ok":true,"version":"3.10.
+curl -s http://localhost:7337/health # → {"ok":true,"version":"3.10.4",...}
 ```
 
 For an always-on background install (survives reboots): [launchd / systemd templates](./scripts/README.md).
@@ -287,7 +287,7 @@ Full design rationale: [DIRECTION.md](https://github.com/zhixuan312/multi-model-
 
 ## What's new
 
-Latest: **3.10.
+Latest: **3.10.4** — review stages were recording the implementer's model (V3 R3 violation root cause). Now record the actual reviewer's resolved tier + model. Plus: telemetry validation is fully warn-only — events never drop, and cross-field warnings now include actual offending values (model, tokens, totals) so config issues vs lifecycle bugs are distinguishable at a glance. Full history: [CHANGELOG](https://github.com/zhixuan312/multi-model-agent/blob/master/CHANGELOG.md).
 
 ## Full documentation
 
package/dist/skills/mma-audit/SKILL.md
CHANGED
@@ -8,7 +8,7 @@ when_to_use: >-
   User asks for a doc/spec/config audit OR a methodology skill
   (superpowers:dispatching-parallel-agents, /security-review) points at one AND
   mmagent is running. Audit on PROSE/SPEC docs — use mma-review for source code.
-version: 3.10.
+version: 3.10.5
 ---
 
 # mma-audit
@@ -72,26 +72,42 @@ BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 
 @include _shared/response-shape.md
 
-## Reading the
-
-The terminal envelope
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-- `
-
+## Reading the findings (3.10.5+)
+
+The terminal envelope's `results[N].annotatedFindings` is a list of structured
+findings the reviewer extracted and scored from the implementer's narrative.
+Every finding has the same shape:
+
+| Field | Type | Notes |
+|---|---|---|
+| `id` | string | Reviewer-assigned, e.g. `F1`, `F2`. |
+| `severity` | `'critical' \| 'high' \| 'medium' \| 'low'` | 4-tier. |
+| `claim` | string | One-sentence summary. |
+| `evidence` | string ≥20 chars | Quoted from worker output when grounded. |
+| `suggestion?` | string | Optional fix recommendation. |
+| `reviewerConfidence` | `number \| null` | 0–100 from the reviewer; `null` when emitted via deterministic fallback. |
+| `evidenceGrounded` | boolean | True when `evidence` is a verbatim substring of worker output. |
+
+### Verdict states (`qualityReviewVerdict`)
+
+- `'annotated'` — every finding is structured. May be reviewer-emitted (with
+  numeric `reviewerConfidence`) or deterministic-fallback (with
+  `reviewerConfidence: null`). The route ALWAYS reaches `'annotated'` unless
+  the reviewer call itself fails transport.
+- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW=disabled`).
+- `'error'` — only when the reviewer call fails transport (network / 5xx).
+
+### Recommended rendering by the main agent
+
+1. Show ALL findings — never silently drop. Confidence and grounding are
+   soft signals, not gates.
+2. Default sort: severity (critical → low) then `reviewerConfidence` desc
+   (nulls last).
+3. `severity` is the reviewer's authoritative final value — use it directly.
+4. Mark findings with `evidenceGrounded: false` or
+   `reviewerConfidence < 70` as "lower-trust" (collapsed section, lighter
+   color, or `(low confidence)` annotation). User decides what to do.
+5. Severity-tier counts feed the dashboard via V3 `findingsBySeverity`.
 
 ## Best practices
 
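Note: the default sort in rendering step 2 (severity first, then `reviewerConfidence` descending with nulls last) can be sketched as a comparator. This is a minimal sketch, assuming findings are plain objects carrying the fields from the table in the hunk above; `SEVERITY_RANK` and `sortFindings` are hypothetical names and are not part of the package.

```javascript
// Hypothetical helper: the names here are illustrative, not part of mmagent.
const SEVERITY_RANK = { critical: 0, high: 1, medium: 2, low: 3 };

function sortFindings(findings) {
  // Copy first so the caller's array is left untouched.
  return [...findings].sort((a, b) => {
    const bySeverity = SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity];
    if (bySeverity !== 0) return bySeverity;
    // Nulls last: a null confidence sorts after any numeric one.
    if (a.reviewerConfidence === null && b.reviewerConfidence === null) return 0;
    if (a.reviewerConfidence === null) return 1;
    if (b.reviewerConfidence === null) return -1;
    // Higher confidence first within the same severity tier.
    return b.reviewerConfidence - a.reviewerConfidence;
  });
}

const sample = [
  { id: 'F1', severity: 'low', reviewerConfidence: 90 },
  { id: 'F2', severity: 'critical', reviewerConfidence: null },
  { id: 'F3', severity: 'critical', reviewerConfidence: 55 },
  { id: 'F4', severity: 'high', reviewerConfidence: 80 },
];
console.log(sortFindings(sample).map((f) => f.id).join(','));
```

Sorting a copy keeps every finding in the output, in line with the "show ALL findings" rule; only the order changes.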
package/dist/skills/mma-clarifications/SKILL.md
CHANGED
@@ -12,7 +12,7 @@ when_to_use: >-
   `proposedInterpretation` is a hard gate — the batch is paused, not
   informational. The batch will not complete until the caller responds. Treating
   it as advisory is the clarification-as-info anti-pattern (AP5).
-version: 3.10.
+version: 3.10.5
 ---
 
 # mma-clarifications
package/dist/skills/mma-context-blocks/SKILL.md
CHANGED
@@ -12,7 +12,7 @@ when_to_use: >-
   Register once here, then pass the ID via `contextBlockIds` on mma-delegate /
   mma-execute-plan / mma-audit / mma-review / mma-verify / mma-debug /
   mma-investigate. Cheaper and faster than inlining the same content N times.
-version: 3.10.
+version: 3.10.5
 ---
 
 # mma-context-blocks
package/dist/skills/mma-debug/SKILL.md
CHANGED
@@ -10,7 +10,7 @@ when_to_use: >-
   read files, reproduce, trace — OR a methodology skill
   (superpowers:systematic-debugging) points at the investigation step. Delegate
   the read/reproduce/trace; the main agent stays on the hypothesis and the fix.
-version: 3.10.
+version: 3.10.5
 ---
 
 # mma-debug
@@ -78,26 +78,42 @@ BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 
 @include _shared/response-shape.md
 
-## Reading the
-
-The terminal envelope
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-- `
-
+## Reading the findings (3.10.5+)
+
+The terminal envelope's `results[N].annotatedFindings` is a list of structured
+findings the reviewer extracted and scored from the implementer's narrative.
+Every finding has the same shape:
+
+| Field | Type | Notes |
+|---|---|---|
+| `id` | string | Reviewer-assigned, e.g. `F1`, `F2`. |
+| `severity` | `'critical' \| 'high' \| 'medium' \| 'low'` | 4-tier. |
+| `claim` | string | One-sentence summary. |
+| `evidence` | string ≥20 chars | Quoted from worker output when grounded. |
+| `suggestion?` | string | Optional fix recommendation. |
+| `reviewerConfidence` | `number \| null` | 0–100 from the reviewer; `null` when emitted via deterministic fallback. |
+| `evidenceGrounded` | boolean | True when `evidence` is a verbatim substring of worker output. |
+
+### Verdict states (`qualityReviewVerdict`)
+
+- `'annotated'` — every finding is structured. May be reviewer-emitted (with
+  numeric `reviewerConfidence`) or deterministic-fallback (with
+  `reviewerConfidence: null`). The route ALWAYS reaches `'annotated'` unless
+  the reviewer call itself fails transport.
+- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW=disabled`).
+- `'error'` — only when the reviewer call fails transport (network / 5xx).
+
+### Recommended rendering by the main agent
+
+1. Show ALL findings — never silently drop. Confidence and grounding are
+   soft signals, not gates.
+2. Default sort: severity (critical → low) then `reviewerConfidence` desc
+   (nulls last).
+3. `severity` is the reviewer's authoritative final value — use it directly.
+4. Mark findings with `evidenceGrounded: false` or
+   `reviewerConfidence < 70` as "lower-trust" (collapsed section, lighter
+   color, or `(low confidence)` annotation). User decides what to do.
+5. Severity-tier counts feed the dashboard via V3 `findingsBySeverity`.
 
 ## Best practices
 
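Note: rendering step 4 (marking "lower-trust" findings) could look roughly like the sketch below. It assumes the finding fields from the table in the hunk above. `isLowerTrust` and `partitionByTrust` are hypothetical names, and the handling of a `null` confidence is an assumption, since the skill text does not say whether fallback findings count as below 70.

```javascript
// Hypothetical helpers; the threshold of 70 comes from rendering step 4.
const LOW_CONFIDENCE_THRESHOLD = 70;

function isLowerTrust(finding) {
  if (finding.evidenceGrounded === false) return true;
  // Assumption: a null confidence (deterministic fallback) is not treated as
  // "below 70"; only numeric scores are compared against the threshold.
  return typeof finding.reviewerConfidence === 'number'
    && finding.reviewerConfidence < LOW_CONFIDENCE_THRESHOLD;
}

function partitionByTrust(findings) {
  // Every finding lands in exactly one bucket; nothing is dropped.
  const primary = [];
  const lowerTrust = [];
  for (const f of findings) {
    (isLowerTrust(f) ? lowerTrust : primary).push(f);
  }
  return { primary, lowerTrust };
}

const buckets = partitionByTrust([
  { id: 'F1', reviewerConfidence: 92, evidenceGrounded: true },
  { id: 'F2', reviewerConfidence: 40, evidenceGrounded: true },
  { id: 'F3', reviewerConfidence: 88, evidenceGrounded: false },
  { id: 'F4', reviewerConfidence: null, evidenceGrounded: true },
]);
console.log(buckets.primary.map((f) => f.id), buckets.lowerTrust.map((f) => f.id));
```

The explicit `typeof` check matters in JavaScript because `null < 70` evaluates to `true`, which would silently push every fallback finding into the lower-trust bucket.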
package/dist/skills/mma-delegate/SKILL.md
CHANGED
@@ -11,7 +11,7 @@ when_to_use: >-
   and keep main context free. If a plan file exists → use mma-execute-plan. If
   the task is audit / review / verify / debug / investigate → use the matching
   specialized skill.
-version: 3.10.
+version: 3.10.5
 ---
 
 # mma-delegate
package/dist/skills/mma-execute-plan/SKILL.md
CHANGED
@@ -10,7 +10,7 @@ when_to_use: >-
   superpowers:subagent-driven-development / superpowers:executing-plans —
   workers are cheaper and don't pollute main context. Task descriptors must
   match plan headings verbatim.
-version: 3.10.
+version: 3.10.5
 ---
 
 # mma-execute-plan
package/dist/skills/mma-investigate/SKILL.md
CHANGED
@@ -12,7 +12,7 @@ when_to_use: >-
   git-history queries. OR you are about to read 3+ files / run any grep in main
   context — that's the inline-labor-leakage anti-pattern (AP2); delegate to this
   skill instead.
-version: 3.10.
+version: 3.10.5
 ---
 
 # mma-investigate
@@ -124,26 +124,42 @@ Each task carries an `investigation` field on its per-task report:
 
 `workerStatus` is one of `done`, `done_with_concerns`, `needs_context`, `blocked`. When `done_with_concerns`, the per-task report carries `incompleteReason` (`turn_cap`, `cost_cap`, `timeout`, or `missing_sections`). When `needs_context`, the worker flagged a `[needs_context]` bullet under `## Unresolved` — re-dispatch with extra context (anchor paths, a context block, or a clarification turn).
 
-## Reading the
-
-The terminal envelope
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-- `
-
+## Reading the findings (3.10.5+)
+
+The terminal envelope's `results[N].annotatedFindings` is a list of structured
+findings the reviewer extracted and scored from the implementer's narrative.
+Every finding has the same shape:
+
+| Field | Type | Notes |
+|---|---|---|
+| `id` | string | Reviewer-assigned, e.g. `F1`, `F2`. |
+| `severity` | `'critical' \| 'high' \| 'medium' \| 'low'` | 4-tier. |
+| `claim` | string | One-sentence summary. |
+| `evidence` | string ≥20 chars | Quoted from worker output when grounded. |
+| `suggestion?` | string | Optional fix recommendation. |
+| `reviewerConfidence` | `number \| null` | 0–100 from the reviewer; `null` when emitted via deterministic fallback. |
+| `evidenceGrounded` | boolean | True when `evidence` is a verbatim substring of worker output. |
+
+### Verdict states (`qualityReviewVerdict`)
+
+- `'annotated'` — every finding is structured. May be reviewer-emitted (with
+  numeric `reviewerConfidence`) or deterministic-fallback (with
+  `reviewerConfidence: null`). The route ALWAYS reaches `'annotated'` unless
+  the reviewer call itself fails transport.
+- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW=disabled`).
+- `'error'` — only when the reviewer call fails transport (network / 5xx).
+
+### Recommended rendering by the main agent
+
+1. Show ALL findings — never silently drop. Confidence and grounding are
+   soft signals, not gates.
+2. Default sort: severity (critical → low) then `reviewerConfidence` desc
+   (nulls last).
+3. `severity` is the reviewer's authoritative final value — use it directly.
+4. Mark findings with `evidenceGrounded: false` or
+   `reviewerConfidence < 70` as "lower-trust" (collapsed section, lighter
+   color, or `(low confidence)` annotation). User decides what to do.
+5. Severity-tier counts feed the dashboard via V3 `findingsBySeverity`.
 
 ## Best practices
 
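Note: rendering step 5 mentions severity-tier counts feeding `findingsBySeverity`. The diff does not show that field's shape, so the sketch below assumes a plain object keyed by the four severity tiers. `countBySeverity` is a hypothetical name.

```javascript
// Hypothetical tally; the object shape is an assumption, not the V3 schema.
function countBySeverity(findings) {
  const counts = { critical: 0, high: 0, medium: 0, low: 0 };
  for (const f of findings) {
    // Ignore values outside the documented 4-tier set instead of adding keys.
    if (Object.prototype.hasOwnProperty.call(counts, f.severity)) {
      counts[f.severity] += 1;
    }
  }
  return counts;
}

const tally = countBySeverity([
  { id: 'F1', severity: 'high' },
  { id: 'F2', severity: 'low' },
  { id: 'F3', severity: 'high' },
]);
console.log(JSON.stringify(tally));
```

Starting from a fully zeroed object means tiers with no findings still appear in the result, which tends to be easier for a dashboard to consume than missing keys.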
package/dist/skills/mma-retry/SKILL.md
CHANGED
@@ -10,7 +10,7 @@ when_to_use: >-
   you want to re-try the failed indices only. Prefer this over re-dispatching
   the whole batch or inline-retrying — it's idempotent and preserves the
   original batch's diagnostics.
-version: 3.10.
+version: 3.10.5
 ---
 
 # mma-retry
package/dist/skills/mma-review/SKILL.md
CHANGED
@@ -10,7 +10,7 @@ when_to_use: >-
   AND mmagent is running. Delegate so each file reviews on its own worker; the
   main agent only decides what to merge. Review on SOURCE CODE — use mma-audit
   for prose specs / configs.
-version: 3.10.
+version: 3.10.5
 ---
 
 # mma-review
@@ -75,26 +75,42 @@ BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 
 @include _shared/response-shape.md
 
-## Reading the
-
-The terminal envelope
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-- `
-
+## Reading the findings (3.10.5+)
+
+The terminal envelope's `results[N].annotatedFindings` is a list of structured
+findings the reviewer extracted and scored from the implementer's narrative.
+Every finding has the same shape:
+
+| Field | Type | Notes |
+|---|---|---|
+| `id` | string | Reviewer-assigned, e.g. `F1`, `F2`. |
+| `severity` | `'critical' \| 'high' \| 'medium' \| 'low'` | 4-tier. |
+| `claim` | string | One-sentence summary. |
+| `evidence` | string ≥20 chars | Quoted from worker output when grounded. |
+| `suggestion?` | string | Optional fix recommendation. |
+| `reviewerConfidence` | `number \| null` | 0–100 from the reviewer; `null` when emitted via deterministic fallback. |
+| `evidenceGrounded` | boolean | True when `evidence` is a verbatim substring of worker output. |
+
+### Verdict states (`qualityReviewVerdict`)
+
+- `'annotated'` — every finding is structured. May be reviewer-emitted (with
+  numeric `reviewerConfidence`) or deterministic-fallback (with
+  `reviewerConfidence: null`). The route ALWAYS reaches `'annotated'` unless
+  the reviewer call itself fails transport.
+- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW=disabled`).
+- `'error'` — only when the reviewer call fails transport (network / 5xx).
+
+### Recommended rendering by the main agent
+
+1. Show ALL findings — never silently drop. Confidence and grounding are
+   soft signals, not gates.
+2. Default sort: severity (critical → low) then `reviewerConfidence` desc
+   (nulls last).
+3. `severity` is the reviewer's authoritative final value — use it directly.
+4. Mark findings with `evidenceGrounded: false` or
+   `reviewerConfidence < 70` as "lower-trust" (collapsed section, lighter
+   color, or `(low confidence)` annotation). User decides what to do.
+5. Severity-tier counts feed the dashboard via V3 `findingsBySeverity`.
 
 ## Best practices
 
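Note: the three `qualityReviewVerdict` states in the hunk above suggest a simple branch on the consumer side. The sketch below is illustrative only. `describeVerdict` is a hypothetical name, the returned strings are made up for the example, and it assumes `qualityReviewVerdict` sits on the same per-task result object as `annotatedFindings`.

```javascript
// Hypothetical consumer-side branch; the verdict strings come from the skill text.
function describeVerdict(result) {
  // Assumption: annotatedFindings may be absent when the review did not run.
  const findings = result.annotatedFindings ?? [];
  switch (result.qualityReviewVerdict) {
    case 'annotated':
      return `render ${findings.length} finding(s)`;
    case 'skipped':
      return 'review disabled via MMAGENT_READ_ONLY_REVIEW=disabled';
    case 'error':
      return 'reviewer call failed in transport; no findings to render';
    default:
      // Assumption: unknown values are surfaced rather than hidden.
      return `unrecognized verdict: ${String(result.qualityReviewVerdict)}`;
  }
}

console.log(describeVerdict({
  qualityReviewVerdict: 'annotated',
  annotatedFindings: [{ id: 'F1' }, { id: 'F2' }],
}));
```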
package/dist/skills/mma-verify/SKILL.md
CHANGED
@@ -10,7 +10,7 @@ when_to_use: >-
   against implemented work BEFORE claiming success. Delegate so each checklist
   item gets independent evidence-gathering on a worker. Use this BEFORE saying
   "done" — never after.
-version: 3.10.
+version: 3.10.5
 ---
 
 # mma-verify
@@ -76,26 +76,42 @@ BATCH_ID=$(echo "$BATCH" | jq -r '.batchId')
 
 @include _shared/response-shape.md
 
-## Reading the
-
-The terminal envelope
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-- `
-
+## Reading the findings (3.10.5+)
+
+The terminal envelope's `results[N].annotatedFindings` is a list of structured
+findings the reviewer extracted and scored from the implementer's narrative.
+Every finding has the same shape:
+
+| Field | Type | Notes |
+|---|---|---|
+| `id` | string | Reviewer-assigned, e.g. `F1`, `F2`. |
+| `severity` | `'critical' \| 'high' \| 'medium' \| 'low'` | 4-tier. |
+| `claim` | string | One-sentence summary. |
+| `evidence` | string ≥20 chars | Quoted from worker output when grounded. |
+| `suggestion?` | string | Optional fix recommendation. |
+| `reviewerConfidence` | `number \| null` | 0–100 from the reviewer; `null` when emitted via deterministic fallback. |
+| `evidenceGrounded` | boolean | True when `evidence` is a verbatim substring of worker output. |
+
+### Verdict states (`qualityReviewVerdict`)
+
+- `'annotated'` — every finding is structured. May be reviewer-emitted (with
+  numeric `reviewerConfidence`) or deterministic-fallback (with
+  `reviewerConfidence: null`). The route ALWAYS reaches `'annotated'` unless
+  the reviewer call itself fails transport.
+- `'skipped'` — kill switch (`MMAGENT_READ_ONLY_REVIEW=disabled`).
+- `'error'` — only when the reviewer call fails transport (network / 5xx).
+
+### Recommended rendering by the main agent
+
+1. Show ALL findings — never silently drop. Confidence and grounding are
+   soft signals, not gates.
+2. Default sort: severity (critical → low) then `reviewerConfidence` desc
+   (nulls last).
+3. `severity` is the reviewer's authoritative final value — use it directly.
+4. Mark findings with `evidenceGrounded: false` or
+   `reviewerConfidence < 70` as "lower-trust" (collapsed section, lighter
+   color, or `(low confidence)` annotation). User decides what to do.
+5. Severity-tier counts feed the dashboard via V3 `findingsBySeverity`.
 
 ## Best practices
 
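Note: the table in the hunk above describes `evidenceGrounded` as true when `evidence` is a verbatim substring of the worker output, and lists `evidence` as at least 20 characters. Read literally, that could be checked as in the sketch below. `checkEvidence` is a hypothetical name, and the package's real check is not shown in this diff and may differ, for example by normalizing whitespace.

```javascript
// Hypothetical check: a literal reading of the table, not the package's code.
const MIN_EVIDENCE_CHARS = 20;

function checkEvidence(evidence, workerOutput) {
  return {
    // Length rule from the `evidence` row of the table.
    longEnough: evidence.length >= MIN_EVIDENCE_CHARS,
    // Verbatim, case-sensitive substring match.
    grounded: workerOutput.includes(evidence),
  };
}

const workerOutput = 'The config loader ignores MMAGENT_PORT when a port flag is set.';
console.log(checkEvidence('ignores MMAGENT_PORT when a port flag', workerOutput));
console.log(checkEvidence('ignores the MMAGENT_PORT variable entirely', workerOutput));
```

A paraphrased quote fails the substring test even when it is accurate, which is one reason the skill text treats grounding as a soft signal and not a gate.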
package/dist/skills/multi-model-agent/SKILL.md
CHANGED
@@ -11,7 +11,7 @@ when_to_use: >-
   tasks — AND mmagent is running. Read this once, pick the matching mma-* skill,
   and delegate there. Applies equally whether the user invoked a superpowers
   methodology skill or asked directly.
-version: 3.10.
+version: 3.10.5
 ---
 
 # multi-model-agent (router)
package/dist/telemetry/recorder.js
CHANGED
@@ -67,29 +67,44 @@ function _buildRecorder(opts) {
                 if (!d.enabled)
                     return;
                 const event = buildTaskCompletedEvent(ctx);
-                //
-                //
-                //
-                //
-                //
-                // 3.10.2's drop-on-superRefine behaviour silently lost real telemetry
-                // on 1ms clock-skew (R4) and other measurement-precision edge cases.
+                // Validation is INFORMATIONAL ONLY — never block emit. Backend uses
+                // passthrough so it stores everything; if mma drops events here, that
+                // data is gone forever and the user has no visibility into what was
+                // suppressed. 3.10.2's drop-on-fail design hid real telemetry from
+                // both operator and dashboard.
                 const baseParsed = TaskCompletedEventSchema.safeParse(event);
                 if (!baseParsed.success) {
-                    console.warn('mma-telemetry:
+                    console.warn('mma-telemetry: schema warning (event still emitted)', {
                         eventId: event.eventId,
                         issues: baseParsed.error.issues.map((e) => ({ path: e.path.join('.'), message: e.message })),
                     });
-                    return;
                 }
-                const refined = ValidatedTaskCompletedEventSchema.safeParse(
+                const refined = ValidatedTaskCompletedEventSchema.safeParse(event);
                 if (!refined.success) {
-
+                    // Surface the actual offending values alongside the rule name so the
+                    // operator can tell at a glance whether the cause is config (wrong
+                    // values) or code (lifecycle bug). Tag the most informative
+                    // top-level fields plus per-stage models for the common R3/R5/R6
+                    // cross-field cases.
+                    const stageModelsByName = (event.stages ?? []).reduce((acc, s) => {
+                        if (s.name && s.model)
+                            acc[s.name] = s.model;
+                        return acc;
+                    }, {});
+                    console.warn('mma-telemetry: cross-field warning (event still emitted)', {
                         eventId: event.eventId,
-
+                        implementerModel: event.implementerModel,
+                        stageModels: stageModelsByName,
+                        totalDurationMs: event.totalDurationMs,
+                        inputTokens: event.inputTokens,
+                        outputTokens: event.outputTokens,
+                        issues: refined.error.issues.map((e) => ({
+                            rule: e.message,
+                            path: e.path.join('.'),
+                        })),
                     });
                 }
-                enqueue(
+                enqueue(event);
             }
             catch {
                 dropped++;
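Note: the warn-only pattern in the `recorder.js` hunk above (validate, log issues with the offending values, then enqueue regardless) can be sketched without the real schemas. In the sketch below, `checkEvent` is a hypothetical stand-in for the Zod schemas in the diff, `recordWarnOnly` is a hypothetical wrapper, and the single rule it checks is invented for illustration. Only the `stageModelsByName` reduce mirrors the diff.

```javascript
// Stand-in validator (hypothetical): returns issues instead of throwing.
function checkEvent(event) {
  const issues = [];
  if (typeof event.totalDurationMs !== 'number' || event.totalDurationMs < 0) {
    issues.push({ rule: 'totalDurationMs must be a non-negative number', path: 'totalDurationMs' });
  }
  return issues;
}

// Same reduce as in the diff: map stage name -> model, skipping incomplete stages.
function stageModelsByName(stages) {
  return (stages ?? []).reduce((acc, s) => {
    if (s.name && s.model) acc[s.name] = s.model;
    return acc;
  }, {});
}

function recordWarnOnly(event, enqueue, warn = console.warn) {
  const issues = checkEvent(event);
  if (issues.length > 0) {
    // Log the offending values next to the rule so the cause is visible.
    warn('telemetry warning (event still emitted)', {
      eventId: event.eventId,
      stageModels: stageModelsByName(event.stages),
      totalDurationMs: event.totalDurationMs,
      issues,
    });
  }
  // Always reached: validation never blocks the emit.
  enqueue(event);
}

const queue = [];
const warnings = [];
recordWarnOnly(
  { eventId: 'e1', totalDurationMs: -1, stages: [{ name: 'review', model: 'model-a' }, { name: 'draft' }] },
  (e) => queue.push(e),
  (msg, detail) => warnings.push({ msg, detail }),
);
console.log(queue.length, warnings.length);
```

Passing `warn` and `enqueue` as parameters is a choice made for this sketch so the behavior can be observed without patching `console`; the diff itself calls `console.warn` and `enqueue` directly.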
package/dist/telemetry/recorder.js.map
CHANGED
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"recorder.js","sourceRoot":"","sources":["../../src/telemetry/recorder.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,SAAS,CAAC;AACjD,OAAO,EAAE,IAAI,EAAE,MAAM,WAAW,CAAC;AACjC,OAAO,EAAE,MAAM,EAAE,MAAM,cAAc,CAAC;AACtC,OAAO,EAAE,mBAAmB,EAAE,MAAM,eAAe,CAAC;AACpD,OAAO,EAAE,eAAe,EAAE,MAAM,iBAAiB,CAAC;AAClD,OAAO,EAAE,gBAAgB,EAAE,MAAM,mBAAmB,CAAC;AACrD,OAAO,EAAE,KAAK,EAAE,MAAM,YAAY,CAAC;AACnC,OAAO,EAAE,cAAc,EAAE,cAAc,EAAE,MAAM,iBAAiB,CAAC;AACjE,OAAO,EAAE,cAAc,EAAE,wBAAwB,EAAE,iCAAiC,EAAE,MAAM,mDAAmD,CAAC;AAChJ,OAAO,EACL,uBAAuB,GAExB,MAAM,2DAA2D,CAAC;AASnE,IAAI,SAAS,GAAoB,IAAI,CAAC;AAEtC,MAAM,UAAU,WAAW;IACzB,IAAI,CAAC,SAAS,EAAE,CAAC;QACf,MAAM,IAAI,KAAK,CAAC,sDAAsD,CAAC,CAAC;IAC1E,CAAC;IACD,OAAO,SAAS,CAAC;AACnB,CAAC;AAED,MAAM,UAAU,kBAAkB,CAAC,CAAW;IAC5C,SAAS,GAAG,CAAC,CAAC;AAChB,CAAC;AAED,MAAM,UAAU,cAAc,CAAC,IAAiD;IAC9E,MAAM,QAAQ,GAAG,cAAc,CAAC,IAAI,CAAC,CAAC;IACtC,SAAS,GAAG,QAAQ,CAAC;IACrB,OAAO,QAAQ,CAAC;AAClB,CAAC;AAED,SAAS,cAAc,CAAC,IAAiD;IACvE,MAAM,EAAE,OAAO,EAAE,cAAc,EAAE,GAAG,IAAI,CAAC;IACzC,MAAM,KAAK,GAAG,IAAI,KAAK,CAAC,OAAO,CAAC,CAAC;IACjC,MAAM,UAAU,GAAG,IAAI,eAAe,EAAE,CAAC;IACzC,IAAI,UAAU,GAAkB,IAAI,CAAC;IACrC,IAAI,OAAO,GAAG,CAAC,CAAC;IAEhB,MAAM,gBAAgB,GAAG,GAAW,EAAE;QACpC,IAAI,CAAC,UAAU,EAAE,CAAC;YAChB,UAAU,GAAG,mBAAmB,CAAC,OAAO,CAAC,CAAC,SAAS,CAAC;QACtD,CAAC;QACD,OAAO,UAAU,CAAC;IACpB,CAAC,CAAC;IAEF,MAAM,OAAO,GAAG,CAAC,KAA8B,EAAQ,EAAE;QACvD,IAAI,CAAC;YACH,MAAM,EAAE,GAAG,gBAAgB,EAAE,CAAC;YAC9B,MAAM,IAAI,GAAG,gBAAgB,CAAC,EAAE,SAAS,EAAE,EAAE,EAAE,cAAc,EAAE,CAAC,CAAC;YACjE,MAAM,GAAG,GAAG,cAAc,CAAC,OAAO,CAAC,CAAC;YAEpC,KAAK,CAAC,MAAM,CAAC;gBACX,aAAa,EAAE,cAAc;gBAC7B,SAAS,EAAE,IAAI,CAAC,SAAS;gBACzB,cAAc,EAAE,IAAI,CAAC,cAAc;gBACnC,EAAE,EAAE,IAAI,CAAC,EAAE;gBACX,SAAS,EAAE,IAAI,CAAC,SAAS;gBACzB,UAAU,EAAE,GAAG;gBACf,MAAM,EAAE,CAAC,KAAK,CAAC;aAChB,CAAC,CAAC,KAAK,CAAC,GAAG,EAAE;gBACZ,OAAO,EAAE,CAAC;YACZ,CAAC,CAAC,CAAC;QACL,CAAC;QAAC,MAAM,CAAC;YACP,OAAO,EAAE,CAAC;QACZ,CAAC;IACH,CAAC,CAAC;IAEF,OAAO;QACL,IAAI,MAAM;YACR,OAAO,UAAU,CAAC,MAAM,C
AAC;QAC3B,CAAC;QAED,OAAO;QAEP,mBAAmB,CAAC,GAAG;YACrB,IAAI,CAAC;gBACH,MAAM,CAAC,GAAG,MAAM,CAAC,OAAO,CAAC,CAAC;gBAC1B,IAAI,CAAC,CAAC,CAAC,OAAO;oBAAE,OAAO;gBACvB,MAAM,KAAK,GAAG,uBAAuB,CAAC,GAAG,CAAC,CAAC;gBAC3C,oEAAoE;gBACpE,
|
|
1
|
+
{"version":3,"file":"recorder.js","sourceRoot":"","sources":["../../src/telemetry/recorder.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,SAAS,CAAC;AACjD,OAAO,EAAE,IAAI,EAAE,MAAM,WAAW,CAAC;AACjC,OAAO,EAAE,MAAM,EAAE,MAAM,cAAc,CAAC;AACtC,OAAO,EAAE,mBAAmB,EAAE,MAAM,eAAe,CAAC;AACpD,OAAO,EAAE,eAAe,EAAE,MAAM,iBAAiB,CAAC;AAClD,OAAO,EAAE,gBAAgB,EAAE,MAAM,mBAAmB,CAAC;AACrD,OAAO,EAAE,KAAK,EAAE,MAAM,YAAY,CAAC;AACnC,OAAO,EAAE,cAAc,EAAE,cAAc,EAAE,MAAM,iBAAiB,CAAC;AACjE,OAAO,EAAE,cAAc,EAAE,wBAAwB,EAAE,iCAAiC,EAAE,MAAM,mDAAmD,CAAC;AAChJ,OAAO,EACL,uBAAuB,GAExB,MAAM,2DAA2D,CAAC;AASnE,IAAI,SAAS,GAAoB,IAAI,CAAC;AAEtC,MAAM,UAAU,WAAW;IACzB,IAAI,CAAC,SAAS,EAAE,CAAC;QACf,MAAM,IAAI,KAAK,CAAC,sDAAsD,CAAC,CAAC;IAC1E,CAAC;IACD,OAAO,SAAS,CAAC;AACnB,CAAC;AAED,MAAM,UAAU,kBAAkB,CAAC,CAAW;IAC5C,SAAS,GAAG,CAAC,CAAC;AAChB,CAAC;AAED,MAAM,UAAU,cAAc,CAAC,IAAiD;IAC9E,MAAM,QAAQ,GAAG,cAAc,CAAC,IAAI,CAAC,CAAC;IACtC,SAAS,GAAG,QAAQ,CAAC;IACrB,OAAO,QAAQ,CAAC;AAClB,CAAC;AAED,SAAS,cAAc,CAAC,IAAiD;IACvE,MAAM,EAAE,OAAO,EAAE,cAAc,EAAE,GAAG,IAAI,CAAC;IACzC,MAAM,KAAK,GAAG,IAAI,KAAK,CAAC,OAAO,CAAC,CAAC;IACjC,MAAM,UAAU,GAAG,IAAI,eAAe,EAAE,CAAC;IACzC,IAAI,UAAU,GAAkB,IAAI,CAAC;IACrC,IAAI,OAAO,GAAG,CAAC,CAAC;IAEhB,MAAM,gBAAgB,GAAG,GAAW,EAAE;QACpC,IAAI,CAAC,UAAU,EAAE,CAAC;YAChB,UAAU,GAAG,mBAAmB,CAAC,OAAO,CAAC,CAAC,SAAS,CAAC;QACtD,CAAC;QACD,OAAO,UAAU,CAAC;IACpB,CAAC,CAAC;IAEF,MAAM,OAAO,GAAG,CAAC,KAA8B,EAAQ,EAAE;QACvD,IAAI,CAAC;YACH,MAAM,EAAE,GAAG,gBAAgB,EAAE,CAAC;YAC9B,MAAM,IAAI,GAAG,gBAAgB,CAAC,EAAE,SAAS,EAAE,EAAE,EAAE,cAAc,EAAE,CAAC,CAAC;YACjE,MAAM,GAAG,GAAG,cAAc,CAAC,OAAO,CAAC,CAAC;YAEpC,KAAK,CAAC,MAAM,CAAC;gBACX,aAAa,EAAE,cAAc;gBAC7B,SAAS,EAAE,IAAI,CAAC,SAAS;gBACzB,cAAc,EAAE,IAAI,CAAC,cAAc;gBACnC,EAAE,EAAE,IAAI,CAAC,EAAE;gBACX,SAAS,EAAE,IAAI,CAAC,SAAS;gBACzB,UAAU,EAAE,GAAG;gBACf,MAAM,EAAE,CAAC,KAAK,CAAC;aAChB,CAAC,CAAC,KAAK,CAAC,GAAG,EAAE;gBACZ,OAAO,EAAE,CAAC;YACZ,CAAC,CAAC,CAAC;QACL,CAAC;QAAC,MAAM,CAAC;YACP,OAAO,EAAE,CAAC;QACZ,CAAC;IACH,CAAC,CAAC;IAEF,OAAO;QACL,IAAI,MAAM;YACR,OAAO,UAAU,CAAC,MAAM,C
AAC;QAC3B,CAAC;QAED,OAAO;QAEP,mBAAmB,CAAC,GAAG;YACrB,IAAI,CAAC;gBACH,MAAM,CAAC,GAAG,MAAM,CAAC,OAAO,CAAC,CAAC;gBAC1B,IAAI,CAAC,CAAC,CAAC,OAAO;oBAAE,OAAO;gBACvB,MAAM,KAAK,GAAG,uBAAuB,CAAC,GAAG,CAAC,CAAC;gBAC3C,oEAAoE;gBACpE,sEAAsE;gBACtE,oEAAoE;gBACpE,mEAAmE;gBACnE,+BAA+B;gBAC/B,MAAM,UAAU,GAAG,wBAAwB,CAAC,SAAS,CAAC,KAAK,CAAC,CAAC;gBAC7D,IAAI,CAAC,UAAU,CAAC,OAAO,EAAE,CAAC;oBACxB,OAAO,CAAC,IAAI,CAAC,qDAAqD,EAAE;wBAClE,OAAO,EAAE,KAAK,CAAC,OAAO;wBACtB,MAAM,EAAE,UAAU,CAAC,KAAK,CAAC,MAAM,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,EAAE,IAAI,EAAE,CAAC,CAAC,IAAI,CAAC,IAAI,CAAC,GAAG,CAAC,EAAE,OAAO,EAAE,CAAC,CAAC,OAAO,EAAE,CAAC,CAAC;qBAC7F,CAAC,CAAC;gBACL,CAAC;gBACD,MAAM,OAAO,GAAG,iCAAiC,CAAC,SAAS,CAAC,KAAK,CAAC,CAAC;gBACnE,IAAI,CAAC,OAAO,CAAC,OAAO,EAAE,CAAC;oBACrB,qEAAqE;oBACrE,mEAAmE;oBACnE,4DAA4D;oBAC5D,iEAAiE;oBACjE,qBAAqB;oBACrB,MAAM,iBAAiB,GAAG,CAAC,KAAK,CAAC,MAAM,IAAI,EAAE,CAAC,CAAC,MAAM,CACnD,CAAC,GAA2B,EAAE,CAAmC,EAAE,EAAE;wBACnE,IAAI,CAAC,CAAC,IAAI,IAAI,CAAC,CAAC,KAAK;4BAAE,GAAG,CAAC,CAAC,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC,KAAK,CAAC;wBAC7C,OAAO,GAAG,CAAC;oBACb,CAAC,EACD,EAAE,CACH,CAAC;oBACF,OAAO,CAAC,IAAI,CAAC,0DAA0D,EAAE;wBACvE,OAAO,EAAE,KAAK,CAAC,OAAO;wBACtB,gBAAgB,EAAE,KAAK,CAAC,gBAAgB;wBACxC,WAAW,EAAE,iBAAiB;wBAC9B,eAAe,EAAE,KAAK,CAAC,eAAe;wBACtC,WAAW,EAAE,KAAK,CAAC,WAAW;wBAC9B,YAAY,EAAE,KAAK,CAAC,YAAY;wBAChC,MAAM,EAAE,OAAO,CAAC,KAAK,CAAC,MAAM,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC;4BACvC,IAAI,EAAE,CAAC,CAAC,OAAO;4BACf,IAAI,EAAE,CAAC,CAAC,IAAI,CAAC,IAAI,CAAC,GAAG,CAAC;yBACvB,CAAC,CAAC;qBACJ,CAAC,CAAC;gBACL,CAAC;gBACD,OAAO,CAAC,KAA2C,CAAC,CAAC;YACvD,CAAC;YAAC,MAAM,CAAC;gBACP,OAAO,EAAE,CAAC;YACZ,CAAC;QACH,CAAC;QAED,KAAK,CAAC,cAAc,CAAC,OAAO;YAC1B,MAAM,cAAc,CAAC,OAAO,CAAC,CAAC;YAC9B,UAAU,CAAC,KAAK,EAAE,CAAC;YACnB,MAAM,SAAS,GAAG,IAAI,CAAC,OAAO,EAAE,wBAAwB,CAAC,CAAC;YAC1D,IAAI,UAAU,CAAC,SAAS,CAAC;gBAAE,UAAU,CAAC,SAAS,CAAC,CAAC;YACjD,UAAU,GAAG,IAAI,CAAC;YAClB,IAAI,OAAO,EAAE,eAAe,EAAE,CAAC;gBAC7B,eAAe,CAAC,OAAO,CAAC,CAAC;YAC3B,CAAC;QACH,CAAC;KACF,CAAC;AACJ,CAAC"}
|
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@zhixuan92/multi-model-agent",
-  "version": "3.10.
+  "version": "3.10.5",
   "type": "module",
   "license": "MIT",
   "description": "Standalone HTTP server for multi-model-agent. Routes tool-invocation work to Claude, Codex, or OpenAI-compatible sub-agents with async-polling REST dispatch and installable skills for Claude Code, Gemini CLI, Codex CLI, and Cursor.",
@@ -52,7 +52,7 @@
   },
   "dependencies": {
     "@asteasolutions/zod-to-openapi": "^8.5.0",
-    "@zhixuan92/multi-model-agent-core": "^3.10.
+    "@zhixuan92/multi-model-agent-core": "^3.10.5",
     "gray-matter": "^4.0.3",
     "minimist": "^1.2.8",
     "proper-lockfile": "^4.1.2",