@xcraftmind/mastermind 0.27.0 → 0.28.0
- package/package.json +8 -8
- package/share/agents/mastermind-auditor.md +36 -0
- package/share/agents/mastermind-task-executor.md +41 -0
- package/share/skills/mastermind-task-executor/SKILL.md +14 -0
- package/share/skills/mastermind-task-planning/SKILL.md +93 -0
- package/share/skills/mastermind-task-planning/references/defect-taxonomy.md +287 -0
- package/share/skills/mastermind-task-planning/references/structured-report-schema.md +141 -0
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@xcraftmind/mastermind",
-  "version": "0.
+  "version": "0.28.0",
   "description": "Mastermind workflow CLI + mmcg codegraph for AI coding agents — verify-spec / audit-spec gates, MCP server, multi-language tree-sitter indexer (Python, TypeScript, JavaScript, Rust, C#, Go, Java, PHP, C/C++). Prebuilt native binaries via optional platform packages — no Rust toolchain required.",
   "license": "MIT",
   "author": "xcraftmind",
@@ -38,12 +38,12 @@
     "mastermind"
   ],
   "optionalDependencies": {
-    "@xcraftmind/mmcg-darwin-arm64": "0.
-    "@xcraftmind/mmcg-darwin-x64": "0.
-    "@xcraftmind/mmcg-linux-x64-gnu": "0.
-    "@xcraftmind/mmcg-linux-arm64-gnu": "0.
-    "@xcraftmind/mmcg-linux-x64-musl": "0.
-    "@xcraftmind/mmcg-linux-arm64-musl": "0.
-    "@xcraftmind/mmcg-win32-x64-msvc": "0.
+    "@xcraftmind/mmcg-darwin-arm64": "0.28.0",
+    "@xcraftmind/mmcg-darwin-x64": "0.28.0",
+    "@xcraftmind/mmcg-linux-x64-gnu": "0.28.0",
+    "@xcraftmind/mmcg-linux-arm64-gnu": "0.28.0",
+    "@xcraftmind/mmcg-linux-x64-musl": "0.28.0",
+    "@xcraftmind/mmcg-linux-arm64-musl": "0.28.0",
+    "@xcraftmind/mmcg-win32-x64-msvc": "0.28.0"
   }
 }
|
package/share/agents/mastermind-auditor.md
CHANGED
|
@@ -157,6 +157,42 @@ A markdown audit report:
|
|
|
157
157
|
|
|
158
158
|
If verdict is anything other than `contract held`, the planner must address each `❌` / `⚠️` / critical-deferred item before telling the user "done".
|
|
159
159
|
|
|
160
|
+
### Structured audit tail (REQUIRED)
|
|
161
|
+
|
|
162
|
+
After the prose verdict, emit a fenced-YAML structured audit tail wrapped in
|
|
163
|
+
`<!-- mastermind:audit-begin -->` / `<!-- mastermind:audit-end -->` sentinels.
|
|
164
|
+
The planner reads this for mechanical routing — discrepancies must use the
|
|
165
|
+
`kind:` vocabulary from the `defect-taxonomy.md` reference in the
|
|
166
|
+
`mastermind-task-planning` skill (auditor-discrepancy section). The full schema
|
|
167
|
+
lives in that same skill's references as `structured-report-schema.md`. The
|
|
168
|
+
agent has both loaded — no path lookup needed.
|
|
169
|
+
|
|
170
|
+
Minimal template:
|
|
171
|
+
|
|
172
|
+
````markdown
|
|
173
|
+
<!-- mastermind:audit-begin -->
|
|
174
|
+
```yaml
|
|
175
|
+
spec: <absolute path to spec.md>
|
|
176
|
+
verdict: held | drift | broken
|
|
177
|
+
files_in_scope: <N>
|
|
178
|
+
files_in_diff: <M>
|
|
179
|
+
scope_match: <bool>
|
|
180
|
+
discrepancies: []
|
|
181
|
+
snapshot_drift:
|
|
182
|
+
- symbol: <name>
|
|
183
|
+
pre_callers: <N>
|
|
184
|
+
post_callers: <M>
|
|
185
|
+
delta: none | gained | lost | signature_changed
|
|
186
|
+
verifications_rerun:
|
|
187
|
+
- cmd: "<command>"
|
|
188
|
+
result: pass
|
|
189
|
+
```
|
|
190
|
+
<!-- mastermind:audit-end -->
|
|
191
|
+
````
|
|
192
|
+
|
|
193
|
+
Even on `verdict: held` the tail is REQUIRED — with `discrepancies: []` and
|
|
194
|
+
`scope_match: true`. The planner relies on the sentinel block existing.
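A minimal Python sketch of how a planner-side script might pull `verdict:` out of the audit tail described above. It is not part of the package: the function name, the regex, and the sample reply are assumptions, and it reads only top-level scalar fields rather than parsing full YAML.

```python
import re
from typing import Optional

FENCE = "`" * 3  # built programmatically so the example can sit inside a fence itself

# Assumed layout: begin sentinel, a yaml fence, the body, the closing fence, end sentinel.
AUDIT_BLOCK = re.compile(
    r"<!-- mastermind:audit-begin -->\s*" + FENCE + r"yaml\n(.*?)\n" + FENCE
    + r"\s*<!-- mastermind:audit-end -->",
    re.DOTALL,
)

def read_verdict(reply: str) -> Optional[str]:
    """Return the `verdict:` value from the audit tail, or None if the tail is missing."""
    match = AUDIT_BLOCK.search(reply)
    if match is None:
        return None  # no sentinel block in the reply
    for line in match.group(1).splitlines():
        if line.startswith("verdict:"):  # top-level scalar only
            return line.split(":", 1)[1].strip()
    return None

# Hypothetical auditor reply: prose first, structured tail last.
sample = "\n".join([
    "Prose verdict: contract held.",
    "",
    "<!-- mastermind:audit-begin -->",
    FENCE + "yaml",
    "spec: /abs/path/to/spec.md",
    "verdict: held",
    "scope_match: true",
    "discrepancies: []",
    FENCE,
    "<!-- mastermind:audit-end -->",
])

print(read_verdict(sample))
print(read_verdict("All done, trust me."))
```

A real consumer would likely hand the captured body to a YAML parser; the line scan here only keeps the sketch free of third-party dependencies.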
|
|
195
|
+
|
|
160
196
|
## Capture the lesson (institutional memory)
|
|
161
197
|
|
|
162
198
|
When the verdict is `⚠️ partial drift` or `❌ contract broken`, append a **one-line lesson** to `.mastermind/tasks/_lessons.md` (shared file at the top of `tasks/`, not inside any task folder) so the next planner can learn from this audit. Skip on clean `✅ contract held` verdicts — that's just normal operation, not a lesson.
|
package/share/agents/mastermind-task-executor.md
CHANGED
|
@@ -83,6 +83,47 @@ A markdown execution report:
|
|
|
83
83
|
<Anything you noticed but didn't fix because it was out of scope. Hand back to planner.>
|
|
84
84
|
```
|
|
85
85
|
|
|
86
|
+
### Structured tail (REQUIRED)
|
|
87
|
+
|
|
88
|
+
After the prose report, emit a fenced-YAML structured tail wrapped in
|
|
89
|
+
`<!-- mastermind:report-begin -->` / `<!-- mastermind:report-end -->` sentinels.
|
|
90
|
+
The planner extracts and parses this block mechanically — the prose above is for
|
|
91
|
+
humans, the tail is for routing.
|
|
92
|
+
|
|
93
|
+
The full schema with field meanings lives in the `mastermind-task-planning`
|
|
94
|
+
skill's references as `structured-report-schema.md`. The closed set of
|
|
95
|
+
`defect.kind` values lives in the same skill's references as
|
|
96
|
+
`defect-taxonomy.md`. The agent has both loaded — no path lookup needed.
|
|
97
|
+
|
|
98
|
+
Minimal template (populate every field, even on a clean run):
|
|
99
|
+
|
|
100
|
+
````markdown
|
|
101
|
+
<!-- mastermind:report-begin -->
|
|
102
|
+
```yaml
|
|
103
|
+
spec: <absolute path to spec.md>
|
|
104
|
+
status: complete | partial | failed
|
|
105
|
+
phases:
|
|
106
|
+
- id: "1.1"
|
|
107
|
+
status: done
|
|
108
|
+
- id: "1.2"
|
|
109
|
+
status: done
|
|
110
|
+
files_modified:
|
|
111
|
+
- <relative path>
|
|
112
|
+
defects: []
|
|
113
|
+
verifications:
|
|
114
|
+
- cmd: "<command>"
|
|
115
|
+
result: pass
|
|
116
|
+
```
|
|
117
|
+
<!-- mastermind:report-end -->
|
|
118
|
+
````
|
|
119
|
+
|
|
120
|
+
When you stop on a defect:
|
|
121
|
+
- Populate `defects[]` with one entry whose `kind:` is from the taxonomy (or
|
|
122
|
+
`unclassified` if nothing matches — be honest about it).
|
|
123
|
+
- Set the corresponding phase's `status` to `stopped_here`.
|
|
124
|
+
- Set the top-level `status:` to `partial` (some phases done) or `failed`
|
|
125
|
+
(Phase 1 didn't land).
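The stop rules above can be sketched as a small Python helper that derives the top-level `status:` from the per-phase statuses. This is one reading of the rules, not the package's implementation: the function name is hypothetical, and "failed" is taken to mean that no phase reached `done`.

```python
def overall_status(phases):
    """Derive the top-level `status:` from a list of {"id": ..., "status": ...} phases."""
    statuses = [phase["status"] for phase in phases]
    if statuses and all(status == "done" for status in statuses):
        return "complete"
    # Stopped somewhere: "partial" if anything landed, otherwise "failed".
    return "partial" if "done" in statuses else "failed"

# Hypothetical run that stopped on a defect in phase 1.2.
stopped = [
    {"id": "1.1", "status": "done"},
    {"id": "1.2", "status": "stopped_here"},
]
print(overall_status(stopped))
```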
|
|
126
|
+
|
|
86
127
|
## Companion skill
|
|
87
128
|
|
|
88
129
|
This subagent is the runtime companion to [[mastermind-task-planning]] (the planner) and uses [[mastermind-task-executor]] (the skill body). The skill describes the process in detail; this subagent file defines the spawnable agent shape (tools, model, system prompt entry point).
|
package/share/skills/mastermind-task-executor/SKILL.md
CHANGED
|
@@ -120,6 +120,20 @@ Output a report in this exact shape:
|
|
|
120
120
|
<Anything you noticed but didn't fix because it was out of scope. Hand back to the planner — they decide whether to add a follow-up task.>
|
|
121
121
|
```
|
|
122
122
|
|
|
123
|
+
### Structured tail (REQUIRED)
|
|
124
|
+
|
|
125
|
+
After the prose sections above, emit a fenced-YAML structured tail wrapped in
|
|
126
|
+
`<!-- mastermind:report-begin -->` / `<!-- mastermind:report-end -->` sentinels.
|
|
127
|
+
The planner reads this block to mechanically route on defect `kind:` — see
|
|
128
|
+
[`structured-report-schema.md`](../mastermind-task-planning/references/structured-report-schema.md)
|
|
129
|
+
for the full schema and
|
|
130
|
+
[`defect-taxonomy.md`](../mastermind-task-planning/references/defect-taxonomy.md)
|
|
131
|
+
for the closed `kind:` set.
|
|
132
|
+
|
|
133
|
+
Even on a clean run (all phases done, every VERIFY passed) the tail is REQUIRED
|
|
134
|
+
— with `status: complete` and `defects: []`. Planners rely on the sentinel block
|
|
135
|
+
existing; absence is treated as a malformed reply.
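As an illustration of the clean-run requirement, the following Python sketch renders a tail with `status: complete` and `defects: []`. The helper name and the YAML indentation are assumptions; the field names follow the executor's minimal template.

```python
FENCE = "`" * 3  # avoids literal fences inside this example

def render_clean_tail(spec, phase_ids, files_modified, verify_cmds):
    """Render the structured tail for a run where every phase and VERIFY passed."""
    lines = [
        "<!-- mastermind:report-begin -->",
        FENCE + "yaml",
        f"spec: {spec}",
        "status: complete",
        "phases:",
    ]
    for phase_id in phase_ids:
        lines += [f'  - id: "{phase_id}"', "    status: done"]
    lines.append("files_modified:")
    lines += [f"  - {path}" for path in files_modified]
    lines.append("defects: []")  # present even when empty
    lines.append("verifications:")
    for cmd in verify_cmds:
        # Assumes the command contains no double quotes that would need escaping.
        lines += [f'  - cmd: "{cmd}"', "    result: pass"]
    lines += [FENCE, "<!-- mastermind:report-end -->"]
    return "\n".join(lines)

tail = render_clean_tail("/abs/path/to/spec.md", ["1.1", "1.2"], ["src/lib.rs"], ["cargo test --lib"])
print(tail)
```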
|
|
136
|
+
|
|
123
137
|
## Failure modes — and how to handle them
|
|
124
138
|
|
|
125
139
|
| Situation | What to do |
|
package/share/skills/mastermind-task-planning/SKILL.md
CHANGED
|
@@ -349,6 +349,99 @@ Each task lives in its own folder under `.mastermind/tasks/`:
|
|
|
349
349
|
|
|
350
350
|
If `.mastermind/tasks/` doesn't exist, create it and optionally create `CONTEXT.md` with project info. `mmcg init` does this for you.
|
|
351
351
|
|
|
352
|
+
## Defect-aware retry (mechanical routing on subagent reports)
|
|
353
|
+
|
|
354
|
+
The executor and auditor subagents emit a fenced-YAML "structured tail" at the
|
|
355
|
+
end of every report — full schema at
|
|
356
|
+
[`references/structured-report-schema.md`](references/structured-report-schema.md),
|
|
357
|
+
defect-kind vocabulary at
|
|
358
|
+
[`references/defect-taxonomy.md`](references/defect-taxonomy.md).
|
|
359
|
+
|
|
360
|
+
When you (planner) receive a subagent report that isn't `status: complete` /
|
|
361
|
+
`verdict: held`, your routing flow is:
|
|
362
|
+
|
|
363
|
+
1. Locate the sentinel block (`<!-- mastermind:report-begin -->` or
|
|
364
|
+
`<!-- mastermind:audit-begin -->`) at the end of the reply.
|
|
365
|
+
2. Parse the YAML block. For each `defects[]` / `discrepancies[]` entry, read
|
|
366
|
+
`kind:`.
|
|
367
|
+
3. Look up that `kind:` in the taxonomy doc. Apply the named fix template to
|
|
368
|
+
the spec (patch the offending phase, add a missing `expected_docs[]` entry,
|
|
369
|
+
add an authorization Rule, etc.).
|
|
370
|
+
4. Re-spawn the executor with the patched spec, with a focused continuation
|
|
371
|
+
prompt that names which phases are already done and which need re-execution.
|
|
372
|
+
|
|
373
|
+
When the structured tail's `kind:` is `unclassified`, you're in unknown
|
|
374
|
+
territory:
|
|
375
|
+
- Read the verbatim `details:` field
|
|
376
|
+
- Design the fix manually
|
|
377
|
+
- After the task lands, promote the new defect into a named entry in
|
|
378
|
+
`defect-taxonomy.md` (no separate spec needed for taxonomy edits — direct
|
|
379
|
+
doc commit is fine)
|
|
380
|
+
|
|
381
|
+
The whole point of this routing is to **avoid re-reading prose reports**.
|
|
382
|
+
Before this convention landed (tasks 001 + 002), the planner read 4 executor
|
|
383
|
+
reports across two tasks and manually classified 6 defects. With the taxonomy +
|
|
384
|
+
structured tail, the planner can route the same defects in a single YAML lookup.
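The routing flow above amounts to a dictionary lookup keyed on `kind:`. A minimal Python sketch follows, assuming the tail has already been parsed into a list of dicts; the one-line fix summaries paraphrase the taxonomy and the function name is hypothetical.

```python
# Paraphrased fix templates for a few kinds from the taxonomy (illustrative subset).
FIX_TEMPLATES = {
    "envelope_drift": "unwrap the content envelope before asserting on fields",
    "doc_surface_gap": "add the missing surfaces to expected_docs[] and Phase 3.x sub-steps",
    "zero_filter_verify": "drop the trailing '::' from the cargo test filter",
    "fmt_tension": "add a Rule authorizing one cargo fmt pass on touched files",
}

def route(entries):
    """Split entries into (kind, fix) pairs and entries that need manual design."""
    routed, manual = [], []
    for entry in entries:
        kind = entry.get("kind", "unclassified")
        if kind in FIX_TEMPLATES:
            routed.append((kind, FIX_TEMPLATES[kind]))
        else:
            manual.append(entry)  # unclassified or unknown: read `details:` by hand
    return routed, manual

routed, manual = route([
    {"kind": "zero_filter_verify"},
    {"kind": "unclassified", "details": "verbatim description from the subagent"},
])
print(routed)
print(manual)
```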
|
|
385
|
+
|
|
386
|
+
## Iteration budget (escalate, don't loop forever)
|
|
387
|
+
|
|
388
|
+
If you've re-spawned the executor 3 times on the same spec and it keeps
|
|
389
|
+
returning `status: partial` / `failed`, STOP. Don't issue a 4th respawn.
|
|
390
|
+
Instead:
|
|
391
|
+
- Surface the situation to the user with the cumulative defect list (all 3
|
|
392
|
+
rounds' `defects[]` entries flattened)
|
|
393
|
+
- Suggest spec redesign rather than another patch
|
|
394
|
+
- Append a one-line `[auto]` entry to `_lessons.md` of kind
|
|
395
|
+
`iteration_budget_exhausted` so future planners see this signal
|
|
396
|
+
|
|
397
|
+
Three rounds is the empirically-calibrated bound from forge's
|
|
398
|
+
`ErrorTracker.max_retries=3` default and from our task-002 experience (4 rounds
|
|
399
|
+
to land, would have been 2 with a tighter spec). Don't loosen the bound without
|
|
400
|
+
recording why in the spec's Notes section.
|
|
401
|
+
|
|
402
|
+
Since spec 004, the bound is also enforced at the CLI in `mmcg run-task`:
|
|
403
|
+
`--max-iterations N` (default 3), `--force-iteration` to bypass. The CLI gate
|
|
404
|
+
catches the case where state-resets accumulate without you noticing; the
|
|
405
|
+
self-check above is the in-conversation early-warning.
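The in-conversation self-check can be sketched in Python as follows. This is an assumption-laden illustration, not the `mmcg run-task` gate: the function, the action labels, and the round structure are hypothetical, and the caller is assumed to track the number of re-spawns.

```python
MAX_RESPAWNS = 3  # the bound stated above

def next_step(respawns_so_far, rounds):
    """Decide what to do after the latest executor report.

    rounds: one dict per executor run so far, e.g. {"status": "partial", "defects": [...]}.
    """
    latest = rounds[-1]
    if latest["status"] == "complete":
        return "proceed_to_audit", []
    if respawns_so_far >= MAX_RESPAWNS:
        # Budget exhausted: hand the user every round's defects, flattened.
        flattened = [defect for r in rounds for defect in r.get("defects", [])]
        return "escalate_to_user", flattened
    return "patch_spec_and_respawn", latest.get("defects", [])

rounds = [
    {"status": "partial", "defects": [{"kind": "fmt_tension"}]},
    {"status": "failed", "defects": [{"kind": "unclassified"}]},
]
print(next_step(1, rounds))
print(next_step(3, rounds))
```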
|
|
406
|
+
|
|
407
|
+
## Premature-terminal escalation tiers (self-check before declaring "done")
|
|
408
|
+
|
|
409
|
+
Before you tell the user "task complete" (or any equivalent — "all done",
|
|
410
|
+
"shipped", "ready to merge", "сделано", …), you MUST satisfy three conditions
|
|
411
|
+
in order:
|
|
412
|
+
|
|
413
|
+
1. **Auditor was spawned and returned a verdict.** The structured audit tail
|
|
414
|
+
(`<!-- mastermind:audit-begin -->` … `<!-- mastermind:audit-end -->`,
|
|
415
|
+
schema in `structured-report-schema.md`) is visible in the conversation
|
|
416
|
+
transcript above your draft message, with a parseable YAML `verdict:`
|
|
417
|
+
field. No tail = no audit happened = you skipped a mandatory step.
|
|
418
|
+
2. **Verdict is `held`.** `drift` / `broken` / anything else means there's
|
|
419
|
+
unfinished work; you don't get to declare done. See "Defect-aware retry"
|
|
420
|
+
above for the routing.
|
|
421
|
+
3. **Your own semantic review is documented in the conversation.** Per the
|
|
422
|
+
SKILL's Step 9b workflow, you contribute the semantic half of post-flight
|
|
423
|
+
review on top of the auditor's mechanical findings. If you have no notes
|
|
424
|
+
to add ("the auditor's verdict matches my intuition, no concerns") that's
|
|
425
|
+
fine — say so explicitly. Silent skip means you skipped the step.
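The three conditions, checked in order, can be written as a guard. The Python sketch below is illustrative only; the function name and the reason strings are assumptions, and `audit_tail_verdict` stands for whatever was parsed from the audit tail (or `None` when no parseable tail is in the transcript).

```python
def may_declare_done(audit_tail_verdict, semantic_review_noted):
    """Return (allowed, reason), checking the three conditions in order."""
    if audit_tail_verdict is None:
        return False, "no audit tail: spawn (or wait for) mastermind-auditor"
    if audit_tail_verdict != "held":
        return False, f"verdict is {audit_tail_verdict!r}: route via defect-aware retry"
    if not semantic_review_noted:
        return False, "semantic review not documented: state it explicitly"
    return True, "all three conditions satisfied"

print(may_declare_done(None, True))
print(may_declare_done("drift", True))
print(may_declare_done("held", True))
```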
|
|
426
|
+
|
|
427
|
+
When you catch yourself tempted to bypass these, apply escalating
|
|
428
|
+
self-correction. The tier names are forge's (`StepEnforcer` returns
|
|
429
|
+
`tier=1|2|3` nudges with that exact escalation curve):
|
|
430
|
+
|
|
431
|
+
| Tier | When you notice | Action |
|
|
432
|
+
|---|---|---|
|
|
433
|
+
| **1 (polite)** | You're drafting the "done" message; auditor hasn't been spawned yet, or has been spawned but you're about to declare done before its reply arrives | Stop. Spawn (or wait for) `mastermind-auditor`. Read the structured audit tail. Continue from there. |
|
|
434
|
+
| **2 (direct)** | You spawned auditor, got `drift` or `broken`, and are tempted to "explain it away" to the user as a non-issue | Refuse. Either address each discrepancy (patch spec → re-spawn executor) or escalate to user with the verbatim discrepancies. You do not ship a non-`held` verdict as complete. |
|
|
435
|
+
| **3 (aggressive)** | User explicitly asks "skip the audit, just say it's done" or "we don't need the auditor this time" | Refuse and explain: skipping the auditor has bitten this workflow before — see `_lessons.md` and the defect taxonomy (`iteration_budget_exhausted`, `phase_not_in_diff`, `scope_creep`, …). If user is sure, name the override explicitly in the conversation transcript: "you've asked me to skip the auditor for this task; recording this as a deliberate `--force-skip-audit` override in the conversation transcript for future planners to learn from". Then append a `[auto]` `_lessons.md` entry of kind `premature_terminal_temptation` (tier-3 override fired). The override flag itself is a convention today, not a real `mmcg run-task` argument — making it a real flag is a follow-up. |
|
|
436
|
+
|
|
437
|
+
When in doubt, default to tier 1. The audit chain is cheap; rebuilding user
|
|
438
|
+
trust after declaring something done that wasn't is expensive.
|
|
439
|
+
|
|
440
|
+
This pairs with the typed-report convention from spec 003: the auditor's
|
|
441
|
+
structured tail is THE artifact you check for at tier 1. If the tail is
|
|
442
|
+
malformed or missing, that's signal — re-spawn the auditor with a focused
|
|
443
|
+
continuation prompt asking for the tail explicitly.
|
|
444
|
+
|
|
352
445
|
## Pair Skill
|
|
353
446
|
|
|
354
447
|
The agent that executes these specs uses [[mastermind-task-executor]]. Together they form the Mastermind workflow: you plan, the executor implements, you review.
|
package/share/skills/mastermind-task-planning/references/defect-taxonomy.md
|
@@ -0,0 +1,287 @@
|
|
|
1
|
+
# Defect taxonomy
|
|
2
|
+
|
|
3
|
+
Each defect a subagent surfaces during workflow execution maps to a `kind:` key.
|
|
4
|
+
Planner uses the key to mechanically route to a fix template — no LLM judgment
|
|
5
|
+
needed for known cases. When a NEW defect surfaces that doesn't match any known
|
|
6
|
+
kind, the subagent marks it `kind: unclassified` and the planner promotes it
|
|
7
|
+
into a named entry here as part of the follow-up.
|
|
8
|
+
|
|
9
|
+
The full structured-report schema (what `kind:` slots into) lives alongside this
|
|
10
|
+
file as [`structured-report-schema.md`](structured-report-schema.md).
|
|
11
|
+
|
|
12
|
+
## Executor stop kinds
|
|
13
|
+
|
|
14
|
+
### `envelope_drift`
|
|
15
|
+
|
|
16
|
+
- **What**: Test asserts on the raw return value of `handle_tools_call`, but the
|
|
17
|
+
dispatcher wraps every successful payload in
|
|
18
|
+
`{ "content": [{ "type": "text", "text": "<serialized JSON>" }] }`. Field
|
|
19
|
+
comparisons against the wrapper always fail.
|
|
20
|
+
- **Surfaced as**: `assertion left == right` panic where `left` is a JSON object
|
|
21
|
+
and `right` is a sub-field that lives inside `content[0].text`.
|
|
22
|
+
- **Fix template**: Reuse the `unwrap_content` helper that lives in
|
|
23
|
+
`mcp/servers/mmcg/src/mcp.rs::tests` from task 001. Wrap every `handle_tools_call`
|
|
24
|
+
return with `unwrap_content(&v)` before asserting on fields. Do NOT redefine
|
|
25
|
+
the helper.
|
|
26
|
+
- **First observed**: Task 001 Phase 2.3.
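To make the envelope shape concrete, here is a Python analogue of the unwrapping step. It is not the Rust `unwrap_content` helper from `mcp.rs`; the payload fields are invented for illustration.

```python
import json

def unwrap_content(response):
    """Return the payload serialized inside content[0].text."""
    return json.loads(response["content"][0]["text"])

payload = {"symbol": "foo", "callers": 2}  # hypothetical tool result
wrapped = {"content": [{"type": "text", "text": json.dumps(payload)}]}

print(wrapped == payload)                  # comparing against the wrapper
print(unwrap_content(wrapped) == payload)  # comparing after unwrapping
```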
|
|
27
|
+
|
|
28
|
+
### `doc_surface_gap`
|
|
29
|
+
|
|
30
|
+
- **What**: Spec's Phase 3 (docs) covers fewer files than
|
|
31
|
+
`scripts/validate.py::validate_mmcg_tool_drift` enforces. Validator finds tool
|
|
32
|
+
names in `mcp.rs` but missing from one or more of: mmcg README, repo README,
|
|
33
|
+
`.claude-plugin/marketplace.json`, `plugins/mmcg/.claude-plugin/plugin.json`.
|
|
34
|
+
- **Surfaced as**: `python3 scripts/validate.py` exits non-zero with
|
|
35
|
+
`tool 'mmcg_X' missing — declared in mcp/servers/mmcg/src/mcp.rs but absent
|
|
36
|
+
from this file` (one error per missing surface).
|
|
37
|
+
- **Fix template**: Add the three missing surfaces to `expected_docs[]` in the
|
|
38
|
+
spec frontmatter, then add three Phase 3.x sub-steps with FIND/CHANGE TO blocks.
|
|
39
|
+
Pattern: `marketplace.json` and `plugin.json` each carry ONE prose `description`
|
|
40
|
+
string with the comma-separated tool list and `N tools` count; the repo
|
|
41
|
+
`README.md` carries TWO occurrences (table cell + standalone-crate paragraph).
|
|
42
|
+
Insert the new tool name before the trailing `status` entry in each list and
|
|
43
|
+
bump the count by 1.
|
|
44
|
+
- **First observed**: Task 001 Phase 4.
|
|
45
|
+
|
|
46
|
+
### `zero_filter_verify`
|
|
47
|
+
|
|
48
|
+
- **What**: VERIFY command uses `cargo test --lib <module>::` (trailing `::`)
|
|
49
|
+
which cargo treats as a literal path that no test matches. Command exits 0
|
|
50
|
+
with zero tests run — false-positive "pass".
|
|
51
|
+
- **Surfaced as**: `cargo test ... <module>::` output reads `0 passed; 0 failed;
|
|
52
|
+
N filtered out` even though the module HAS tests.
|
|
53
|
+
- **Fix template**: Drop the trailing `::`. Use the bare module name as the
|
|
54
|
+
substring filter: `cargo test --lib <module>`. Cargo matches any test whose
|
|
55
|
+
path contains the substring.
|
|
56
|
+
- **First observed**: Task 001 Phase 1.3.
|
|
57
|
+
|
|
58
|
+
### `verify_grep_false_positive`
|
|
59
|
+
|
|
60
|
+
- **What**: A VERIFY command pipes test output into `grep -q "test result: ok"`
|
|
61
|
+
(or similar benign-line match) to gate pass/fail. But that line can co-exist
|
|
62
|
+
with a failure: a module with both unit tests AND doctests prints one
|
|
63
|
+
`test result: ok` per harness, so a `grep -q "ok"` can match the passing
|
|
64
|
+
doctest summary while a unit-test `FAILED` line sits elsewhere in the output.
|
|
65
|
+
The gate reports pass even though a test failed. Most dangerous in stress
|
|
66
|
+
loops (`for i in seq …; grep -q ok`) where one masked failure per iteration
|
|
67
|
+
is invisible.
|
|
68
|
+
- **Surfaced as**: An auditor independently re-running the loop with an explicit
|
|
69
|
+
`grep "FAILED"` / non-zero-exit check finds failures the spec's own loop
|
|
70
|
+
missed; or a flaky test "passes" the loop but fails in CI.
|
|
71
|
+
- **Fix template**: Gate on the ABSENCE of failure, not the presence of a
|
|
72
|
+
benign line. Prefer `cargo test … && echo ok || fails=$((fails+1))` (uses
|
|
73
|
+
cargo's own exit code — non-zero on any failure), or grep for the failure
|
|
74
|
+
marker: `cargo test … 2>&1 | grep -q "FAILED" && fails=$((fails+1))`. Never
|
|
75
|
+
gate a loop solely on a positive `grep -q "ok"`.
|
|
76
|
+
- **First observed**: Task 006 Phase 3 (planner's stress loop used
|
|
77
|
+
`grep -q "test result: ok"`; auditor caught the maskability and re-verified
|
|
78
|
+
with an explicit FAILED-marker hunt — fix held, but the VERIFY pattern was
|
|
79
|
+
latently unsound).
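The difference between the two gating styles can be shown on a captured output of the shape this entry describes. The Python sketch below is illustrative: the function names are hypothetical, the sample output is hand-written, and the exit code is simply some non-zero value.

```python
def naive_gate(output):
    """The unsound pattern: pass if any benign summary line is present."""
    return "test result: ok" in output

def sound_gate(returncode, output):
    """Gate on the absence of failure: zero exit code and no FAILED marker."""
    return returncode == 0 and "FAILED" not in output

# Hand-written output: one failing harness summary, one passing harness summary.
mixed = "\n".join([
    "test result: FAILED. 2 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out",
    "test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out",
])

print(naive_gate(mixed))
print(sound_gate(1, mixed))
```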
|
|
80
|
+
|
|
81
|
+
### `stale_pre_edit_snapshot`
|
|
82
|
+
|
|
83
|
+
- **What**: Spec's Pre-edit symbol snapshot or a Phase's FIND block claims a
|
|
84
|
+
function has visibility / signature X, but the on-disk function already has
|
|
85
|
+
visibility / signature Y. The FIND text doesn't appear in the file.
|
|
86
|
+
- **Surfaced as**: Executor returns `find_block_mismatch: <file> doesn't contain
|
|
87
|
+
the FIND text` for a phase that's nominally just a visibility change or
|
|
88
|
+
signature tweak.
|
|
89
|
+
- **Fix template**: Either (a) drop the phase entirely if the change is already
|
|
90
|
+
in place (the more common case — re-check whether the goal is satisfied by
|
|
91
|
+
the current state), or (b) update the FIND/CHANGE TO blocks to match the
|
|
92
|
+
actual current state. Re-capture the snapshot via
|
|
93
|
+
`./mcp/servers/mmcg/target/debug/mmcg query symbols-in-file <path>` before
|
|
94
|
+
rewriting.
|
|
95
|
+
- **First observed**: Task 002 Phase 1.5.
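The choice between fix (a) and fix (b) can be approximated with a pre-flight check on the file. The Python sketch below is an assumption, not part of the workflow tooling: it matches blocks as whole consecutive lines, so `fn foo() {` does not accidentally match inside `pub fn foo() {`.

```python
def contains_block(file_text, block):
    """True when the block's lines appear consecutively as whole lines in the file."""
    file_lines = [line.rstrip() for line in file_text.splitlines()]
    block_lines = [line.rstrip() for line in block.splitlines()]
    size = len(block_lines)
    return any(file_lines[i:i + size] == block_lines
               for i in range(len(file_lines) - size + 1))

def classify_phase(file_text, find_text, change_to_text):
    if contains_block(file_text, find_text):
        return "apply"
    if contains_block(file_text, change_to_text):
        return "already_applied"  # fix (a): drop the phase
    return "stale_find"           # fix (b): re-capture the snapshot, rewrite the blocks

on_disk = "pub fn foo() {\n    bar();\n}\n"
print(classify_phase(on_disk, "fn foo() {", "pub fn foo() {"))
```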
|
|
96
|
+
|
|
97
|
+
### `seed_extractor_mismatch`
|
|
98
|
+
|
|
99
|
+
- **What**: Integration test hand-crafts an intermediate type (e.g. `PendingFile`
|
|
100
|
+
with placeholder `kind: "fn"`, hand-written `signature: "fn foo()"`) to seed
|
|
101
|
+
storage. The consumer-under-test re-derives the same type from real input via
|
|
102
|
+
a parser (e.g. tree-sitter via `extractor_for_path` + `parse_one`), which
|
|
103
|
+
produces a structurally-equivalent but byte-different shape
|
|
104
|
+
(`kind: "function"`, fully-qualified signature). Hash/compare assertions fail
|
|
105
|
+
even on semantically-identical input.
|
|
106
|
+
- **Surfaced as**: A round-trip test that should be a no-op returns a
|
|
107
|
+
"structural change" / "different" verdict; classifier or comparator is
|
|
108
|
+
correct, the seeding path is the bug.
|
|
109
|
+
- **Fix template**: Seed via the same pipeline the consumer uses. For mmcg
|
|
110
|
+
fingerprint / structural tests, call `crate::indexer::extractor_for_path`
|
|
111
|
+
followed by `crate::indexer::parse_one` on a real on-disk fixture, then pass
|
|
112
|
+
the resulting `PendingFile` to `commit_file`. Never construct intermediate
|
|
113
|
+
parser-output types by hand.
|
|
114
|
+
- **First observed**: Task 002 Phase 2.4.
|
|
115
|
+
|
|
116
|
+
### `fmt_tension`
|
|
117
|
+
|
|
118
|
+
- **What**: Spec's verbatim Rust code blocks are line-wrapped for documentation
|
|
119
|
+
readability (e.g. multi-line `Vec::with_capacity(…)` calls, broken-out
|
|
120
|
+
`std::fs::write(…)` arg lists). Rustfmt collapses these. `cargo fmt --check`
|
|
121
|
+
fails even though `cargo test` passes — the diffs are cosmetic only.
|
|
122
|
+
- **Surfaced as**: `cargo fmt --check` exits non-zero with format-only diffs in
|
|
123
|
+
files the executor just wrote from spec FIND/CHANGE TO blocks; no semantic
|
|
124
|
+
divergence.
|
|
125
|
+
- **Fix template**: Default to (b) — add an explicit Rule to the spec
|
|
126
|
+
authorizing one `cargo fmt` normalization pass on touched files, with a note
|
|
127
|
+
that fmt may only collapse/expand whitespace and must not change logic. Use
|
|
128
|
+
(a) — re-author the spec blocks in rustfmt style preemptively — only for
|
|
129
|
+
surgical edits to a single function. Future planners SHOULD include the fmt
|
|
130
|
+
authorization Rule from the start on any spec that emits >50 LOC of Rust.
|
|
131
|
+
- **First observed**: Task 002 Phase 2.4.
|
|
132
|
+
|
|
133
|
+
## Auditor discrepancy kinds
|
|
134
|
+
|
|
135
|
+
### `scope_creep`
|
|
136
|
+
|
|
137
|
+
- **What**: `git diff --name-only HEAD` shows files NOT in the spec's `touches[]`
|
|
138
|
+
+ `expected_docs[]` union.
|
|
139
|
+
- **Surfaced as**: Auditor's diff-vs-spec check enumerates files outside scope.
|
|
140
|
+
- **Fix template**: Either revert the out-of-scope edits or extend the spec's
|
|
141
|
+
scope (with rationale) and re-spawn the audit. Zero tolerance unless the
|
|
142
|
+
planner explicitly excepted (e.g. authorized `cargo fmt` normalize affects
|
|
143
|
+
format-only).
|
|
144
|
+
- **First observed**: (none — all 001/002 audits clean. Listed for completeness.)
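The scope check reduces to a set difference. A minimal Python sketch, assuming the diff file list and the spec's `touches[]` and `expected_docs[]` are already available as lists of relative paths (the sample paths are invented):

```python
def out_of_scope(diff_files, touches, expected_docs):
    """Files present in the diff but absent from touches[] and expected_docs[]."""
    allowed = set(touches) | set(expected_docs)
    return sorted(set(diff_files) - allowed)

print(out_of_scope(
    diff_files=["src/mcp.rs", "README.md", "Cargo.lock"],
    touches=["src/mcp.rs"],
    expected_docs=["README.md"],
))
```

An empty result corresponds to `scope_match: true` in the audit tail.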
|
|
145
|
+
|
|
146
|
+
### `phase_not_in_diff`
|
|
147
|
+
|
|
148
|
+
- **What**: Executor marked Phase X as `[x]` complete but the phase's CHANGE TO
|
|
149
|
+
content isn't present in the file.
|
|
150
|
+
- **Surfaced as**: Auditor greps for canonical anchor strings from the CHANGE TO
|
|
151
|
+
block and finds nothing.
|
|
152
|
+
- **Fix template**: Investigate whether the executor lied or a later phase
|
|
153
|
+
reverted the change. Re-run that phase's VERIFY command in isolation.
|
|
154
|
+
|
|
155
|
+
### `verify_failed_on_rerun`
|
|
156
|
+
|
|
157
|
+
- **What**: Auditor re-ran a VERIFY command that the executor reported as passing,
|
|
158
|
+
and it now fails.
|
|
159
|
+
- **Surfaced as**: Discrepancy entry with the verbatim re-run output.
|
|
160
|
+
- **Fix template**: Snapshot the environment diff (env vars, working directory,
|
|
161
|
+
locked dependencies). Almost always a flake or env-specific behavior; if not,
|
|
162
|
+
the executor's claim is suspect.
|
|
163
|
+
|
|
164
|
+
### `snapshot_caller_drift`
|
|
165
|
+
|
|
166
|
+
- **What**: Pre-edit snapshot in spec said symbol X had N callers; post-execution
|
|
167
|
+
`mmcg query callers X` returns M ≠ N.
|
|
168
|
+
- **Surfaced as**: Auditor's drift check enumerates the delta.
|
|
169
|
+
- **Fix template**: Either the executor changed something out of scope (check
|
|
170
|
+
the diff for new call sites involving X), or the snapshot was wrong to start
|
|
171
|
+
with. If the latter, drop the snapshot's per-symbol claim and re-run the audit.
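For the caller-count comparison, a short Python sketch that maps the pre and post counts onto the `delta:` values used in the audit tail's `snapshot_drift` entries. The function name is hypothetical, and `signature_changed` is left out because it cannot be derived from counts alone.

```python
def caller_delta(pre_callers, post_callers):
    """Map pre/post caller counts to a `delta:` value."""
    if post_callers > pre_callers:
        return "gained"
    if post_callers < pre_callers:
        return "lost"
    return "none"

print(caller_delta(3, 5))
```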
|
|
172
|
+
|
|
173
|
+
### `snapshot_signature_drift`
|
|
174
|
+
|
|
175
|
+
- **What**: Symbol X's signature changed but the spec didn't authorize it (e.g.
|
|
176
|
+
spec said "public signature stays unchanged" but the diff shows a parameter
|
|
177
|
+
added).
|
|
178
|
+
- **Surfaced as**: Auditor compares pre-edit `mmcg query search X` signature
|
|
179
|
+
against post-edit.
|
|
180
|
+
- **Fix template**: Almost always a real contract violation. Stop, revert the
|
|
181
|
+
signature change, re-issue the phase preserving the original signature.
|
|
182
|
+
|
|
183
|
+
### `validator_link_policy_gap`
|
|
184
|
+
|
|
185
|
+
- **What**: Spec's CHANGE TO content adds a relative markdown link from an
|
|
186
|
+
installable artifact (e.g. `agents/subagents/foo.md`) to a target that
|
|
187
|
+
escapes its installable package (e.g. `../../skills/workflow/bar/refs/x.md`).
|
|
188
|
+
`scripts/validate.py` warns: `installable file escapes package — link goes
|
|
189
|
+
N levels up (max 0 for this file class). Reference the artifact by name
|
|
190
|
+
instead`. Subagents and CLAUDE.md templates are flat-installed to
|
|
191
|
+
`~/.claude/agents/` and can't follow `../`-style paths there.
|
|
192
|
+
- **Surfaced as**: `python3 scripts/validate.py` exits 0 (errors clean) but
|
|
193
|
+
emits one warning per offending link. The spec's Phase 5 / Phase N VERIFY
|
|
194
|
+
treats `≥ 1 warning` as a failure depending on the spec's strictness rule.
|
|
195
|
+
- **Fix template**: Replace each cross-package relative markdown link with a
|
|
196
|
+
bare-name reference using the convention from
|
|
197
|
+
`feedback_artifact_references.md` in user memory: subagent → `name`, skill →
|
|
198
|
+
`/name`, doc reference → "X.md in <skill>'s references" or similar prose.
|
|
199
|
+
The LLM agent has the referenced artifact loaded; no path lookup needed.
|
|
200
|
+
Same-package relative links (within one skill tree, e.g.
|
|
201
|
+
`../mastermind-task-planning/references/…` from `mastermind-task-executor/SKILL.md`)
|
|
202
|
+
stay inside the installable package and pass the validator — only links
|
|
203
|
+
CROSSING the `agents/`↔`skills/` boundary or going > 0 levels up from a
|
|
204
|
+
subagent fall foul.
|
|
205
|
+
- **First observed**: Task 003 Phase 5.1 (executor stopped, planner promoted
|
|
206
|
+
the kind into the taxonomy in the same flight).
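The kind of check this entry describes can be approximated by counting leading `..` segments in relative markdown link targets. The Python sketch below is not the logic of `scripts/validate.py`, which is not shown here; the regex, the names, and the `max_levels_up` parameter are assumptions.

```python
import re

LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")  # the (target) part of [text](target)

def levels_up(target):
    """Count the leading '..' segments of a relative link target."""
    count = 0
    for segment in target.split("/"):
        if segment != "..":
            break
        count += 1
    return count

def escaping_links(markdown, max_levels_up=0):
    """Return (target, depth) for relative links that climb above the allowed depth."""
    offenders = []
    for target in LINK_TARGET.findall(markdown):
        if "://" in target:
            continue  # absolute URL, not a relative path
        depth = levels_up(target)
        if depth > max_levels_up:
            offenders.append((target, depth))
    return offenders

text = "See [schema](../../skills/workflow/bar/refs/x.md) and [notes](refs/y.md)."
print(escaping_links(text))
```

Passing a larger `max_levels_up` models file classes that are allowed to link within their own package tree.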
|
|
207
|
+
|
|
208
|
+
### `verify_grep_window_too_small`
|
|
209
|
+
|
|
210
|
+
- **What**: Spec's VERIFY command uses `grep -A N "anchor" file | grep -c "phrase"`
|
|
211
|
+
to confirm a phrase landed inside an "anchor + first few lines" window, but
|
|
212
|
+
`N` is sized for the spec author's mental layout (e.g. "header, blank, heading,
|
|
213
|
+
one bullet" = 4 lines) while the on-disk file has more pre-existing content
|
|
214
|
+
between the anchor and the new phrase (e.g. multiple prior bullets in the
|
|
215
|
+
same group). The phrase is correctly added to the file but lives outside the
|
|
216
|
+
`-A N` window.
|
|
217
|
+
- **Surfaced as**: `grep -c` prints `0` even though the file contains the
|
|
218
|
+
phrase exactly as specified. `grep <phrase> <file>` confirms presence.
|
|
219
|
+
- **Fix template**: Drop the windowed grep and use the bare
|
|
220
|
+
`grep -c "<unique phrase>" <file>` form. Pick a phrase that's unique to the
|
|
221
|
+
new content so the count remains 1 even on whole-file scan. Only keep `-A N`
|
|
222
|
+
when the anchor → phrase distance is short AND constant across spec authors
|
|
223
|
+
(e.g. immediately-following H2 with first paragraph).
|
|
224
|
+
- **First observed**: Task 003 Phase 6.1 (planner sized `-A 4` for 2 bullets;
|
|
225
|
+
by execution time there were already 2 prior bullets pushing the new one to
|
|
226
|
+
line 6).
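The failure is easy to reproduce by emulating the two grep forms. The Python sketch below handles only the first anchor match (real `grep -A` prints a window for every match), and the file content is invented to mirror the "new bullet lands on line 6" situation.

```python
def windowed_count(lines, anchor, phrase, after):
    """Emulate `grep -A <after> anchor | grep -c phrase` for the first anchor match."""
    for index, line in enumerate(lines):
        if anchor in line:
            window = lines[index:index + after + 1]  # anchor line plus `after` lines
            return sum(phrase in candidate for candidate in window)
    return 0

def whole_file_count(lines, phrase):
    """Emulate `grep -c phrase file`: number of lines containing the phrase."""
    return sum(phrase in line for line in lines)

doc = [
    "## Group",
    "",
    "### Heading",
    "- prior bullet one",
    "- prior bullet two",
    "- new bullet with unique-phrase",  # sixth line: outside a 4-line window
]

print(windowed_count(doc, "## Group", "unique-phrase", after=4))
print(whole_file_count(doc, "unique-phrase"))
```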
|
|
227
|
+
|
|
228
|
+
### `verify_count_vs_change_to_mismatch`
|
|
229
|
+
|
|
230
|
+
- **What**: Spec's VERIFY command expects N occurrences of a phrase
|
|
231
|
+
(e.g. `grep -c "foo" file ≥ 2`), justified by a comment like "one in the
|
|
232
|
+
heading, one in the body". But the spec's prescribed CHANGE TO body
|
|
233
|
+
actually contains the phrase fewer times — the body refers to it
|
|
234
|
+
pronoun-style (`this kind:`, `it`, `the same`) instead of respelling. After
|
|
235
|
+
verbatim application, grep returns < N and VERIFY fails on a self-consistency
|
|
236
|
+
bug.
|
|
237
|
+
- **Surfaced as**: `grep -c` count is less than spec's claimed minimum;
|
|
238
|
+
CHANGE TO body was applied byte-for-byte; planner-side prose self-mismatch.
|
|
239
|
+
- **Fix template**: Prefer (a) — edit the CHANGE TO body to spell the phrase
|
|
240
|
+
explicitly where the VERIFY count assumed it would appear (e.g. swap
|
|
241
|
+
`with this \`kind:\`` for `of kind \`foo\``). That's also more useful for
|
|
242
|
+
taxonomy / doc consumers grepping for the literal term. Alternative (b) —
|
|
243
|
+
relax the VERIFY count to match the actual prescribed body. Default to (a)
|
|
244
|
+
for taxonomy / reference docs where literal grep-ability is valuable.
|
|
245
|
+
- **First observed**: Task 005 Phase 2.1 (planner wrote "this `kind:`" in
|
|
246
|
+
Fix template body but VERIFY expected 2 occurrences of the kind name).
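A planner could catch this self-mismatch before spawning the executor by counting the phrase in the prescribed CHANGE TO body. The Python sketch below is a hypothetical pre-flight check; note that `grep -c` counts matching lines rather than total occurrences, and that the real VERIFY runs against the whole file, so this check is conservative.

```python
def grep_c(text, phrase):
    """Like `grep -c`: the number of lines that contain the phrase."""
    return sum(phrase in line for line in text.splitlines())

def verify_is_satisfiable(change_to_body, phrase, expected_min):
    """True when the CHANGE TO body alone already meets the VERIFY count."""
    return grep_c(change_to_body, phrase) >= expected_min

# Hypothetical bodies: the first refers to the kind pronoun-style, the second spells it out.
pronoun_body = "### `foo`\n\n- Entries with this `kind:` are routed manually.\n"
explicit_body = "### `foo`\n\n- Entries of kind `foo` are routed manually.\n"

print(verify_is_satisfiable(pronoun_body, "foo", 2))
print(verify_is_satisfiable(explicit_body, "foo", 2))
```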
|
|
247
|
+
|
|
248
|
+
### `premature_terminal_temptation`
|
|
249
|
+
|
|
250
|
+
- **What**: Planner is drafting a "task done" / "shipped" / "сделано"
|
|
251
|
+
message to the user without first ensuring (a) `mastermind-auditor` was
|
|
252
|
+
spawned, (b) verdict tail is in conversation with `verdict: held`, and
|
|
253
|
+
(c) planner's semantic review is documented. Often paired with the
|
|
254
|
+
rationalization "the executor's report looks clean, the audit will
|
|
255
|
+
surely pass".
|
|
256
|
+
- **Surfaced as**:
|
|
257
|
+
- Planner catches themselves mid-draft (tier 1)
|
|
258
|
+
- Planner has Drift/Broken verdict and is tempted to "explain it away"
|
|
259
|
+
as a non-issue to user (tier 2)
|
|
260
|
+
- User explicitly asks the planner to skip the auditor and just declare
|
|
261
|
+
done (tier 3 — refusal + explicit override recording)
|
|
262
|
+
- **Fix template**: Apply the SKILL's `Premature-terminal escalation tiers`
|
|
263
|
+
section. Tier 1 → just spawn the auditor and wait. Tier 2 → refuse, fix
|
|
264
|
+
the discrepancies, re-audit. Tier 3 → refuse, explain `_lessons.md`
|
|
265
|
+
precedent, name the override explicitly in the conversation transcript,
|
|
266
|
+
and append an `[auto]` `_lessons.md` entry of kind `premature_terminal_temptation`.
|
|
267
|
+
- **First observed**: Workflow convention — pre-emptively named in task 005
|
|
268
|
+
ahead of real instances, so planners have a routing key on the day they
|
|
269
|
+
catch themselves. If you (future planner) ARE the first real instance,
|
|
270
|
+
update `First observed` here to your task number.
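The three preconditions in the **What** bullet can be read as a simple gate. The following is a minimal sketch under that reading; `may_declare_done` and its parameter names are hypothetical and are not part of the SKILL.

```python
from typing import Optional


def may_declare_done(
    auditor_spawned: bool,
    verdict: Optional[str],
    semantic_review_documented: bool,
) -> bool:
    """True only when (a) the auditor ran, (b) its verdict is `held`,
    and (c) the planner's semantic review is documented."""
    return auditor_spawned and verdict == "held" and semantic_review_documented


# A clean-looking executor report is not enough without an auditor verdict.
print(may_declare_done(False, None, True))
# A drift verdict blocks the terminal message even when the review is written up.
print(may_declare_done(True, "drift", True))
print(may_declare_done(True, "held", True))
```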
|
|
271
|
+
|
|
272
|
+
### `unclassified`
|
|
273
|
+
|
|
274
|
+
- **What**: A defect that doesn't match any kind above.
|
|
275
|
+
- **Surfaced as**: Subagent emits `kind: unclassified` with a verbatim `details:`
|
|
276
|
+
description.
|
|
277
|
+
- **Fix template**: Read the verbatim details, design the fix manually. After
|
|
278
|
+
the task lands, promote this defect into a named entry in this taxonomy via a
|
|
279
|
+
follow-up spec (or a direct doc PR — taxonomy edits don't need their own
|
|
280
|
+
spec). The `[auto]` `_lessons.md` entry from `mmcg audit-spec` is a good
|
|
281
|
+
starting point for the writeup.
|
|
282
|
+
|
|
283
|
+
## Status (no defect)
|
|
284
|
+
|
|
285
|
+
When NO defect applies → `kind: clean` and the workflow proceeds normally.
|
|
286
|
+
Empty `defects: []` / `discrepancies: []` arrays in the structured tail also
|
|
287
|
+
indicate the clean case.
|
|
@@ -0,0 +1,141 @@
|
|
|
1
|
+
# Structured report schema
|
|
2
|
+
|
|
3
|
+
Every `mastermind-task-executor` and `mastermind-auditor` reply emits a
|
|
4
|
+
fenced-YAML "structured tail" alongside its markdown prose. The tail is wrapped
|
|
5
|
+
in HTML-comment sentinels so the planner can extract it deterministically with
|
|
6
|
+
a single regex.
|
|
7
|
+
|
|
8
|
+
The defect `kind:` vocabulary is the closed set defined in
|
|
9
|
+
[`defect-taxonomy.md`](defect-taxonomy.md). Subagents MUST pick a listed kind
|
|
10
|
+
or use `kind: unclassified` as the escape hatch.
|
|
11
|
+
|
|
12
|
+
## Executor tail
|
|
13
|
+
|
|
14
|
+
Emitted at the very end of the executor's reply, after the prose sections
|
|
15
|
+
(Phases completed / Verification results / Files modified / Stopped because /
|
|
16
|
+
What I did NOT do). Format:
|
|
17
|
+
|
|
18
|
+
````markdown
|
|
19
|
+
<!-- mastermind:report-begin -->
|
|
20
|
+
```yaml
|
|
21
|
+
spec: .mastermind/tasks/<NNN>-<name>/spec.md
|
|
22
|
+
status: complete | partial | failed
|
|
23
|
+
phases:
|
|
24
|
+
- id: "1.1"
|
|
25
|
+
status: done # done | pending | stopped_here | skipped
|
|
26
|
+
- id: "1.2"
|
|
27
|
+
status: done
|
|
28
|
+
- id: "2.4"
|
|
29
|
+
status: stopped_here
|
|
30
|
+
files_modified:
|
|
31
|
+
- mcp/servers/mmcg/src/store.rs
|
|
32
|
+
- mcp/servers/mmcg/src/fingerprint.rs
|
|
33
|
+
defects:
|
|
34
|
+
- kind: envelope_drift
|
|
35
|
+
phase: "2.4"
|
|
36
|
+
details: |
|
|
37
|
+
Test asserted on the raw `handle_tools_call` return, but the dispatcher
|
|
38
|
+
wraps every payload in `{ "content": [{ "type": "text", "text": <json> }] }`.
|
|
39
|
+
`cosmetic["class"]` is therefore not the field the assertion expects.
|
|
40
|
+
remediation_hint: |
|
|
41
|
+
Reuse `unwrap_content` from `mcp.rs::tests` (task 001). Replace
|
|
42
|
+
`let cosmetic = read_env;` with `let cosmetic = unwrap_content(&read_env);`.
|
|
43
|
+
verifications:
|
|
44
|
+
- cmd: "cd mcp/servers/mmcg && cargo test --locked --lib"
|
|
45
|
+
result: pass
|
|
46
|
+
- cmd: "cd mcp/servers/mmcg && cargo test --locked --lib change_class"
|
|
47
|
+
result: fail
|
|
48
|
+
output_excerpt: "thread '...' panicked at ..."
|
|
49
|
+
```
|
|
50
|
+
<!-- mastermind:report-end -->
|
|
51
|
+
````
|
|
52
|
+
|
|
53
|
+
### Field meanings
|
|
54
|
+
|
|
55
|
+
- `spec`: path to the spec file the executor is implementing (the example above uses the repo-relative form `.mastermind/tasks/<NNN>-<name>/spec.md`).
|
|
56
|
+
- `status`:
|
|
57
|
+
- `complete` — every phase landed, every Final-verification command exited 0
|
|
58
|
+
- `partial` — at least one phase done, executor stopped before reaching Phase N
|
|
59
|
+
- `failed` — Phase 1 couldn't even start (FIND mismatch on the first
|
|
60
|
+
sub-step, environment broken, etc.)
|
|
61
|
+
- `phases[].status`:
|
|
62
|
+
- `done` — phase's CHANGE TO content is in the file AND its VERIFY exited 0
|
|
63
|
+
- `pending` — not yet attempted in this execution
|
|
64
|
+
- `stopped_here` — the executor halted at this phase; populate the matching
|
|
65
|
+
`defects[]` entry with details
|
|
66
|
+
- `skipped` — planner explicitly dropped this phase mid-flight (e.g. Phase
|
|
67
|
+
1.5 in task 002); list it for traceability
|
|
68
|
+
- `files_modified`: every path the executor's edits touched, relative to repo
|
|
69
|
+
root. Must match `git diff --name-only HEAD` + untracked-new-files; this is
|
|
70
|
+
the auditor's scope-creep anchor.
|
|
71
|
+
- `defects[]`: zero or more defects. Empty array = clean run. Each entry MUST
|
|
72
|
+
populate `kind` from the closed set in `defect-taxonomy.md` (or
|
|
73
|
+
`unclassified`), `phase` of the failure, verbatim `details`, and a
|
|
74
|
+
`remediation_hint` the planner can apply.
|
|
75
|
+
- `verifications[]`: every VERIFY command run, in execution order. Truncate
|
|
76
|
+
`output_excerpt` to ~5 lines of the relevant error/diff.
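The `files_modified` rule above can be checked mechanically. The sketch below is one possible approach, not the auditor's actual implementation: `git_changed_paths` and `scope_mismatch` are hypothetical names, and it assumes `git ls-files --others --exclude-standard` is an acceptable way to list untracked new files and that both commands run from the repo root.

```python
from __future__ import annotations

import subprocess


def git_changed_paths(repo_root: str) -> set[str]:
    """Paths changed relative to HEAD plus untracked, non-ignored files."""

    def run(*args: str) -> list[str]:
        out = subprocess.run(
            ["git", *args], cwd=repo_root, check=True,
            capture_output=True, text=True,
        ).stdout
        return [line for line in out.splitlines() if line]

    tracked = run("diff", "--name-only", "HEAD")
    untracked = run("ls-files", "--others", "--exclude-standard")
    return set(tracked) | set(untracked)


def scope_mismatch(reported: list[str], actual: set[str]) -> dict[str, list[str]]:
    """Compare the tail's `files_modified` against what git actually sees."""
    reported_set = set(reported)
    return {
        "unreported": sorted(actual - reported_set),   # touched but not listed
        "not_in_diff": sorted(reported_set - actual),  # listed but untouched
    }
```

Both lists being empty corresponds to the reported and actual file sets agreeing; anything in `unreported` is a candidate for scope creep.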
|
|
77
|
+
|
|
78
|
+
## Auditor tail
|
|
79
|
+
|
|
80
|
+
Emitted at the very end of the auditor's reply. Format:
|
|
81
|
+
|
|
82
|
+
````markdown
|
|
83
|
+
<!-- mastermind:audit-begin -->
|
|
84
|
+
```yaml
|
|
85
|
+
spec: .mastermind/tasks/<NNN>-<name>/spec.md
|
|
86
|
+
verdict: held | drift | broken
|
|
87
|
+
files_in_scope: 7
|
|
88
|
+
files_in_diff: 7
|
|
89
|
+
scope_match: true
|
|
90
|
+
discrepancies:
|
|
91
|
+
- kind: snapshot_caller_drift
|
|
92
|
+
symbol: SessionStore
|
|
93
|
+
spec_says: 45
|
|
94
|
+
index_says: 38
|
|
95
|
+
evidence: "git diff shows 7 callsites removed in src/api/*"
|
|
96
|
+
snapshot_drift:
|
|
97
|
+
- symbol: commit_file
|
|
98
|
+
pre_callers: 2
|
|
99
|
+
post_callers: 2
|
|
100
|
+
pre_signature: "pub fn commit_file(&mut self, pending: PendingFile) -> SqlResult<()>"
|
|
101
|
+
post_signature: "pub fn commit_file(&mut self, pending: PendingFile) -> SqlResult<()>"
|
|
102
|
+
delta: none
|
|
103
|
+
verifications_rerun:
|
|
104
|
+
- cmd: "cd mcp/servers/mmcg && cargo test --locked --lib"
|
|
105
|
+
result: pass
|
|
106
|
+
```
|
|
107
|
+
<!-- mastermind:audit-end -->
|
|
108
|
+
````
|
|
109
|
+
|
|
110
|
+
### Field meanings
|
|
111
|
+
|
|
112
|
+
- `verdict`:
|
|
113
|
+
- `held` — every claim in the executor report survived independent
|
|
114
|
+
verification; zero discrepancies
|
|
115
|
+
- `drift` — partial drift; at least one discrepancy, none critical (warnings,
|
|
116
|
+
minor scope creep, snapshot deltas with explanation)
|
|
117
|
+
- `broken` — at least one critical discrepancy (scope creep without
|
|
118
|
+
explanation, verify failed on re-run, signature drift that contradicts the
|
|
119
|
+
spec's stated invariants)
|
|
120
|
+
- `discrepancies[]`: every finding that contributed to a non-`held` verdict.
|
|
121
|
+
Each MUST use a `kind:` from the auditor section of `defect-taxonomy.md`.
|
|
122
|
+
- `snapshot_drift[]`: one entry per symbol in the spec's Pre-edit symbol
|
|
123
|
+
snapshot, with pre/post caller counts and signatures and a `delta:` summary
|
|
124
|
+
(`none` | `gained` | `lost` | `signature_changed`).
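One way to derive the `delta:` value from the other `snapshot_drift[]` fields is sketched below. It assumes `gained` and `lost` refer to caller counts and that a signature change takes precedence over a caller-count change; the schema states neither explicitly, and `classify_delta` is a hypothetical name.

```python
def classify_delta(
    pre_callers: int,
    post_callers: int,
    pre_signature: str,
    post_signature: str,
) -> str:
    """Map pre/post snapshot values to none | gained | lost | signature_changed."""
    if pre_signature != post_signature:
        return "signature_changed"  # assumed to outrank caller-count changes
    if post_callers > pre_callers:
        return "gained"
    if post_callers < pre_callers:
        return "lost"
    return "none"


sig = "pub fn commit_file(&mut self, pending: PendingFile) -> SqlResult<()>"
# Values from the `commit_file` entry in the example tail above.
print(classify_delta(2, 2, sig, sig))
```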
|
|
125
|
+
|
|
126
|
+
## Planner consumption
|
|
127
|
+
|
|
128
|
+
The planner (running `mastermind-task-planning` SKILL) extracts the tail with a
|
|
129
|
+
simple regex on the chat reply:
|
|
130
|
+
|
|
131
|
+
```text
|
|
132
|
+
(?s)<!-- mastermind:report-begin -->\n```yaml\n(?P<body>.*?)\n```\n<!-- mastermind:report-end -->
|
|
133
|
+
```
|
|
134
|
+
|
|
135
|
+
Then parses `body` as YAML. For each `defects[]` entry, the planner reads the
|
|
136
|
+
`kind:`, looks up the matching entry in `defect-taxonomy.md`, applies the named
|
|
137
|
+
fix template, and re-spawns the executor with the patched spec. This replaces
|
|
138
|
+
the manual prose-reading the planner did in tasks 001 and 002.
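The extraction step can be sketched in Python as follows. This is an illustration rather than the planner's actual code: `tail_pattern` and `extract_tail` are hypothetical names, and the `tag` parameter for reusing the pattern on the auditor's `audit-begin` / `audit-end` sentinels is an assumption. The body spans several lines, so `.` has to match newlines (`re.DOTALL`).

```python
import re
from typing import Optional

FENCE = "`" * 3  # built indirectly so the literal fence does not end this example


def tail_pattern(tag: str) -> "re.Pattern[str]":
    """Compile the sentinel regex for `report` or `audit` tails."""
    return re.compile(
        rf"<!-- mastermind:{tag}-begin -->\n{FENCE}yaml\n"
        rf"(?P<body>.*?)\n{FENCE}\n<!-- mastermind:{tag}-end -->",
        re.DOTALL,  # let `.` cross line breaks inside the YAML body
    )


def extract_tail(reply: str, tag: str = "report") -> Optional[str]:
    """Return the raw YAML body of the structured tail, or None if absent."""
    match = tail_pattern(tag).search(reply)
    return match.group("body") if match else None


reply = "\n".join([
    "## Phases completed",
    "All phases landed.",
    "<!-- mastermind:report-begin -->",
    FENCE + "yaml",
    "status: complete",
    "defects: []",
    FENCE,
    "<!-- mastermind:report-end -->",
])
body = extract_tail(reply)
print(body)
```

The returned string would then go to a YAML parser (for example `yaml.safe_load` from PyYAML) before the planner inspects `status` and `defects[]`.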
|
|
139
|
+
|
|
140
|
+
When `defects: []` and `status: complete`, the planner proceeds to spawn the
|
|
141
|
+
auditor.
|