@xcraftmind/mastermind 0.27.0 → 0.28.0

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@xcraftmind/mastermind",
- "version": "0.27.0",
+ "version": "0.28.0",
  "description": "Mastermind workflow CLI + mmcg codegraph for AI coding agents — verify-spec / audit-spec gates, MCP server, multi-language tree-sitter indexer (Python, TypeScript, JavaScript, Rust, C#, Go, Java, PHP, C/C++). Prebuilt native binaries via optional platform packages — no Rust toolchain required.",
  "license": "MIT",
  "author": "xcraftmind",
@@ -38,12 +38,12 @@
  "mastermind"
  ],
  "optionalDependencies": {
- "@xcraftmind/mmcg-darwin-arm64": "0.27.0",
- "@xcraftmind/mmcg-darwin-x64": "0.27.0",
- "@xcraftmind/mmcg-linux-x64-gnu": "0.27.0",
- "@xcraftmind/mmcg-linux-arm64-gnu": "0.27.0",
- "@xcraftmind/mmcg-linux-x64-musl": "0.27.0",
- "@xcraftmind/mmcg-linux-arm64-musl": "0.27.0",
- "@xcraftmind/mmcg-win32-x64-msvc": "0.27.0"
+ "@xcraftmind/mmcg-darwin-arm64": "0.28.0",
+ "@xcraftmind/mmcg-darwin-x64": "0.28.0",
+ "@xcraftmind/mmcg-linux-x64-gnu": "0.28.0",
+ "@xcraftmind/mmcg-linux-arm64-gnu": "0.28.0",
+ "@xcraftmind/mmcg-linux-x64-musl": "0.28.0",
+ "@xcraftmind/mmcg-linux-arm64-musl": "0.28.0",
+ "@xcraftmind/mmcg-win32-x64-msvc": "0.28.0"
  }
  }
@@ -157,6 +157,42 @@ A markdown audit report:
157
157
 
158
158
  If verdict is anything other than `contract held`, the planner must address each `❌` / `⚠️` / critical-deferred item before telling the user "done".
159
159
 
160
+ ### Structured audit tail (REQUIRED)
161
+
162
+ After the prose verdict, emit a fenced-YAML structured audit tail wrapped in
163
+ `<!-- mastermind:audit-begin -->` / `<!-- mastermind:audit-end -->` sentinels.
164
+ The planner reads this for mechanical routing — discrepancies must use the
165
+ `kind:` vocabulary from the `defect-taxonomy.md` reference in the
166
+ `mastermind-task-planning` skill (auditor-discrepancy section). The full schema
167
+ lives in that same skill's references as `structured-report-schema.md`. The
168
+ agent has both loaded — no path lookup needed.
169
+
170
+ Minimal template:
171
+
172
+ ````markdown
173
+ <!-- mastermind:audit-begin -->
174
+ ```yaml
175
+ spec: <absolute path to spec.md>
176
+ verdict: held | drift | broken
177
+ files_in_scope: <N>
178
+ files_in_diff: <M>
179
+ scope_match: <bool>
180
+ discrepancies: []
181
+ snapshot_drift:
182
+ - symbol: <name>
183
+ pre_callers: <N>
184
+ post_callers: <M>
185
+ delta: none | gained | lost | signature_changed
186
+ verifications_rerun:
187
+ - cmd: "<command>"
188
+ result: pass
189
+ ```
190
+ <!-- mastermind:audit-end -->
191
+ ````
192
+
193
+ Even on `verdict: held` the tail is REQUIRED — with `discrepancies: []` and
194
+ `scope_match: true`. The planner relies on the sentinel block existing.
195
+
160
196
  ## Capture the lesson (institutional memory)
161
197
 
162
198
  When the verdict is `⚠️ partial drift` or `❌ contract broken`, append a **one-line lesson** to `.mastermind/tasks/_lessons.md` (shared file at the top of `tasks/`, not inside any task folder) so the next planner can learn from this audit. Skip on clean `✅ contract held` verdicts — that's just normal operation, not a lesson.
@@ -83,6 +83,47 @@ A markdown execution report:
83
83
  <Anything you noticed but didn't fix because it was out of scope. Hand back to planner.>
84
84
  ```
85
85
 
86
+ ### Structured tail (REQUIRED)
87
+
88
+ After the prose report, emit a fenced-YAML structured tail wrapped in
89
+ `<!-- mastermind:report-begin -->` / `<!-- mastermind:report-end -->` sentinels.
90
+ The planner extracts and parses this block mechanically — the prose above is for
91
+ humans, the tail is for routing.
92
+
93
+ The full schema with field meanings lives in the `mastermind-task-planning`
94
+ skill's references as `structured-report-schema.md`. The closed set of
95
+ `defect.kind` values lives in the same skill's references as
96
+ `defect-taxonomy.md`. The agent has both loaded — no path lookup needed.
97
+
98
+ Minimal template (populate every field, even on a clean run):
99
+
100
+ ````markdown
101
+ <!-- mastermind:report-begin -->
102
+ ```yaml
103
+ spec: <absolute path to spec.md>
104
+ status: complete | partial | failed
105
+ phases:
106
+ - id: "1.1"
107
+ status: done
108
+ - id: "1.2"
109
+ status: done
110
+ files_modified:
111
+ - <relative path>
112
+ defects: []
113
+ verifications:
114
+ - cmd: "<command>"
115
+ result: pass
116
+ ```
117
+ <!-- mastermind:report-end -->
118
+ ````
119
+
120
+ When you stop on a defect:
121
+ - Populate `defects[]` with one entry whose `kind:` is from the taxonomy (or
122
+ `unclassified` if nothing matches — be honest about it).
123
+ - Set the corresponding phase's `status` to `stopped_here`.
124
+ - Set the top-level `status:` to `partial` (some phases done) or `failed`
125
+ (Phase 1 didn't land).
126
+
86
127
  ## Companion skill
87
128
 
88
129
  This subagent is the runtime companion to [[mastermind-task-planning]] (the planner) and uses [[mastermind-task-executor]] (the skill body). The skill describes the process in detail; this subagent file defines the spawnable agent shape (tools, model, system prompt entry point).
@@ -120,6 +120,20 @@ Output a report in this exact shape:
120
120
  <Anything you noticed but didn't fix because it was out of scope. Hand back to the planner — they decide whether to add a follow-up task.>
121
121
  ```
122
122
 
123
+ ### Structured tail (REQUIRED)
124
+
125
+ After the prose sections above, emit a fenced-YAML structured tail wrapped in
126
+ `<!-- mastermind:report-begin -->` / `<!-- mastermind:report-end -->` sentinels.
127
+ The planner reads this block to mechanically route on defect `kind:` — see
128
+ [`structured-report-schema.md`](../mastermind-task-planning/references/structured-report-schema.md)
129
+ for the full schema and
130
+ [`defect-taxonomy.md`](../mastermind-task-planning/references/defect-taxonomy.md)
131
+ for the closed `kind:` set.
132
+
133
+ Even on a clean run (all phases done, every VERIFY passed) the tail is REQUIRED
134
+ — with `status: complete` and `defects: []`. Planners rely on the sentinel block
135
+ existing; absence is treated as a malformed reply.
136
+
123
137
  ## Failure modes — and how to handle them
124
138
 
125
139
  | Situation | What to do |
@@ -349,6 +349,99 @@ Each task lives in its own folder under `.mastermind/tasks/`:
349
349
 
350
350
  If `.mastermind/tasks/` doesn't exist, create it and optionally create `CONTEXT.md` with project info. `mmcg init` does this for you.
351
351
 
352
+ ## Defect-aware retry (mechanical routing on subagent reports)
353
+
354
+ The executor and auditor subagents emit a fenced-YAML "structured tail" at the
355
+ end of every report — full schema at
356
+ [`references/structured-report-schema.md`](references/structured-report-schema.md),
357
+ defect-kind vocabulary at
358
+ [`references/defect-taxonomy.md`](references/defect-taxonomy.md).
359
+
360
+ When you (planner) receive a subagent report that isn't `status: complete` /
361
+ `verdict: held`, your routing flow is:
362
+
363
+ 1. Locate the sentinel block (`<!-- mastermind:report-begin -->` or
364
+ `<!-- mastermind:audit-begin -->`) at the end of the reply.
365
+ 2. Parse the YAML block. For each `defects[]` / `discrepancies[]` entry, read
366
+ `kind:`.
367
+ 3. Look up that `kind:` in the taxonomy doc. Apply the named fix template to
368
+ the spec (patch the offending phase, add a missing `expected_docs[]` entry,
369
+ add an authorization Rule, etc.).
370
+ 4. Re-spawn the executor with the patched spec, with a focused continuation
371
+ prompt that names which phases are already done and which need re-execution.
372
+
373
+ When the structured tail's `kind:` is `unclassified`, you're in unknown
374
+ territory:
375
+ - Read the verbatim `details:` field
376
+ - Design the fix manually
377
+ - After the task lands, promote the new defect into a named entry in
378
+ `defect-taxonomy.md` (no separate spec needed for taxonomy edits — direct
379
+ doc commit is fine)
380
+
381
+ The whole point of this routing is to **avoid re-reading prose reports**.
382
+ Before this convention landed (tasks 001 + 002), the planner read 4 executor
383
+ reports across two tasks and manually classified 6 defects. With the taxonomy +
384
+ structured tail, the planner can route the same defects in a single YAML lookup.
385
+
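A minimal sketch of steps 2 and 3 of the routing flow above, assuming the YAML body has already been pulled out of the sentinel block. It is not the planner's actual implementation: `FIX_TEMPLATES`, `KIND_RE`, and `route` are hypothetical names, the fix strings are abbreviated paraphrases of `defect-taxonomy.md` entries, and a line-anchored regex stands in for a real YAML parser to keep the example dependency-free.

```python
import re

# Abbreviated, paraphrased fix templates (hypothetical table; the real
# source of truth is defect-taxonomy.md).
FIX_TEMPLATES = {
    "zero_filter_verify": "drop the trailing '::' from the cargo test filter",
    "fmt_tension": "add a Rule authorizing one cargo fmt pass on touched files",
    "doc_surface_gap": "add the missing surfaces to expected_docs[] and Phase 3.x",
}

# Matches lines such as "  - kind: fmt_tension" or "    kind: unclassified".
KIND_RE = re.compile(r"^\s*-?\s*kind:\s*([A-Za-z_]+)\s*$", re.MULTILINE)

MANUAL = "manual: read details: and design the fix"


def route(tail_yaml: str):
    """Map every kind: in a structured tail to a fix template or to manual handling."""
    return [(kind, FIX_TEMPLATES.get(kind, MANUAL)) for kind in KIND_RE.findall(tail_yaml)]


tail = (
    "status: partial\n"
    "defects:\n"
    "  - kind: fmt_tension\n"
    '    phase: "2.4"\n'
    "  - kind: unclassified\n"
    "    details: something new\n"
)
for kind, action in route(tail):
    print(kind, "->", action)
```

Kinds missing from the table fall through to the manual path, which mirrors the `unclassified` handling described above.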
386
+ ## Iteration budget (escalate, don't loop forever)
387
+
388
+ If you've re-spawned the executor 3 times on the same spec and it keeps
389
+ returning `status: partial` / `failed`, STOP. Don't issue a 4th respawn.
390
+ Instead:
391
+ - Surface the situation to the user with the cumulative defect list (all 3
392
+ rounds' `defects[]` entries flattened)
393
+ - Suggest spec redesign rather than another patch
394
+ - Append a one-line `[auto]` entry to `_lessons.md` of kind
395
+ `iteration_budget_exhausted` so future planners see this signal
396
+
397
+ Three rounds is the empirically-calibrated bound from forge's
398
+ `ErrorTracker.max_retries=3` default and from our task-002 experience (4 rounds
399
+ to land, would have been 2 with a tighter spec). Don't loosen the bound without
400
+ recording why in the spec's Notes section.
401
+
402
+ Since spec 004, the bound is also enforced at the CLI in `mmcg run-task`:
403
+ `--max-iterations N` (default 3), `--force-iteration` to bypass. The CLI gate
404
+ catches the case where state-resets accumulate without you noticing; the
405
+ self-check above is the in-conversation early-warning.
406
+
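The budget check above could be sketched as follows. This is an illustration only: `next_action` and the shape of each round (a dict with `status` and `defects` keys) are assumptions, and a "round" here means one re-spawn of the executor on the same spec.

```python
MAX_ITERATIONS = 3  # the bound described in the text


def next_action(rounds):
    """Decide what to do after the latest executor round.

    rounds: list of dicts like {"status": "partial", "defects": [...]}
    (hypothetical shape), oldest first.
    """
    if rounds and rounds[-1]["status"] == "complete":
        return "proceed_to_audit", []
    if len(rounds) >= MAX_ITERATIONS:
        # Flatten every round's defects into one cumulative list for the user.
        cumulative = [d for r in rounds for d in r["defects"]]
        return "escalate_to_user", cumulative
    return "patch_and_respawn", []


rounds = [
    {"status": "partial", "defects": [{"kind": "fmt_tension"}]},
    {"status": "partial", "defects": [{"kind": "envelope_drift"}]},
    {"status": "failed", "defects": [{"kind": "unclassified"}]},
]
action, defects = next_action(rounds)
print(action, [d["kind"] for d in defects])
```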
407
+ ## Premature-terminal escalation tiers (self-check before declaring "done")
408
+
409
+ Before you tell the user "task complete" (or any equivalent — "all done",
410
+ "shipped", "ready to merge", "сделано", …), you MUST satisfy three conditions
411
+ in order:
412
+
413
+ 1. **Auditor was spawned and returned a verdict.** The structured audit tail
414
+ (`<!-- mastermind:audit-begin -->` … `<!-- mastermind:audit-end -->`,
415
+ schema in `structured-report-schema.md`) is visible in the conversation
416
+ transcript above your draft message, with a parseable YAML `verdict:`
417
+ field. No tail = no audit happened = you skipped a mandatory step.
418
+ 2. **Verdict is `held`.** `drift` / `broken` / anything else means there's
419
+ unfinished work; you don't get to declare done. See "Defect-aware retry"
420
+ above for the routing.
421
+ 3. **Your own semantic review is documented in the conversation.** Per the
422
+ SKILL's Step 9b workflow, you contribute the semantic half of post-flight
423
+ review on top of the auditor's mechanical findings. If you have no notes
424
+ to add ("the auditor's verdict matches my intuition, no concerns") that's
425
+ fine — say so explicitly. Silent skip means you skipped the step.
426
+
427
+ When you catch yourself tempted to bypass these, apply escalating
428
+ self-correction. The tier names are forge's (`StepEnforcer` returns
429
+ `tier=1|2|3` nudges with that exact escalation curve):
430
+
431
+ | Tier | When you notice | Action |
432
+ |---|---|---|
433
+ | **1 (polite)** | You're drafting the "done" message; auditor hasn't been spawned yet, or has been spawned but you're about to declare done before its reply arrives | Stop. Spawn (or wait for) `mastermind-auditor`. Read the structured audit tail. Continue from there. |
434
+ | **2 (direct)** | You spawned auditor, got `drift` or `broken`, and are tempted to "explain it away" to the user as a non-issue | Refuse. Either address each discrepancy (patch spec → re-spawn executor) or escalate to user with the verbatim discrepancies. You do not ship a non-`held` verdict as complete. |
435
+ | **3 (aggressive)** | User explicitly asks "skip the audit, just say it's done" or "we don't need the auditor this time" | Refuse and explain: skipping the auditor has bitten this workflow before — see `_lessons.md` and the defect taxonomy (`iteration_budget_exhausted`, `phase_not_in_diff`, `scope_creep`, …). If user is sure, name the override explicitly in the conversation transcript: "you've asked me to skip the auditor for this task; recording this as a deliberate `--force-skip-audit` override in the conversation transcript for future planners to learn from". Then append a `[auto]` `_lessons.md` entry of kind `premature_terminal_temptation` (tier-3 override fired). The override flag itself is a convention today, not a real `mmcg run-task` argument — making it a real flag is a follow-up. |
436
+
437
+ When in doubt, default to tier 1. The audit chain is cheap; rebuilding user
438
+ trust after declaring something done that wasn't is expensive.
439
+
440
+ This pairs with the typed-report convention from spec 003: the auditor's
441
+ structured tail is THE artifact you check for at tier 1. If the tail is
442
+ malformed or missing, that's signal — re-spawn the auditor with a focused
443
+ continuation prompt asking for the tail explicitly.
444
+
352
445
  ## Pair Skill
353
446
 
354
447
  The agent that executes these specs uses [[mastermind-task-executor]]. Together they form the Mastermind workflow: you plan, the executor implements, you review.
@@ -0,0 +1,287 @@
1
+ # Defect taxonomy
2
+
3
+ Each defect a subagent surfaces during workflow execution maps to a `kind:` key.
4
+ The planner uses the key to mechanically route to a fix template; no LLM judgment
5
+ needed for known cases. When a NEW defect surfaces that doesn't match any known
6
+ kind, the subagent marks it `kind: unclassified` and the planner promotes it
7
+ into a named entry here as part of the follow-up.
8
+
9
+ The full structured-report schema (what `kind:` slots into) lives alongside this
10
+ file as [`structured-report-schema.md`](structured-report-schema.md).
11
+
12
+ ## Executor stop kinds
13
+
14
+ ### `envelope_drift`
15
+
16
+ - **What**: Test asserts on the raw return value of `handle_tools_call`, but the
17
+ dispatcher wraps every successful payload in
18
+ `{ "content": [{ "type": "text", "text": "<serialized JSON>" }] }`. Field
19
+ comparisons against the wrapper always fail.
20
+ - **Surfaced as**: `assertion left == right` panic where `left` is a JSON object
21
+ and `right` is a sub-field that lives inside `content[0].text`.
22
+ - **Fix template**: Reuse the `unwrap_content` helper that lives in
23
+ `mcp/servers/mmcg/src/mcp.rs::tests` from task 001. Wrap every `handle_tools_call`
24
+ return with `unwrap_content(&v)` before asserting on fields. Do NOT redefine
25
+ the helper.
26
+ - **First observed**: Task 001 Phase 2.3.
27
+
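To make the envelope shape concrete, here is a small Python analog of the unwrapping step. The real `unwrap_content` helper lives in the Rust test module and its exact signature is not shown in this diff, so the function below is an assumed stand-in that follows the wrapper layout quoted above.

```python
import json


def unwrap_content(envelope: dict) -> dict:
    """Parse the JSON payload serialized inside content[0].text."""
    return json.loads(envelope["content"][0]["text"])


payload = {"class": "cosmetic"}
envelope = {"content": [{"type": "text", "text": json.dumps(payload)}]}

# Asserting on the raw envelope looks at the wrapper, not the payload.
assert envelope.get("class") is None
# Unwrapping first exposes the field the test actually cares about.
assert unwrap_content(envelope)["class"] == "cosmetic"
```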
28
+ ### `doc_surface_gap`
29
+
30
+ - **What**: Spec's Phase 3 (docs) covers fewer files than
31
+ `scripts/validate.py::validate_mmcg_tool_drift` enforces. Validator finds tool
32
+ names in `mcp.rs` but missing from one or more of: mmcg README, repo README,
33
+ `.claude-plugin/marketplace.json`, `plugins/mmcg/.claude-plugin/plugin.json`.
34
+ - **Surfaced as**: `python3 scripts/validate.py` exits non-zero with
35
+ `tool 'mmcg_X' missing — declared in mcp/servers/mmcg/src/mcp.rs but absent
36
+ from this file` (one error per missing surface).
37
+ - **Fix template**: Add the three missing surfaces to `expected_docs[]` in the
38
+ spec frontmatter, then add three Phase 3.x sub-steps with FIND/CHANGE TO blocks.
39
+ Pattern: `marketplace.json` and `plugin.json` each carry ONE prose `description`
40
+ string with the comma-separated tool list and `N tools` count; the repo
41
+ `README.md` carries TWO occurrences (table cell + standalone-crate paragraph).
42
+ Insert the new tool name before the trailing `status` entry in each list and
43
+ bump the count by 1.
44
+ - **First observed**: Task 001 Phase 4.
45
+
46
+ ### `zero_filter_verify`
47
+
48
+ - **What**: VERIFY command uses `cargo test --lib <module>::` (trailing `::`)
49
+ which cargo treats as a literal path that no test matches. Command exits 0
50
+ with zero tests run — false-positive "pass".
51
+ - **Surfaced as**: `cargo test ... <module>::` output reads `0 passed; 0 failed;
52
+ N filtered out` even though the module HAS tests.
53
+ - **Fix template**: Drop the trailing `::`. Use the bare module name as the
54
+ substring filter: `cargo test --lib <module>`. Cargo matches any test whose
55
+ path contains the substring.
56
+ - **First observed**: Task 001 Phase 1.3.
57
+
58
+ ### `verify_grep_false_positive`
59
+
60
+ - **What**: A VERIFY command pipes test output into `grep -q "test result: ok"`
61
+ (or similar benign-line match) to gate pass/fail. But that line can co-exist
62
+ with a failure: a module with both unit tests AND doctests prints one
63
+ `test result: ok` per harness, so a `grep -q "ok"` can match the passing
64
+ doctest summary while a unit-test `FAILED` line sits elsewhere in the output.
65
+ The gate reports pass even though a test failed. Most dangerous in stress
66
+ loops (`for i in seq …; grep -q ok`) where one masked failure per iteration
67
+ is invisible.
68
+ - **Surfaced as**: An auditor independently re-running the loop with an explicit
69
+ `grep "FAILED"` / non-zero-exit check finds failures the spec's own loop
70
+ missed; or a flaky test "passes" the loop but fails in CI.
71
+ - **Fix template**: Gate on the ABSENCE of failure, not the presence of a
72
+ benign line. Prefer `cargo test … && echo ok || fails=$((fails+1))` (uses
73
+ cargo's own exit code — non-zero on any failure), or grep for the failure
74
+ marker: `cargo test … 2>&1 | grep -q "FAILED" && fails=$((fails+1))`. Never
75
+ gate a loop solely on a positive `grep -q "ok"`.
76
+ - **First observed**: Task 006 Phase 3 (planner's stress loop used
77
+ `grep -q "test result: ok"`; auditor caught the maskability and re-verified
78
+ with an explicit FAILED-marker hunt — fix held, but the VERIFY pattern was
79
+ latently unsound).
80
+
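The difference between the two gating styles can be sketched in Python. The sample output below is illustrative, not verbatim cargo output, and `unsound_gate` / `sound_gate` are hypothetical names.

```python
def unsound_gate(output: str) -> bool:
    """Pass if a benign summary line appears anywhere (the pattern to avoid)."""
    return "test result: ok" in output


def sound_gate(output: str, exit_code: int) -> bool:
    """Pass only if the command exited 0 and no failure marker is present."""
    return exit_code == 0 and "FAILED" not in output


# Two harness summaries in one run: one failed, one passed.
mixed = (
    "test result: FAILED. 3 passed; 1 failed; 0 ignored\n"
    "test result: ok. 2 passed; 0 failed; 0 ignored\n"
)

print(unsound_gate(mixed))   # the benign line masks the failure
print(sound_gate(mixed, 1))  # the failure is caught
```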
81
+ ### `stale_pre_edit_snapshot`
82
+
83
+ - **What**: Spec's Pre-edit symbol snapshot or a Phase's FIND block claims a
84
+ function has visibility / signature X, but the on-disk function already has
85
+ visibility / signature Y. The FIND text doesn't appear in the file.
86
+ - **Surfaced as**: Executor returns `find_block_mismatch: <file> doesn't contain
87
+ the FIND text` for a phase that's nominally just a visibility change or
88
+ signature tweak.
89
+ - **Fix template**: Either (a) drop the phase entirely if the change is already
90
+ in place (the more common case — re-check whether the goal is satisfied by
91
+ the current state), or (b) update the FIND/CHANGE TO blocks to match the
92
+ actual current state. Re-capture the snapshot via
93
+ `./mcp/servers/mmcg/target/debug/mmcg query symbols-in-file <path>` before
94
+ rewriting.
95
+ - **First observed**: Task 002 Phase 1.5.
96
+
97
+ ### `seed_extractor_mismatch`
98
+
99
+ - **What**: Integration test hand-crafts an intermediate type (e.g. `PendingFile`
100
+ with placeholder `kind: "fn"`, hand-written `signature: "fn foo()"`) to seed
101
+ storage. The consumer-under-test re-derives the same type from real input via
102
+ a parser (e.g. tree-sitter via `extractor_for_path` + `parse_one`), which
103
+ produces a structurally-equivalent but byte-different shape
104
+ (`kind: "function"`, fully-qualified signature). Hash/compare assertions fail
105
+ even on semantically-identical input.
106
+ - **Surfaced as**: A round-trip test that should be a no-op returns a
107
+ "structural change" / "different" verdict; classifier or comparator is
108
+ correct, the seeding path is the bug.
109
+ - **Fix template**: Seed via the same pipeline the consumer uses. For mmcg
110
+ fingerprint / structural tests, call `crate::indexer::extractor_for_path`
111
+ followed by `crate::indexer::parse_one` on a real on-disk fixture, then pass
112
+ the resulting `PendingFile` to `commit_file`. Never construct intermediate
113
+ parser-output types by hand.
114
+ - **First observed**: Task 002 Phase 2.4.
115
+
116
+ ### `fmt_tension`
117
+
118
+ - **What**: Spec's verbatim Rust code blocks are line-wrapped for documentation
119
+ readability (e.g. multi-line `Vec::with_capacity(…)` calls, broken-out
120
+ `std::fs::write(…)` arg lists). Rustfmt collapses these. `cargo fmt --check`
121
+ fails even though `cargo test` passes — the diffs are cosmetic only.
122
+ - **Surfaced as**: `cargo fmt --check` exits non-zero with format-only diffs in
123
+ files the executor just wrote from spec FIND/CHANGE TO blocks; no semantic
124
+ divergence.
125
+ - **Fix template**: Default to (b) — add an explicit Rule to the spec
126
+ authorizing one `cargo fmt` normalization pass on touched files, with a note
127
+ that fmt may only collapse/expand whitespace and must not change logic. Use
128
+ (a) — re-author the spec blocks in rustfmt style preemptively — only for
129
+ surgical edits to a single function. Future planners SHOULD include the fmt
130
+ authorization Rule from the start on any spec that emits >50 LOC of Rust.
131
+ - **First observed**: Task 002 Phase 2.4.
132
+
133
+ ## Auditor discrepancy kinds
134
+
135
+ ### `scope_creep`
136
+
137
+ - **What**: `git diff --name-only HEAD` shows files NOT in the spec's `touches[]`
138
+ + `expected_docs[]` union.
139
+ - **Surfaced as**: Auditor's diff-vs-spec check enumerates files outside scope.
140
+ - **Fix template**: Either revert the out-of-scope edits or extend the spec's
141
+ scope (with rationale) and re-spawn the audit. Zero tolerance unless the
142
+ planner explicitly excepted (e.g. authorized `cargo fmt` normalize affects
143
+ format-only).
144
+ - **First observed**: (none — all 001/002 audits clean. Listed for completeness.)
145
+
146
+ ### `phase_not_in_diff`
147
+
148
+ - **What**: Executor marked Phase X as `[x]` complete but the phase's CHANGE TO
149
+ content isn't present in the file.
150
+ - **Surfaced as**: Auditor greps for canonical anchor strings from the CHANGE TO
151
+ block and finds nothing.
152
+ - **Fix template**: Investigate whether the executor lied or a later phase
153
+ reverted the change. Re-run that phase's VERIFY command in isolation.
154
+
155
+ ### `verify_failed_on_rerun`
156
+
157
+ - **What**: Auditor re-ran a VERIFY command that the executor reported as passing,
158
+ and it now fails.
159
+ - **Surfaced as**: Discrepancy entry with the verbatim re-run output.
160
+ - **Fix template**: Snapshot the environment diff (env vars, working directory,
161
+ locked dependencies). Almost always a flake or env-specific behavior; if not,
162
+ the executor's claim is suspect.
163
+
164
+ ### `snapshot_caller_drift`
165
+
166
+ - **What**: Pre-edit snapshot in spec said symbol X had N callers; post-execution
167
+ `mmcg query callers X` returns M ≠ N.
168
+ - **Surfaced as**: Auditor's drift check enumerates the delta.
169
+ - **Fix template**: Either the executor changed something out of scope (check
170
+ the diff for new call sites involving X), or the snapshot was wrong to start
171
+ with. If the latter, drop the snapshot's per-symbol claim and re-run the audit.
172
+
173
+ ### `snapshot_signature_drift`
174
+
175
+ - **What**: Symbol X's signature changed but the spec didn't authorize it (e.g.
176
+ spec said "public signature stays unchanged" but the diff shows a parameter
177
+ added).
178
+ - **Surfaced as**: Auditor compares pre-edit `mmcg query search X` signature
179
+ against post-edit.
180
+ - **Fix template**: Almost always a real contract violation. Stop, revert the
181
+ signature change, re-issue the phase preserving the original signature.
182
+
183
+ ### `validator_link_policy_gap`
184
+
185
+ - **What**: Spec's CHANGE TO content adds a relative markdown link from an
186
+ installable artifact (e.g. `agents/subagents/foo.md`) to a target that
187
+ escapes its installable package (e.g. `../../skills/workflow/bar/refs/x.md`).
188
+ `scripts/validate.py` warns: `installable file escapes package — link goes
189
+ N levels up (max 0 for this file class). Reference the artifact by name
190
+ instead`. Subagents and CLAUDE.md templates are flat-installed to
191
+ `~/.claude/agents/` and can't follow `../`-style paths there.
192
+ - **Surfaced as**: `python3 scripts/validate.py` exits 0 (errors clean) but
193
+ emits one warning per offending link. The spec's Phase 5 / Phase N VERIFY
194
+ may treat `≥ 1 warning` as a failure, depending on the spec's strictness rule.
195
+ - **Fix template**: Replace each cross-package relative markdown link with a
196
+ bare-name reference using the convention from
197
+ `feedback_artifact_references.md` in user memory: subagent → `name`, skill →
198
+ `/name`, doc reference → "X.md in <skill>'s references" or similar prose.
199
+ The LLM agent has the referenced artifact loaded; no path lookup needed.
200
+ Same-package relative links (within one skill tree, e.g.
201
+ `../mastermind-task-planning/references/…` from `mastermind-task-executor/SKILL.md`)
202
+ stay inside the installable package and pass the validator — only links
203
+ CROSSING the `agents/`↔`skills/` boundary or going > 0 levels up from a
204
+ subagent fall foul.
205
+ - **First observed**: Task 003 Phase 5.1 (executor stopped, planner promoted
206
+ the kind into the taxonomy in the same flight).
207
+
208
+ ### `verify_grep_window_too_small`
209
+
210
+ - **What**: Spec's VERIFY command uses `grep -A N "anchor" file | grep -c "phrase"`
211
+ to confirm a phrase landed inside an "anchor + first few lines" window, but
212
+ `N` is sized for the spec author's mental layout (e.g. "header, blank, heading,
213
+ one bullet" = 4 lines) while the on-disk file has more pre-existing content
214
+ between the anchor and the new phrase (e.g. multiple prior bullets in the
215
+ same group). The phrase is correctly added to the file but lives outside the
216
+ `-A N` window.
217
+ - **Surfaced as**: `grep -c` prints `0` even though the file contains the
218
+ phrase exactly as specified. `grep <phrase> <file>` confirms presence.
219
+ - **Fix template**: Drop the windowed grep and use the bare
220
+ `grep -c "<unique phrase>" <file>` form. Pick a phrase that's unique to the
221
+ new content so the count remains 1 even on whole-file scan. Only keep `-A N`
222
+ when the anchor → phrase distance is short AND constant across spec authors
223
+ (e.g. immediately-following H2 with first paragraph).
224
+ - **First observed**: Task 003 Phase 6.1 (planner sized `-A 4` for 2 bullets;
225
+ by execution time there were already 2 prior bullets pushing the new one to
226
+ line 6).
227
+
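A rough Python emulation of why the windowed check misses the phrase. It uses substring matching rather than grep's regex semantics, and the function names and sample lines are made up for illustration.

```python
def windowed_count(lines, anchor, phrase, n):
    """Approximate `grep -A n anchor | grep -c phrase`: count phrase hits
    within each anchor line plus the n lines after it."""
    keep = set()
    for i, line in enumerate(lines):
        if anchor in line:
            keep.update(range(i, min(i + n + 1, len(lines))))
    return sum(1 for i in keep if phrase in lines[i])


def whole_file_count(lines, phrase):
    """Approximate `grep -c phrase file`."""
    return sum(1 for line in lines if phrase in line)


doc = [
    "## Group",
    "",
    "- prior bullet one",
    "- prior bullet two",
    "- prior bullet three",
    "- new unique phrase",
]

print(windowed_count(doc, "## Group", "new unique phrase", 4))
print(whole_file_count(doc, "new unique phrase"))
```

With three pre-existing bullets, the new line sits five lines below the anchor, so a window of 4 cannot see it while the whole-file count can.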
228
+ ### `verify_count_vs_change_to_mismatch`
229
+
230
+ - **What**: Spec's VERIFY command expects N occurrences of a phrase
231
+ (e.g. `grep -c "foo" file ≥ 2`), justified by a comment like "one in the
232
+ heading, one in the body". But the spec's prescribed CHANGE TO body
233
+ actually contains the phrase fewer times — the body refers to it
234
+ pronoun-style (`this kind:`, `it`, `the same`) instead of respelling. After
235
+ verbatim application, grep returns < N and VERIFY fails on a self-consistency
236
+ bug.
237
+ - **Surfaced as**: `grep -c` count is less than spec's claimed minimum;
238
+ CHANGE TO body was applied byte-for-byte; planner-side prose self-mismatch.
239
+ - **Fix template**: Prefer (a) — edit the CHANGE TO body to spell the phrase
240
+ explicitly where the VERIFY count assumed it would appear (e.g. swap
241
+ `with this \`kind:\`` for `of kind \`foo\``). That's also more useful for
242
+ taxonomy / doc consumers grepping for the literal term. Alternative (b) —
243
+ relax the VERIFY count to match the actual prescribed body. Default to (a)
244
+ for taxonomy / reference docs where literal grep-ability is valuable.
245
+ - **First observed**: Task 005 Phase 2.1 (planner wrote "this `kind:`" in
246
+ Fix template body but VERIFY expected 2 occurrences of the kind name).
247
+
248
+ ### `premature_terminal_temptation`
249
+
250
+ - **What**: Planner is drafting a "task done" / "shipped" / "сделано"
251
+ message to the user without first ensuring (a) `mastermind-auditor` was
252
+ spawned, (b) verdict tail is in conversation with `verdict: held`, and
253
+ (c) planner's semantic review is documented. Often paired with the
254
+ rationalization "the executor's report looks clean, the audit will
255
+ surely pass".
256
+ - **Surfaced as**:
257
+ - Planner catches themselves mid-draft (tier 1)
258
+ - Planner has a `drift` / `broken` verdict and is tempted to "explain it away"
259
+ as a non-issue to user (tier 2)
260
+ - User explicitly asks the planner to skip the auditor and just declare
261
+ done (tier 3 — refusal + explicit override recording)
262
+ - **Fix template**: Apply the SKILL's `Premature-terminal escalation tiers`
263
+ section. Tier 1 → just spawn the auditor and wait. Tier 2 → refuse, fix
264
+ the discrepancies, re-audit. Tier 3 → refuse, explain `_lessons.md`
265
+ precedent, name the override explicitly in the conversation transcript,
266
+ and append a `[auto]` `_lessons.md` entry of kind `premature_terminal_temptation`.
267
+ - **First observed**: Workflow convention — pre-emptively named in task 005
268
+ ahead of real instances, so planners have a routing key on the day they
269
+ catch themselves. If you (future planner) ARE the first real instance,
270
+ update `First observed` here to your task number.
271
+
272
+ ### `unclassified`
273
+
274
+ - **What**: A defect that doesn't match any kind above.
275
+ - **Surfaced as**: Subagent emits `kind: unclassified` with a verbatim `details:`
276
+ description.
277
+ - **Fix template**: Read the verbatim details, design the fix manually. After
278
+ the task lands, promote this defect into a named entry in this taxonomy via a
279
+ follow-up spec (or a direct doc PR — taxonomy edits don't need their own
280
+ spec). The `[auto]` `_lessons.md` entry from `mmcg audit-spec` is a good
281
+ starting point for the writeup.
282
+
283
+ ## Status (no defect)
284
+
285
+ When NO defect applies → `kind: clean` and the workflow proceeds normally.
286
+ Empty `defects: []` / `discrepancies: []` arrays in the structured tail also
287
+ indicate the clean case.
@@ -0,0 +1,141 @@
1
+ # Structured report schema
2
+
3
+ Every `mastermind-task-executor` and `mastermind-auditor` reply emits a
4
+ fenced-YAML "structured tail" alongside its markdown prose. The tail is wrapped
5
+ in HTML-comment sentinels so the planner can extract it deterministically with
6
+ a single regex.
7
+
8
+ The defect `kind:` vocabulary is the closed set defined in
9
+ [`defect-taxonomy.md`](defect-taxonomy.md). Subagents MUST pick a listed kind
10
+ or use `kind: unclassified` as the escape hatch.
11
+
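One way the single-regex extraction might look, as a sketch under stated assumptions: the tail is the sentinel comment, a fenced `yaml` block, and the matching end sentinel, with only whitespace between them. `TAIL_RE` and `extract_tail` are hypothetical names, and the result is raw YAML text that would still need a YAML parser.

```python
import re

FENCE = "`" * 3  # built indirectly so this example can itself sit inside a code fence

TAIL_RE = re.compile(
    r"<!-- mastermind:(report|audit)-begin -->\s*"
    + FENCE + r"yaml\n(.*?)\n" + FENCE
    + r"\s*<!-- mastermind:\1-end -->",
    re.DOTALL,
)


def extract_tail(reply: str):
    """Return (tail_type, yaml_text) for the last sentinel block, or None if absent."""
    matches = list(TAIL_RE.finditer(reply))
    if not matches:
        return None
    last = matches[-1]
    return last.group(1), last.group(2)


reply = (
    "## Phases completed\nAll of them.\n\n"
    "<!-- mastermind:report-begin -->\n"
    + FENCE + "yaml\n"
    "status: complete\n"
    "defects: []\n"
    + FENCE + "\n"
    "<!-- mastermind:report-end -->\n"
)
print(extract_tail(reply))
```

Returning `None` for a reply without a tail lets the caller treat absence as a malformed reply rather than as a clean run.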
12
+ ## Executor tail
13
+
14
+ Emitted at the very end of the executor's reply, after the prose sections
15
+ (Phases completed / Verification results / Files modified / Stopped because /
16
+ What I did NOT do). Format:
17
+
18
+ ````markdown
19
+ <!-- mastermind:report-begin -->
20
+ ```yaml
21
+ spec: <absolute path to repo>/.mastermind/tasks/<NNN>-<name>/spec.md
22
+ status: complete | partial | failed
23
+ phases:
24
+ - id: "1.1"
25
+ status: done # done | pending | stopped_here | skipped
26
+ - id: "1.2"
27
+ status: done
28
+ - id: "2.4"
29
+ status: stopped_here
30
+ files_modified:
31
+ - mcp/servers/mmcg/src/store.rs
32
+ - mcp/servers/mmcg/src/fingerprint.rs
33
+ defects:
34
+ - kind: envelope_drift
35
+ phase: "2.4"
36
+ details: |
37
+ Test asserted on the raw `handle_tools_call` return, but the dispatcher
38
+ wraps every payload in `{ "content": [{ "type": "text", "text": <json> }] }`.
39
+ `cosmetic["class"]` is therefore not the field the assertion expects.
40
+ remediation_hint: |
41
+ Reuse `unwrap_content` from `mcp.rs::tests` (task 001). Replace
42
+ `let cosmetic = read_env;` with `let cosmetic = unwrap_content(&read_env);`.
43
+ verifications:
44
+ - cmd: "cd mcp/servers/mmcg && cargo test --locked --lib"
45
+ result: pass
46
+ - cmd: "cd mcp/servers/mmcg && cargo test --locked --lib change_class"
47
+ result: fail
48
+ output_excerpt: "thread '...' panicked at ..."
49
+ ```
50
+ <!-- mastermind:report-end -->
51
+ ````
52
+
53
+ ### Field meanings
54
+
55
+ - `spec`: path to the spec file the executor is implementing (repo-relative, as in the example above).
56
+ - `status`:
57
+ - `complete` — every phase landed, every Final-verification command exited 0
58
+ - `partial` — at least one phase done, executor stopped before reaching Phase N
59
+ - `failed` — Phase 1 couldn't even start (FIND mismatch on the first
60
+ sub-step, environment broken, etc.)
61
+ - `phases[].status`:
62
+ - `done` — phase's CHANGE TO content is in the file AND its VERIFY exited 0
63
+ - `pending` — not yet attempted in this execution
64
+ - `stopped_here` — the executor halted at this phase; populate the matching
65
+ `defects[]` entry with details
66
+ - `skipped` — planner explicitly dropped this phase mid-flight (e.g. Phase
67
+ 1.5 in task 002); list it for traceability
68
+ - `files_modified`: every path the executor's edits touched, relative to repo
69
+ root. Must match `git diff --name-only HEAD` + untracked-new-files; this is
70
+ the auditor's scope-creep anchor.
71
+ - `defects[]`: zero or more defects. Empty array = clean run. Each entry MUST
72
+ populate `kind` from the closed set in `defect-taxonomy.md` (or
73
+ `unclassified`), `phase` of the failure, verbatim `details`, and a
74
+ `remediation_hint` the planner can apply.
75
+ - `verifications[]`: every VERIFY command run, in execution order. Truncate
76
+ `output_excerpt` to ~5 lines of the relevant error/diff.
77
+
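The field rules above lend themselves to a mechanical check once the tail has been parsed into a dictionary. The following is a minimal sketch, not part of the package: `check_executor_tail` and the constant names are hypothetical, and it assumes the YAML has already been loaded into plain Python dicts and lists.

```python
from __future__ import annotations

from typing import Any

# Closed sets taken from the field meanings above.
REPORT_STATUSES = {"complete", "partial", "failed"}
PHASE_STATUSES = {"done", "pending", "stopped_here", "skipped"}
DEFECT_FIELDS = ("kind", "phase", "details", "remediation_hint")


def check_executor_tail(tail: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: list[str] = []

    if tail.get("status") not in REPORT_STATUSES:
        problems.append(f"unknown status: {tail.get('status')!r}")

    defects = tail.get("defects") or []
    defect_phases = {d.get("phase") for d in defects}

    for phase in tail.get("phases") or []:
        if phase.get("status") not in PHASE_STATUSES:
            problems.append(
                f"phase {phase.get('id')}: unknown status {phase.get('status')!r}"
            )
        # A stopped_here phase is expected to have a matching defects[] entry.
        if phase.get("status") == "stopped_here" and phase.get("id") not in defect_phases:
            problems.append(
                f"phase {phase.get('id')}: stopped_here without a defects[] entry"
            )

    # Every defect must populate all four documented fields.
    for index, defect in enumerate(defects):
        for field in DEFECT_FIELDS:
            if not defect.get(field):
                problems.append(f"defects[{index}]: missing {field}")

    return problems
```

This sketch does not verify that `kind` belongs to the taxonomy's closed set, since that list lives in `defect-taxonomy.md`; a fuller check would load it from there.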
78
+ ## Auditor tail
79
+
80
+ Emitted at the very end of the auditor's reply. Format:
81
+
82
+ ````markdown
83
+ <!-- mastermind:audit-begin -->
84
+ ```yaml
85
+ spec: .mastermind/tasks/<NNN>-<name>/spec.md
86
+ verdict: held | drift | broken
87
+ files_in_scope: 7
88
+ files_in_diff: 7
89
+ scope_match: true
90
+ discrepancies:
91
+ - kind: snapshot_caller_drift
92
+ symbol: SessionStore
93
+ spec_says: 45
94
+ index_says: 38
95
+ evidence: "git diff shows 7 callsites removed in src/api/*"
96
+ snapshot_drift:
97
+ - symbol: commit_file
98
+ pre_callers: 2
99
+ post_callers: 2
100
+ pre_signature: "pub fn commit_file(&mut self, pending: PendingFile) -> SqlResult<()>"
101
+ post_signature: "pub fn commit_file(&mut self, pending: PendingFile) -> SqlResult<()>"
102
+ delta: none
103
+ verifications_rerun:
104
+ - cmd: "cd mcp/servers/mmcg && cargo test --locked --lib"
105
+ result: pass
106
+ ```
107
+ <!-- mastermind:audit-end -->
108
+ ````
109
+
110
+ ### Field meanings
111
+
112
+ - `verdict`:
113
+ - `held` — every claim in the executor report survived independent
114
+ verification; zero discrepancies
115
+ - `drift` — partial drift; at least one discrepancy, none critical (warnings,
116
+ minor scope creep, snapshot deltas with explanation)
117
+ - `broken` — at least one critical discrepancy (scope creep without
118
+ explanation, verify failed on re-run, signature drift that contradicts the
119
+ spec's stated invariants)
120
+ - `discrepancies[]`: every finding that contributed to a non-`held` verdict.
121
+ Each MUST use a `kind:` from the auditor section of `defect-taxonomy.md`.
122
+ - `snapshot_drift[]`: one entry per symbol in the spec's Pre-edit symbol
123
+ snapshot, with pre/post caller counts and signatures and a `delta:` summary
124
+ (`none` | `gained` | `lost` | `signature_changed`).
125
+
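The schema lists the allowed `delta:` values but does not spell out how an auditor derives them. One plausible reading is sketched below. The function name `classify_delta` is hypothetical, and two points are assumptions rather than documented behavior: `gained` / `lost` are taken to refer to caller counts, and a signature change is given precedence over a caller-count change.

```python
def classify_delta(
    pre_callers: int,
    post_callers: int,
    pre_signature: str,
    post_signature: str,
) -> str:
    """Summarize one snapshot_drift[] entry as a delta value.

    Assumption: a changed signature outranks any caller-count change.
    """
    if pre_signature != post_signature:
        return "signature_changed"
    if post_callers > pre_callers:
        return "gained"
    if post_callers < pre_callers:
        return "lost"
    return "none"
```

Applied to the `commit_file` entry in the example above (2 callers before and after, identical signatures), this yields `none`.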
126
+ ## Planner consumption
127
+
128
+ The planner (running `mastermind-task-planning` SKILL) extracts the executor tail
129
+ with a simple regex on the chat reply (swap in the `audit-*` sentinels for the auditor tail):
130
+
131
+ ```text
132
+ (?s)<!-- mastermind:report-begin -->\n```yaml\n(?P<body>.*?)\n```\n<!-- mastermind:report-end -->
133
+ ```
134
+
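As an illustration of the extraction step, here is a minimal sketch using Python's standard `re` module. The helper names `tail_pattern` and `extract_tail` are hypothetical and not part of the package. The multi-line YAML body only matches when `.` is allowed to cross line breaks, which is why the pattern is compiled with `re.DOTALL`.

```python
import re
from typing import Optional

# Built indirectly so the example nests cleanly inside a markdown fence.
FENCE = "`" * 3


def tail_pattern(name: str) -> "re.Pattern[str]":
    """Compile the sentinel regex for a tail; name is 'report' or 'audit'."""
    return re.compile(
        rf"<!-- mastermind:{name}-begin -->\n{FENCE}yaml\n"
        rf"(?P<body>.*?)\n{FENCE}\n<!-- mastermind:{name}-end -->",
        re.DOTALL,  # let '.' match newlines so the body can span lines
    )


def extract_tail(reply: str, name: str = "report") -> Optional[str]:
    """Return the raw YAML body of the tail, or None if no tail is present."""
    match = tail_pattern(name).search(reply)
    return match.group("body") if match else None
```

The returned string can then be handed to any YAML parser, for example `yaml.safe_load` if PyYAML is available in the planner's environment (an assumption; the source does not name a parser).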
135
+ Then parses `body` as YAML. For each `defects[]` entry, the planner reads the
136
+ `kind:`, looks up the matching entry in `defect-taxonomy.md`, applies the named
137
+ fix template, and re-spawns the executor with the patched spec. This replaces
138
+ the manual prose-reading the planner did in tasks 001 and 002.
139
+
140
+ When `defects: []` and `status: complete`, the planner proceeds to spawn the
141
+ auditor.
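
The routing described in this section can be sketched as a small decision function over the parsed tail. This is an illustration only: `next_planner_step` and the step labels are hypothetical, and the final fallback branch (no defects, but `status` is not `complete`) is an assumption, since the text above does not cover that case.

```python
from __future__ import annotations

from typing import Any


def next_planner_step(tail: dict[str, Any]) -> tuple[str, list[str]]:
    """Return (step, defect kinds to look up in defect-taxonomy.md)."""
    defects = tail.get("defects") or []
    if defects:
        # Each kind maps to a named fix template; the executor is then re-spawned.
        kinds = [d.get("kind", "unclassified") for d in defects]
        return "respawn_executor", kinds
    if tail.get("status") == "complete":
        return "spawn_auditor", []
    # Assumption: anything else is escalated for manual planner review.
    return "needs_planner_review", []
```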