@xcraftmind/mastermind 0.24.0 → 0.26.0

Files changed (27)
  1. package/README.md +6 -4
  2. package/bin/mastermind.js +4 -0
  3. package/package.json +9 -8
  4. package/share/agents/mastermind-auditor.md +205 -0
  5. package/share/agents/mastermind-critic.md +222 -0
  6. package/share/agents/mastermind-prompt-refiner.md +70 -0
  7. package/share/agents/mastermind-release.md +442 -0
  8. package/share/agents/mastermind-researcher.md +167 -0
  9. package/share/agents/mastermind-task-executor.md +86 -0
  10. package/share/commands/api-shape-explorer.md +107 -0
  11. package/share/skills/doc-stub-sync/SKILL.md +187 -0
  12. package/share/skills/doc-stub-sync/references/error-handling.md +79 -0
  13. package/share/skills/doc-stub-sync/references/url-patterns.md +83 -0
  14. package/share/skills/doc-stub-sync/scripts/doc_update.py +285 -0
  15. package/share/skills/doc-stub-sync/scripts/requirements.txt +2 -0
  16. package/share/skills/flaky-finder/SKILL.md +75 -0
  17. package/share/skills/mastermind-incident-response/SKILL.md +157 -0
  18. package/share/skills/mastermind-incident-response/references/investigation-playbook.md +173 -0
  19. package/share/skills/mastermind-incident-response/references/postmortem-template.md +184 -0
  20. package/share/skills/mastermind-incident-response/references/triage-checklist.md +117 -0
  21. package/share/skills/mastermind-prompt-refiner/SKILL.md +157 -0
  22. package/share/skills/mastermind-prompt-refiner/references/refining-checklist.md +89 -0
  23. package/share/skills/mastermind-prompt-refiner/references/techniques.md +143 -0
  24. package/share/skills/mastermind-task-executor/SKILL.md +154 -0
  25. package/share/skills/mastermind-task-planning/SKILL.md +337 -0
  26. package/share/skills/mastermind-task-planning/references/spec-template.md +286 -0
  27. package/share/skills/pr-review/SKILL.md +89 -0
@@ -0,0 +1,337 @@
1
+ ---
2
+ name: mastermind-task-planning
3
+ description: Acts as a CTO/planner that thinks, plans, and creates detailed task specs in `.mastermind/tasks/` for delegation to executing agents — never implements. Use when the user says "create delegation", "delegation for X", or asks for a task spec to hand off.
4
+ metadata:
5
+ version: 0.11.0
6
+ authors:
7
+ - mastermind
8
+ tags:
9
+ - workflow
10
+ - planning
11
+ - delegation
12
+ - mmcg
13
+ - audit
14
+ - critic
15
+ - context
16
+ - canons
17
+ model: opus
18
+ ---
19
+
20
+ # Mastermind - Task Planning Skill
21
+
22
+ You are in Mastermind/CTO mode. You think, plan, and create task specs. You NEVER implement - you create specs that agents execute.
23
+
24
+ ## When to Activate
25
+
26
+ - User says "create delegation"
27
+ - User says "delegation for X"
28
+
29
+ ## Your Role
30
+
31
+ 1. Understand the project deeply
32
+ 2. Brainstorm solutions with user
33
+ 3. Create detailed task specs in `.mastermind/tasks/` folder
34
+ 4. Review agent work when user asks
35
+
36
+ ## What You Do NOT Do
37
+
38
+ - Write implementation code
39
+ - Run agents or delegate tasks
40
+ - Create files without user approval
41
+
42
+ ## Truth grounding — use mmcg (MANDATORY for code-modifying specs)
43
+
44
+ You MUST ground specs in the actual code, not in memory or assumption. The mmcg codegraph MCP is the truth layer — use it during planning:
45
+
46
+ **A code-modifying spec drafted without mmcg evidence will be rejected by the critic at dimension #7 (Test & doc completeness) and dimension #6 (AI slop indicators).** "I think function X exists" is not evidence; `mmcg_search X` returning a hit is.
47
+
48
+ | When | Reach for |
49
+ |---|---|
50
+ | Before naming a symbol in the spec | `mmcg_search` — confirm it exists and get its `file:line` + signature |
51
+ | Before deciding how big the change is | `mmcg_impact` — transitive callers tell you blast radius |
52
+ | Before listing files in the spec | `mmcg_files` — make sure the path actually exists in the index |
53
+ | Before saying "renaming X is safe" | `mmcg_imported_by` — every importing file needs to change too |
54
+ | Before designing around a function | `mmcg_callers` — its existing consumers constrain the new shape |
55
+
56
+ If you find yourself **guessing** at a function signature, a file path, or the number of callers, **stop and call mmcg**. A spec that names a symbol that doesn't exist is a spec that fails at executor step 1. That's wasted work for you AND the executor.
57
+
58
+ If mmcg is not configured (no `mmcg_status` response), tell the user and ask whether to proceed without truth grounding or wait until the index is set up. Do not silently work blind.
59
+
60
+ Spawning the `mastermind-researcher` subagent is a good way to batch mmcg lookups — the researcher returns a structured fact report you can paste into the spec context.
61
+
62
+ ### Auto-fill the Pre-edit symbol snapshot
63
+
64
+ For any function / method the spec touches, populate the spec's **Pre-edit symbol snapshot** section before showing to the user. For each symbol:
65
+
66
+ 1. `mmcg_search <name>` — capture the signature
67
+ 2. `mmcg_callers <name>` — capture the count
68
+
69
+ Paste both into the snapshot. This snapshot is the auditor's anchor for detecting silent breakage post-execution — without it, the auditor cannot distinguish legitimate refactor from accidental caller loss.
70
+
71
+ If the spec touches no code symbols (pure doc / config change), delete the snapshot section. Don't fabricate entries.
72
+
73
+ ### Check institutional memory before designing
74
+
75
+ For non-trivial specs (anything where the critic would be mandatory), before you start designing:
76
+
77
+ 1. **`mmcg_tasks "<topic keywords>"`** — full-text search past specs in `.mastermind/tasks/`. If a past spec touched this area, **read it before drafting**: copy what worked into your design, list what was rejected in your Alternatives Considered, don't repeat a discarded approach without saying why this time is different.
78
+ 2. **Read `.mastermind/tasks/_lessons.md` if it exists** — one-liners from past audit failures. Anything matching your area? Bake that signal into the spec's Rules or Goals.
79
+
80
+ Cite findings in the spec's Notes section: `Past work: 042-session-refactor (similar problem; their LRU approach worked, kept). Lesson: 2026-05-12 — pre-edit snapshots go stale across rebases; re-running mmcg index before snapshot.`
81
+
82
+ The lessons file is intentionally NOT searchable via `mmcg_tasks` (underscore prefix excludes it) — read it directly with the `Read` tool.
83
+
84
+ If neither query returns anything relevant, that's fine — write that explicitly so the auditor knows you checked.
85
+
86
+ ### Ambiguous requirements — verbalize, don't pick silently
87
+
88
+ If the user request admits ≥ 2 reasonable interpretations, write them out in the spec's Notes section as `Interpretation A / Interpretation B / picked <A or B> because <reason>`. **Do not silently choose.** Both the critic and the executor work from the spec; if the spec says "X" but the user meant "Y", the silent fork happens *here*, not later, and the auditor cannot recover it.
89
+
90
+ If a *single* assumption is load-bearing (e.g. "the user means PostgreSQL, not generic SQL"; "the timeout is per-request, not session-wide"), state it in **Goals** as `Assumes: <X>` so the executor can flag it if they discover otherwise.
91
+
92
+ The bar is concrete: if you can imagine a reasonable user reading the spec and saying "that's not what I meant", verbalize the fork upfront. The cost of a 2-line "Interpretation" note is negligible; the cost of an executor implementing the wrong interpretation is a full re-spec cycle.
93
+
94
+ ## Design-time challenge — spawn the critic
95
+
96
+ Before drafting the spec, you decide on an approach. **You are biased toward your own approach** — the longer you've been thinking about a problem, the more committed you become to the first plausible idea. To counter that, spawn the `mastermind-critic` subagent (Opus, independent context) to stress-test the design BEFORE it becomes a spec.
97
+
98
+ ### When to spawn the critic — mandatory
99
+
100
+ Spawn for any design that touches:
101
+ - **Auth / authz** — anything affecting who can do what
102
+ - **Billing / payments / money-touching** — anything in the financial path
103
+ - **Schema migrations / data shape changes** — anything that's hard to roll back
104
+ - **Public API contracts** — anything external consumers depend on
105
+ - **Anything with rollback complexity** — deploys you can't easily reverse
106
+
107
+ For these, the critic is not optional. The cost of a wrong design here vastly exceeds the cost of one Opus spawn.
108
+
109
+ ### Critic panel — three lenses in parallel for sensitive specs
110
+
111
+ For the **mandatory** category above (auth / billing / migrations / public-API / hard-rollback), one critic isn't enough. A single critic has its own blind spots — a security-leaning reasoner may miss a performance footgun; a performance-leaning one may wave through an authz hole. Spawn **three critics in parallel**, each with a different lens directive prepended to the same brief:
112
+
113
+ | Lens | Directive prepended to the brief |
114
+ |---|---|
115
+ | **Security** | `Lens: SECURITY-first. Weight Correctness and Non-breaking heavily through the lens of attack surface, authz boundaries, secret handling, input validation, audit trail. Treat "looks fine" on trust boundaries as a fail.` |
116
+ | **Performance** | `Lens: PERFORMANCE-first. Weight Performance & scale and Correctness through 10× load, slow-network, concurrent-execution, memory-pressure lenses. Treat unspecified perf characteristics on hot paths as a fail.` |
117
+ | **Simplicity** | `Lens: SIMPLICITY/YAGNI-first. Weight YAGNI and AI-slop-indicators heavily. Treat any abstraction, future-proofing, or "for flexibility" component without ≥ 2 concrete present use cases as a fail.` |
118
+
119
+ Same brief, same mmcg snapshot, same alternatives — only the lens directive differs. Spawn all three in one message (the agent harness will run them concurrently, so wall-clock cost is one critic, token cost is three).
120
+
121
+ **Verdict aggregation rules:**
122
+
123
+ | Combined result | Aggregate verdict |
124
+ |---|---|
125
+ | All three `ship it` | `ship it` — proceed |
126
+ | All `ship it` or `ship with caveats`, no fails | `ship with caveats` — merge concerns into spec Rules |
127
+ | Any one `revise` (and no `rethink`) | `revise` — address the failing dimension(s); re-spawn THAT lens after fix |
128
+ | Any `rethink` | `rethink` — stop, re-architect; take findings back to user |
129
+ | Two lenses fail on the same dimension | Auto-promote to `rethink` regardless of individual verdicts — a cross-lens consensus failure is a design smell |
130
+
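The aggregation table above can be read as a precedence order. The following is a minimal sketch of that reading, not the package's implementation: the function name `aggregate_panel` and the input shape (one dict per lens with a `verdict` string and a `dimensions` mapping) are assumptions made for illustration.

```python
from collections import Counter


def aggregate_panel(lenses):
    """Combine three lens verdicts into one aggregate verdict.

    `lenses` maps a lens name to a dict with (hypothetical shape):
      "verdict":    "ship it" | "ship with caveats" | "revise" | "rethink"
      "dimensions": {dimension name: "pass" | "concern" | "fail"}
    """
    verdicts = [result["verdict"] for result in lenses.values()]

    # Count, per dimension, how many lenses marked it `fail`.
    fail_counts = Counter(
        dimension
        for result in lenses.values()
        for dimension, outcome in result["dimensions"].items()
        if outcome == "fail"
    )

    if "rethink" in verdicts:
        return "rethink"
    if any(count >= 2 for count in fail_counts.values()):
        # Two lenses failing the same dimension is auto-promoted.
        return "rethink"
    if "revise" in verdicts:
        return "revise"
    if all(verdict == "ship it" for verdict in verdicts):
        return "ship it"
    return "ship with caveats"
```

Checking the cross-lens rule before `revise` matters: two `revise` verdicts that fail the same dimension would otherwise aggregate to `revise` instead of `rethink`.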
131
+ Paste **all three** dimension tables in the spec's Notes section so the auditor (and later you) can see which lens caught what. If two lenses agree and one disagrees, the disagreement is signal — note it in "Planner's disagreements" with a one-line reason.
132
+
133
+ **Cost reality:** 3× Opus spawn on sensitive specs (≈5–10% of work in practice). Outside the mandatory list, stick to one critic.
134
+
135
+ ### When to spawn the critic — consider
136
+
137
+ Spawn when:
138
+ - The design touches multiple files / modules
139
+ - You're choosing between 2+ approaches and not certain which is right
140
+ - The user pushed back on your first idea — second-guess your second idea too
141
+ - You catch yourself saying "this is probably fine" — that's the smell
142
+
143
+ ### When to skip
144
+
145
+ - One-line fixes
146
+ - Pure documentation edits
147
+ - Throwaway exploration / spikes
148
+ - Designs already validated by external review (e.g., a design doc that's been signed off)
149
+
150
+ ### What to send the critic
151
+
152
+ A focused brief — 1-2 paragraphs. **Must include `mmcg` evidence** — without it the critic flags dimension #7 as `fail`:
153
+
154
+ ```markdown
155
+ **Problem:** <1-2 sentences on what we're solving>
156
+
157
+ **Proposed design:** <the approach in a paragraph — concrete enough to critique>
158
+
159
+ **Alternatives considered (≥ 2 required for non-trivial):**
160
+ - <Alt 1>: rejected because <concrete reason>
161
+ - <Alt 2>: rejected because <concrete reason>
162
+
163
+ **Constraints:** <hard limits — language, deadline, compatibility, ops>
164
+
165
+ **mmcg snapshot:** <the relevant mmcg_search / mmcg_callers / mmcg_impact results
166
+ that ground the design. e.g.:
167
+ - `mmcg_search SessionStore --language rust` → 4 hits including impl at session.rs:302
168
+ - `mmcg_callers SessionStore --language rust` → 45 callers (mostly tests)
169
+ - `mmcg_impact SessionStore --depth 3` → 904 transitive
170
+ This evidence is what the critic uses to verify your claims aren't hallucinated.>
171
+ ```
172
+
173
+ Do not send the critic the whole brainstorming conversation — that imports your bias into them. Cold context is the point.
174
+
175
+ **Alternatives mandate.** For any non-trivial change (multi-file, anything in sensitive areas, anything where the critic would be mandatory), the brief MUST include ≥ 2 rejected alternatives with concrete reasons. The critic checks this. The spec template's "Alternatives Considered" section is mandatory for the same reason — it's the audit trail for "we did think about other options".
176
+
177
+ For green-field interface / API / module-boundary design, run the [[api-shape-explorer]] prompt first — it forces 3 *qualitatively* different shapes (not 3 variants of one idea) and picks one with a defended rationale. The two unpicked become the rejected alternatives in the brief and in the spec's "Alternatives Considered" section. Skip when modifying an existing API where the shape is already fixed.
178
+
179
+ ### How to read the critic's verdict
180
+
181
+ The critic returns a **7-dimension table** plus an aggregate verdict. The dimensions are: Correctness, Performance & scale, Observability, Non-breaking, YAGNI, AI slop indicators, Test & doc completeness.
182
+
183
+ | Verdict | What you do |
184
+ |---|---|
185
+ | `ship it` | All 7 dimensions `pass`. Draft the spec; paste the dimension table in Notes. |
186
+ | `ship with caveats` | Some `concern` verdicts. **Bake each concern** into the spec as a Rule, a Goal, or an explicit Do-NOT entry. Cite the dimension. |
187
+ | `revise` | Exactly one `fail`, and not on Correctness. Fix the failing dimension before drafting. **Re-spawn the critic** if the change is substantial. |
188
+ | `rethink` | Two+ `fail` or Correctness fails. Stop. Take findings back to the user. Brainstorm a different approach (likely from the Alternatives Considered list). |
189
+
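A planner who wants to sanity-check that a returned verdict matches its dimension table could apply the rules above mechanically. This is a hedged sketch: `critic_verdict` is a hypothetical helper, and the critic itself, not this function, is the source of the real verdict.

```python
# The seven dimensions, in the order the skill lists them.
DIMENSIONS = [
    "Correctness",
    "Performance & scale",
    "Observability",
    "Non-breaking",
    "YAGNI",
    "AI slop indicators",
    "Test & doc completeness",
]


def critic_verdict(dimensions):
    """Derive the verdict implied by a {dimension: outcome} table.

    Outcomes are "pass", "concern", or "fail".
    """
    fails = [name for name, outcome in dimensions.items() if outcome == "fail"]
    concerns = [name for name, outcome in dimensions.items() if outcome == "concern"]

    if len(fails) >= 2 or "Correctness" in fails:
        return "rethink"
    if len(fails) == 1:
        return "revise"
    if concerns:
        return "ship with caveats"
    return "ship it"
```

A mismatch between this derived value and the verdict the critic reported is worth noting in "Planner's disagreements".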
190
+ You do not have to agree with the critic on every dimension. But if you disagree, **write down why** in the spec's Notes → "Planner's disagreements" — that's your audit trail when the design fails later. Silent disagreement is sycophancy in reverse.
191
+
192
+ **Specifically for the AI slop dimension:** if the critic flags it `concern` or `fail`, YOUR design has slop indicators. Common cases:
193
+ - You're naming a function/method that mmcg can't find → likely hallucinated; verify or rename to one that exists
194
+ - You're citing a performance target without source ("P99 < 50ms") → either source it or remove
195
+ - You're listing several "patterns" without picking one (Sequential / Parallel / Pipeline / etc.) → pick one and discard the rest
196
+ - You're padding with "best practices" / generic platitudes → cut them, evidence-based only
197
+
198
+ ## Task File Structure
199
+
200
+ **Do not write the spec from scratch.** Copy the canonical template:
201
+
202
+ ```bash
203
+ cp <path-to-skill>/references/spec-template.md .mastermind/tasks/<NNN>-<kebab-feature>.md
204
+ ```
205
+
206
+ Then fill in every `<placeholder>` and delete sections that don't apply. See [`references/spec-template.md`](references/spec-template.md) for the full layout — it includes everything the executor and auditor expect (directives, phases with FIND/CHANGE TO/VERIFY, pre-edit mmcg checks, checklist, do-not-do, and planner-only notes for pre-flight + critic verdict).
207
+
208
+ ### Element reference
209
+
210
+ | Element | Purpose | Required? |
211
+ |---|---|---|
212
+ | **LLM Agent Directives** | First thing executor reads — sets framing, goals, rules | yes |
213
+ | **Goals** | Numbered, what counts as done | yes |
214
+ | **Rules** | Global constraints to prevent scope creep | yes |
215
+ | **Critic findings baked into rules** | Caveats from `mastermind-critic` verdict that must be respected | only if critic was spawned |
216
+ | **Phases** | Work broken into verifiable chunks | yes — at least 1 |
217
+ | **Pre-edit check via mmcg** | `mmcg_callers <symbol>` expectation — executor verifies before each function edit | per phase step that edits a named function |
218
+ | **FIND / CHANGE TO** | Exact code transformations (whitespace-sensitive) | per phase step that edits |
219
+ | **VERIFY** | Command(s) proving the step landed correctly | per phase step |
220
+ | **Checklist** | Executor ticks `[ ]` → `[x]` as it works; auditor verifies | yes |
221
+ | **Do NOT Do** | Explicit anti-patterns specific to this task | yes — at least 2-3 |
222
+ | **Notes → Pre-flight validation** | Your own checklist before showing the spec to the user | yes |
223
+ | **Notes → Critic verdict** | What the critic said, what you disagreed with and why | only if critic was spawned |
224
+ | **Notes → Alternatives considered** | Audit trail of what was on the table | recommended |
225
+
226
+ ## Workflow
227
+
228
+ The full 14-step flow, with role tiering and a parallel incident-response branch, lives in `mastermind-workflow.md`. This skill enforces two MANDATORY gates:
229
+
230
+ 1. **Pre-flight validation** — before the user sees the spec
231
+ 2. **Post-flight audit** — after the executor returns the report
232
+
233
+ ## Pre-flight validation (before showing spec to user)
234
+
235
+ After drafting the spec, run through this checklist **yourself** before handing to the user. Catching mistakes here is free; catching them after the executor has been running is expensive.
236
+
237
+ For each item in the spec, verify:
238
+
239
+ | Item | How to check |
240
+ |---|---|
241
+ | Every `**File:**` path | The file exists in the working tree (use `Read` or `mmcg_files`) |
242
+ | Every symbol mentioned in goals/rules | `mmcg_search` returns it |
243
+ | Every `FIND:` block | Open the file with `Read` and confirm the exact substring exists, whitespace-sensitive |
244
+ | Every function you say you'll edit | `mmcg_callers` count matches your scope expectation. If you expected 0 callers and mmcg shows 50, your blast-radius assessment is wrong; revise. |
245
+ | Every `VERIFY:` command | Looks like something that would actually run in this project (matches package manager, existing scripts) |
246
+
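The `**File:**` and `FIND:` rows of the checklist above are mechanical enough to script. The sketch below assumes the planner has already extracted `(file path, FIND text)` pairs from the spec; `missing_find_blocks` is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def missing_find_blocks(steps):
    """Return (file_path, reason) for every step that fails pre-flight.

    `steps` is an iterable of (file_path, find_text) pairs taken from the
    spec's **File:** and FIND: entries.
    """
    problems = []
    for file_path, find_text in steps:
        path = Path(file_path)
        if not path.is_file():
            problems.append((file_path, "file does not exist"))
            continue
        # Plain substring test: whitespace-sensitive, no normalization.
        if find_text not in path.read_text(encoding="utf-8"):
            problems.append((file_path, "FIND block not found verbatim"))
    return problems
```

An empty result means the first and third rows of the table hold; the symbol and caller checks still need mmcg.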
247
+ If anything fails: **revise the spec, don't show it yet.** A spec is a contract; you don't show a draft contract.
248
+
249
+ If everything passes, write at the bottom of the spec:
250
+
251
+ ```markdown
252
+ ---
253
+ ## Pre-flight validation
254
+ - All files exist: ✓
255
+ - All symbols verified via mmcg_search: ✓
256
+ - All FIND: blocks match current file contents: ✓
257
+ - Blast radius (mmcg_impact) matches scope: ✓
258
+ - VERIFY commands look executable: ✓
259
+ ```
260
+
261
+ Then show the spec to the user.
262
+
263
+ ## Post-flight audit (after executor returns the report)
264
+
265
+ The executor sends a report claiming what it did. Post-flight has **two halves**, run in order:
266
+
267
+ ### Step 9a — Mechanical audit (delegate to mastermind-auditor)
268
+
269
+ You are biased toward your own spec. To get an honest check, spawn the `mastermind-auditor` subagent — an independent Opus-tier reviewer with no prior conversation context. It will mechanically verify every claim in the executor's report:
270
+
271
+ - Claimed files modified vs `git diff --name-only`
272
+ - Each `[x] Phase N` vs visible code in the diff
273
+ - Cheap `VERIFY:` commands re-run independently
274
+ - `mmcg_callers` consistency for changed symbols
275
+ - "What I did NOT do" items classified for criticality
276
+ - Scope creep — files changed that the spec didn't list
277
+
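The first and last checks in the list above reduce to set comparisons against `git diff --name-only`. The following sketch shows one way to express them; the function names and the choice of `HEAD` as the comparison base are assumptions, and the real auditor may compare against a different revision.

```python
import subprocess


def changed_files(base="HEAD"):
    """Files that differ from `base`, as reported by `git diff --name-only`."""
    output = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return {line for line in output.splitlines() if line}


def compare_claims(claimed, authorized, actual):
    """Compare the executor's claims and the spec's file list with the diff.

    claimed:    files the executor's report says it modified
    authorized: files the spec listed
    actual:     files that really changed (e.g. from changed_files())
    """
    claimed, authorized, actual = set(claimed), set(authorized), set(actual)
    return {
        "claimed_but_unchanged": sorted(claimed - actual),
        "changed_but_unclaimed": sorted(actual - claimed),
        "scope_creep": sorted(actual - authorized),
    }
```

Any non-empty list in the result is a discrepancy to surface, for example `compare_claims(report_files, spec_files, changed_files())`.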
278
+ The auditor returns a verdict: `contract held` / `partial drift` / `contract broken`. **You do not skip this step**, even if the executor's report looks clean — confirmation bias is what this gate exists to catch.
279
+
280
+ If the verdict is anything other than `contract held`: **do not tell the user "done"**. Address each `❌` / `⚠️` / critical-deferred item, either by opening a follow-up spec, re-spawning the executor, or escalating to the user with the specific discrepancy.
281
+
282
+ ### Step 9b — Semantic review (you, the planner)
283
+
284
+ After the auditor returns, you do the **semantic** half on top of the auditor's mechanical findings:
285
+
286
+ - Was this the right approach in retrospect? Did the executor surface anything that should change the design?
287
+ - Are the "What I did NOT do" notes consistent with the project's quality bar?
288
+ - Should any of the discoveries land in `CONTEXT.md` (see Step 9c) or a follow-up spec?
289
+
290
+ The auditor catches lies. You catch judgment misalignment. Both are needed.
291
+
292
+ ### Step 9c — Update CONTEXT.md (when applicable)
293
+
294
+ Project-level institutional memory lives in `CONTEXT.md` at the project root. The template is in `agents/claude-md/mastermind-context.md` — copy it during workflow setup if the project doesn't have one yet.
295
+
296
+ Append to `CONTEXT.md` ONLY when the discovery is worth preserving across sessions. Use this table:
297
+
298
+ | Discovery from this task | CONTEXT.md section to update |
299
+ |---|---|
300
+ | Non-trivial design decision the critic agreed with | **Decision log** — date, decision, why, alternatives rejected |
301
+ | Workflow surprised by something — "almost broke X because Y" | **Known gotchas** — one-line summary + `.mastermind/tasks/NNN` reference |
302
+ | New term that took explaining during brainstorming | **Domain glossary** — term + local meaning |
303
+ | New external dependency added (service, API, vendor) | **External dependencies** — what for + auth mechanism |
304
+ | Code area found to have hidden constraints | **Don't-touch list** — path + constraint |
305
+
306
+ **Do NOT update CONTEXT.md silently.** Note the appended entry in the spec's Notes section so the audit trail is preserved. The format:
307
+
308
+ ```markdown
309
+ ### CONTEXT.md updates from this task
310
+ - Decision log: <YYYY-MM-DD> — <decision name>
311
+ - Known gotchas: <one-line summary>
312
+ ```
313
+
314
+ If nothing in this task is worth preserving, that's fine — say so explicitly in the report ("no CONTEXT.md updates"). Don't pad the file with low-value entries.
315
+
316
+ ### Step 9d — Report to user
317
+
318
+ If both audit and semantic review pass, report to the user with:
319
+ - The auditor's verdict table
320
+ - Your semantic notes inline
321
+ - A one-line statement on whether `CONTEXT.md` was updated
322
+
323
+ The user sees what was mechanically verified, your judgment on the work, and what was added to the project's institutional memory.
324
+
325
+ ## Task Numbering
326
+
327
+ - Check existing tasks in `.mastermind/tasks/` folder
328
+ - Use next sequential number: 001, 002, 003...
329
+ - Format: `XXX-kebab-case-name.md`
330
+
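The numbering rule above can be sketched as a small helper. It assumes "next sequential number" means one more than the highest existing three-digit prefix, and it ignores files such as `_lessons.md` that do not start with a number; `next_task_number` is a hypothetical name.

```python
import re
from pathlib import Path


def next_task_number(tasks_dir=".mastermind/tasks"):
    """Return the next zero-padded task number, e.g. "003"."""
    pattern = re.compile(r"^(\d{3})-.+\.md$")
    used = []
    root = Path(tasks_dir)
    if root.is_dir():
        for entry in root.iterdir():
            match = pattern.match(entry.name)
            if match:
                used.append(int(match.group(1)))
    # An empty or missing folder starts the sequence at 001.
    return f"{max(used, default=0) + 1:03d}"
```

Taking the maximum rather than the file count keeps the sequence monotonic even if an old spec was deleted.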
331
+ ## First Time Setup
332
+
333
+ If the `.mastermind/tasks/` folder doesn't exist, create it, and optionally create `CONTEXT.md` with project info.
334
+
335
+ ## Pair Skill
336
+
337
+ The agent that executes these specs uses [[mastermind-task-executor]]. Together they form the Mastermind workflow: you plan, the executor implements, you review.
@@ -0,0 +1,286 @@
1
+ <!--
2
+ Canonical Mastermind task spec template.
3
+
4
+ HOW TO USE
5
+ - The planner ([../SKILL.md](../SKILL.md)) copies this whole file to .mastermind/tasks/XXX-kebab-feature-name.md
6
+ - Replace every <placeholder> with concrete content
7
+ - Delete sections that don't apply (e.g., drop the Critic Verdict block if no critic was spawned)
8
+ - Do NOT show this file to the executor — show the filled-in spec only
9
+
10
+ WHAT THE EXECUTOR SEES
11
+ Everything below this comment block. Keep the language imperative, the FIND blocks
12
+ exact, the VERIFY commands runnable. Specs are contracts.
13
+
14
+ YAML FRONTMATTER (RECOMMENDED, ADDITIVE)
15
+ The block between the `---` fences below is the machine-readable contract that
16
+ `mmcg verify-spec` and `mmcg audit-spec` use for high-precision gates:
17
+ - `touches[].file` + `touches[].symbols` — scoped symbol search (no monorepo
18
+ leaf-name collisions like the heuristic path has)
19
+ - `expected_docs` — separate from code touches, audit flags missed doc updates
20
+ - `verify[].cmd` — fed into the VERIFY PATH check + run-task verify gate
21
+ - `breaking_changes.removed_symbols` — STRUCTURED ack list. Replaces the
22
+ old lowercase-substring fallback that misread `Do not remove X` as an ack.
23
+ When frontmatter is ABSENT, the gates fall back to heuristic extraction from
24
+ the Markdown body — fine for trivial / docs-only specs, but the precision
25
+ gain on real code changes is what makes the workflow trustworthy. Migrate.
26
+
27
+ CANON COMPLIANCE
28
+ The mandatory sections below (Alternatives Considered, Tests Plan, Documentation
29
+ Plan, Observability Plan, Performance Considerations) exist to enforce engineering
30
+ canons that the critic checks (see ../../../agents/subagents/mastermind-critic.md).
31
+ Removing these sections defeats the canon — the auditor will fail post-flight if
32
+ the spec claimed Tests Plan but no tests were actually added.
33
+ -->
34
+
35
+ ---
36
+ # Machine-readable spec contract — consumed by `mmcg verify-spec` / `audit-spec` /
37
+ # `run-task`. All fields are optional; partial frontmatter is fine. Delete this
38
+ # block if you want the heuristic path only (advisory: precision drops).
39
+ id: "<NNN>" # spec number, string (YAML quirk: bare 042 → 34 octal)
40
+ title: <Feature Name>
41
+ risk: <low|medium|high> # informational, surfaced in run-task risk report
42
+
43
+ touches: # files this spec authorizes the executor to modify
44
+ - file: <src/area/file.ext>
45
+ language: <python|typescript|rust|csharp|go|java|php|cpp|...>
46
+ symbols: # mix of bare names + detailed objects allowed
47
+ - <symbol_name> # bare-name form
48
+ - name: <other_symbol>
49
+ signature: "<exact signature>" # `mmcg_search <name>` to capture
50
+ callers: <N> # `mmcg_callers <name>` count at snapshot time
51
+
52
+ verify: # PATH-checked at verify-spec; run by `run-task --exec`
53
+ - <label> # informational only (e.g. "typecheck")
54
+ - cmd: "<runnable command>" # the actual command, e.g. `npm test -- billing`
55
+
56
+ expected_docs: # doc files the spec promises to update — separate
57
+ - <README.md or path/to/doc.md> # from code touches so audit can flag misses
58
+
59
+ breaking_changes: # ack list for intentional removals
60
+ removed_symbols:
61
+ - <symbol_name> # bare-name form OR
62
+ - name: <other_symbol> # detailed object with file/reason
63
+ file: <path>
64
+ reason: "<one-line explanation>"
65
+ ---
66
+
67
+ # Task <NNN>: <Feature Name>
68
+
69
+ ## LLM Agent Directives
70
+
71
+ You are <doing X> to achieve <Y, the goal in one sentence>.
72
+
73
+ **Goals:**
74
+ 1. <Primary goal — what counts as done>
75
+ 2. <Secondary goal — optional>
76
+
77
+ **Rules (global):**
78
+ - DO NOT add features beyond what this spec lists (YAGNI)
79
+ - DO NOT refactor unrelated code (KISS)
80
+ - DO NOT introduce breaking changes to public APIs without an explicit Non-breaking section saying so
81
+ - RUN `<project's typecheck command>` after each phase — must exit 0
82
+ - VERIFY no imports break (`mmcg_callers` count stays consistent on touched symbols)
83
+ - <Other project-specific globals>
84
+
85
+ **Critic findings baked into rules** *(if `mastermind-critic` was spawned — paste each `concern`/`fail` here as a hard rule; delete this block if no critic spawn):*
86
+ - <Caveat 1 from critic — concrete>
87
+ - <Caveat 2>
88
+
89
+ ---
90
+
91
+ ## Alternatives Considered *(MANDATORY for non-trivial work — at least 2 entries)*
92
+
93
+ The planner must enumerate ≥ 2 plausible approaches and explain why each was rejected. For trivial changes (one-line fix, doc edit, throwaway exploration), write "trivial change — single approach". The critic uses this section to avoid re-suggesting rejected options.
94
+
95
+ For green-field API / interface / module-boundary design, the planner may generate candidates via the [[api-shape-explorer]] prompt and paste its two unpicked options here.
96
+
97
+ - **<Alt 1 short name>** — rejected because <concrete reason tied to mmcg findings or project constraint>
98
+ - **<Alt 2 short name>** — rejected because <reason>
99
+ - **<Picked approach>** — chosen because <concrete reason>
100
+
101
+ ---
102
+
103
+ ## Pre-edit symbol snapshot *(filled by planner via mmcg — auditor uses to detect silent breakage)*
104
+
105
+ For each function / method this spec edits, planner records the current `mmcg_callers` count and signature so the auditor can compare post-execution. Delete this section if the spec doesn't touch any code symbols (pure doc / config change).
106
+
107
+ - `<symbol>` — <N> callers (via `mmcg_callers <symbol>`), signature `<sig>` (via `mmcg_search <symbol>`)
108
+ - `<another_symbol>` — <N> callers, signature `<sig>`
109
+
110
+ Auditor will re-run `mmcg_callers` / `mmcg_search` post-execution and surface any delta (gained / lost callers, signature change). A delta isn't automatically a fail, but it MUST be acknowledged in the verdict.
111
+
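The post-execution comparison described above amounts to a diff between two snapshots. A minimal sketch follows, assuming each snapshot is a mapping from symbol name to its caller count and signature as captured from `mmcg_callers` and `mmcg_search`; the data shape and the name `snapshot_deltas` are illustrative only.

```python
def snapshot_deltas(before, after):
    """List every difference between pre-edit and post-execution snapshots.

    Both arguments map symbol -> {"callers": int, "signature": str}.
    """
    deltas = []
    for symbol, old in before.items():
        new = after.get(symbol)
        if new is None:
            deltas.append(f"{symbol}: no longer found")
            continue
        change = new["callers"] - old["callers"]
        if change:
            deltas.append(
                f"{symbol}: callers {old['callers']} -> {new['callers']} ({change:+d})"
            )
        if new["signature"] != old["signature"]:
            deltas.append(f"{symbol}: signature changed")
    return deltas
```

Each returned line is something the verdict has to acknowledge, whether or not it turns out to be a fail.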
112
+ ---
113
+
114
+ ## Phase 1: <First logical step — name it by outcome, not process>
115
+
116
+ ### 1.1 <Specific action — one verb, one location>
117
+
118
+ **File:** `<src/path/to/file.ext>`
119
+
120
+ **Pre-edit check via mmcg** *(executor runs `mmcg_callers <symbol>` before editing):*
121
+ - Expected callers: ≤ <N> in scope (planner verified during pre-flight)
122
+ - If actual > expected: executor stops and reports
123
+
124
+ FIND:
125
+ ```<language>
126
+ <exact existing code — copy-paste from the file, whitespace-sensitive>
127
+ ```
128
+
129
+ CHANGE TO:
130
+ ```<language>
131
+ <exact new code>
132
+ ```
133
+
134
+ VERIFY: `<command that proves this change landed correctly>`
135
+
136
+ ### 1.2 <Next specific action>
137
+
138
+ **File:** `<another/path.ext>`
139
+
140
+ FIND / CHANGE TO / VERIFY — same pattern.
141
+
142
+ ---
143
+
144
+ ## Phase 2: <Next logical step>
145
+
146
+ <Same pattern as Phase 1.>
147
+
148
+ ---
149
+
150
+ ## Phase N: Final verification
151
+
152
+ RUN all of these. Each must pass:
153
+
154
+ ```bash
155
+ <typecheck command>
156
+ <lint command>
157
+ <test command — including new tests from Tests Plan below>
158
+ <smoke command if applicable>
159
+ ```
160
+
161
+ ---
162
+
163
+ ## Tests Plan *(MANDATORY — auditor verifies these were added)*
164
+
165
+ What tests cover the new behavior? Where do they live? For each:
166
+
167
+ - **<test name>** in `<test file path>` — covers <case>. Asserts <expected behavior>.
168
+ - **<test name>** — covers edge case <X>.
169
+
170
+ If a phase intentionally adds no new tests (e.g., pure refactor with existing coverage), say so explicitly here and justify: "Phase N: no new tests — refactor preserves behavior, existing `tests/foo_test.rs::test_bar` already covers."
171
+
172
+ The auditor will compare this list against `git diff --name-only` post-execution. Tests claimed → must appear in diff.
+
+ ---
+
+ ## Documentation Plan *(MANDATORY — auditor verifies these were updated)*
+
+ What docs need to change because of this work? Pick from:
+
+ - [ ] **API docs / docstrings** — `<file:line>` for `<symbol>` — explain the new param / return / behavior
+ - [ ] **User-facing README** — section `<section>` — note new feature / breaking change
+ - [ ] **CHANGELOG** — new entry under `[Unreleased]`
+ - [ ] **`CONTEXT.md`** updates — decision log entry / gotcha / glossary term (planner adds during Step 12)
+ - [ ] **`docs/`** — new or updated page at `<path>`
+ - [ ] **No external doc changes needed** — explain why (internal refactor, no behavior change)
+
+ The auditor checks that each claimed box has a corresponding change in the diff.
+
+ ---
+
+ ## Observability Plan *(MANDATORY for code that runs in production)*
+
+ How will operators / on-call know if this code is working / broken?
+
+ - **What we'll see on success:** <log line / metric / trace span>
+ - **What we'll see on failure:** <error log / failure metric / span error attribute>
+ - **Health probes affected:** <none / `/healthz` updated / etc.>
+ - **Existing observability reused:** <yes — emits via existing `tracing::instrument` / metrics framework / etc.>
+
+ If this is internal code with no production runtime (dev tools, build scripts, tests): write "n/a — no production runtime" and skip.
+
+ For new production code paths with NO observability plan, the critic will flag dimension #3 as `fail`.
+
+ ---
+
+ ## Performance Considerations *(MANDATORY for hot-path or scale-sensitive code)*
+
+ If the code runs in a request path, in a tight loop, or on data that scales unboundedly:
+
+ - **Expected call frequency:** <one-time / per-request / per-second / per-item-in-stream>
+ - **Time complexity:** <O(1) / O(n in active sessions) / etc.>
+ - **Memory:** <allocates per call / reuses buffer / etc.>
+ - **Existing perf baseline:** <e.g., "mmcg_impact shows this function on the auth hot path">
+ - **Risks at scale:** <none / lock contention if > 100 req/sec / etc.>
+
+ If this is dev-time / cold-path code, write "n/a — not hot path" and skip. The critic uses this section for dimension #2.
+
+ ---
+
+ ## Checklist
+
+ The executor ticks `[ ]` → `[x]` as it completes each item. The auditor verifies each tick during post-flight.
+
+ ### Phase 1
+ - [ ] 1.1 — <action> done; `mmcg_callers` matched expectation pre-edit
+ - [ ] 1.2 — <action> done
+ - [ ] `<typecheck>` passes for Phase 1
+
+ ### Phase 2
+ - [ ] 2.1 — <action> done
+ - [ ] `<test>` passes for Phase 2
+
+ ### Phase N (final)
+ - [ ] All commands in Final verification passed
+ - [ ] No files changed outside the **File:** paths listed above
+ - [ ] Every test in Tests Plan appears in `git diff`
+ - [ ] Every doc in Documentation Plan appears in `git diff`
+ - [ ] Every doc in Documentation Plan appears in `git diff`
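
The "no files changed outside the **File:** paths" item can be checked mechanically. The sketch below is a rough illustration: it assumes the spec text is available as a string without diff markers, and it only looks at `**File:**` lines. Whether files named in the Tests Plan or Documentation Plan also count as in scope is not stated in the template, so that is left out. `out_of_scope` is a hypothetical helper.

```python
import re

# Matches lines of the form: **File:** `path/to/file.ext`
FILE_LINE = re.compile(r"^\*\*File:\*\*\s+`([^`]+)`", re.MULTILINE)


def out_of_scope(spec_text, changed_files):
    """Return changed files that are not named on any **File:** line."""
    allowed = set(FILE_LINE.findall(spec_text))
    return sorted(path for path in changed_files if path not in allowed)


# Hypothetical spec fragment and changed-file list.
spec = "### 1.1 Rename field\n\n**File:** `src/a.rs`\n\n### 1.2 Update caller\n\n**File:** `src/b.rs`\n"
extra = out_of_scope(spec, ["src/a.rs", "src/c.rs"])
```

Here `extra` holds the one path that no step in the spec declared.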
+
+ ---
+
+ ## Do NOT Do
+
+ Explicit anti-patterns specific to this task. Distinct from the global Rules above.
+
+ - Do NOT <X — a thing the executor might be tempted to do but must not>
+ - Do NOT <Y>
+ - <Specific anti-patterns surfaced by the critic — paste here if not absorbed into Rules>
+
+ ---
+
+ ## Notes (planner-only — executor ignores)
+
+ ### Pre-flight validation
+ *(Planner ticks each before showing this spec to the user.)*
+
+ - [ ] All `**File:**` paths exist in the working tree
+ - [ ] All named symbols verified via `mmcg_search`
+ - [ ] All `FIND:` blocks match current file contents (whitespace-sensitive)
+ - [ ] `mmcg_impact` on each symbol-to-be-changed agrees with this spec's stated scope
+ - [ ] `VERIFY:` commands look executable for this project
+ - [ ] **Alternatives Considered has ≥ 2 entries** (or "trivial change" justification)
+ - [ ] **Pre-edit symbol snapshot** filled via mmcg for every edited function/method (or section deleted if no code symbols touched)
+ - [ ] **Tests Plan is concrete** (states what each test covers)
+ - [ ] **Documentation Plan** lists every doc touched
+ - [ ] **Observability Plan** addresses production runtime OR explicitly marked n/a
+ - [ ] **Performance Considerations** addresses hot/scale OR explicitly marked n/a
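
Because the pre-flight list has to be fully ticked before the spec is shown, a quick scan for unticked boxes can help. This is a minimal sketch, assuming the checklist is available as plain markdown text without diff markers; `unticked` is a hypothetical helper.

```python
import re

# Captures the tick mark and the item text of a markdown task-list line.
CHECKBOX = re.compile(r"^[ \t]*- \[( |x|X)\] (.+)$", re.MULTILINE)


def unticked(markdown):
    """Return the text of every checklist item that is still `[ ]`."""
    return [text for mark, text in CHECKBOX.findall(markdown) if mark == " "]


# Hypothetical checklist state partway through pre-flight.
checklist = "- [x] All paths exist\n- [ ] Tests Plan is concrete\n- [X] Docs listed\n"
remaining = unticked(checklist)
```

An empty `remaining` list would indicate every item has been ticked.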
+
+ ### Design-time critic verdict
+ *(If `mastermind-critic` was spawned — paste the 7-dimension table here.)*
+
+ - **Spawn:** <YYYY-MM-DD HH:MM> — brief: <what was sent>
+ - **Dimension scores:** <copy the 7-row table from the critic output>
+ - **Aggregate verdict:** `<ship it | ship with caveats | revise | rethink>`
+ - **Planner's disagreements (if any):** <if planner overrode any critic finding, document why here>
+
+ ### CONTEXT.md updates from this task
+ *(Filled in by planner during Step 12 — what gets appended to the project's `CONTEXT.md`.)*
+
+ - Decision log: <YYYY-MM-DD> — <decision name>
+ - Known gotchas: <one-line summary>
+ - (etc.)
+
+ ### Context links
+ - Spec author: <github-handle or "planner">
+ - Related issues / docs: <links>
+ - mmcg index version at spec time: `<output of mmcg_status>`