@xcraftmind/mastermind 0.23.1 → 0.25.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (26) hide show
  1. package/README.md +6 -4
  2. package/bin/mastermind.js +4 -0
  3. package/package.json +9 -8
  4. package/share/agents/mastermind-auditor.md +205 -0
  5. package/share/agents/mastermind-critic.md +222 -0
  6. package/share/agents/mastermind-prompt-refiner.md +70 -0
  7. package/share/agents/mastermind-release.md +442 -0
  8. package/share/agents/mastermind-researcher.md +167 -0
  9. package/share/agents/mastermind-task-executor.md +86 -0
  10. package/share/skills/doc-stub-sync/SKILL.md +187 -0
  11. package/share/skills/doc-stub-sync/references/error-handling.md +79 -0
  12. package/share/skills/doc-stub-sync/references/url-patterns.md +83 -0
  13. package/share/skills/doc-stub-sync/scripts/doc_update.py +285 -0
  14. package/share/skills/doc-stub-sync/scripts/requirements.txt +2 -0
  15. package/share/skills/flaky-finder/SKILL.md +75 -0
  16. package/share/skills/mastermind-incident-response/SKILL.md +157 -0
  17. package/share/skills/mastermind-incident-response/references/investigation-playbook.md +173 -0
  18. package/share/skills/mastermind-incident-response/references/postmortem-template.md +184 -0
  19. package/share/skills/mastermind-incident-response/references/triage-checklist.md +117 -0
  20. package/share/skills/mastermind-prompt-refiner/SKILL.md +157 -0
  21. package/share/skills/mastermind-prompt-refiner/references/refining-checklist.md +89 -0
  22. package/share/skills/mastermind-prompt-refiner/references/techniques.md +143 -0
  23. package/share/skills/mastermind-task-executor/SKILL.md +154 -0
  24. package/share/skills/mastermind-task-planning/SKILL.md +337 -0
  25. package/share/skills/mastermind-task-planning/references/spec-template.md +286 -0
  26. package/share/skills/pr-review/SKILL.md +89 -0
@@ -0,0 +1,117 @@
1
+ # Triage checklist — first 5 minutes
2
+
3
+ Reference for the [`mastermind-incident-response`](../SKILL.md) skill, Phase 1. Run through these questions / commands in parallel with talking to the user. Goal: enough information to choose a mitigation in Phase 2.
4
+
5
+ ---
6
+
7
+ ## What to ask the user (priority order)
8
+
9
+ 1. **Symptom — what's observable?**
10
+ - "What error are you / users seeing?"
11
+ - Paste actual error message, not a paraphrase
12
+ - Single sentence, ideally
13
+ - Example: `❌ "the app is slow"` → `✓ "/api/messages returning 500 with 'connection refused' since 14:32"`
14
+
15
+ 2. **Scope**
16
+ - "How many users / what % of traffic / which surfaces?"
17
+ - Distinguish: 1 user vs 1 region vs everyone
18
+ - If unclear → ask: do we know yet?
19
+
20
+ 3. **Severity classification** (pick one; if unclear, default to one level higher):
21
+ - **sev0** — total outage / safety / data loss; immediate
22
+ - **sev1** — major surface broken; act within minutes
23
+ - **sev2** — partial degradation; act within hours
24
+ - **sev3** — cosmetic / single user; act within days
25
+ - Severity drives whether to ask "should we page?" before doing anything else
26
+
27
+ 4. **Timeline — when did this start?**
28
+ - "What's the first timestamp you have?"
29
+ - Convert to UTC immediately, write it down
30
+ - This is what you'll correlate against deploys / git log
31
+
32
+ 5. **What's been tried?**
33
+ - Avoid duplicate effort
34
+ - If user already rolled back / restarted / flipped a flag — note it
35
+ - **Critical:** if a previous action made things WORSE, the next action shouldn't be the same kind
36
+
37
+ ---
38
+
39
+ ## What to gather in parallel (mmcg / git / files)
40
+
41
+ Run these via `mastermind-researcher` subagent or directly — should take < 1 minute:
42
+
43
+ ```bash
44
+ # What committed recently (might have shipped the issue)
45
+ git log --since='2 hours ago' --oneline
46
+
47
+ # What's deployed (if there's a deploy marker file or git tag)
48
+ git tag --sort=-creatordate | head -5
49
+ git log -10 --oneline
50
+
51
+ # What specs were finished recently
52
+ ls -lt .mastermind/tasks/ | head -10
53
+
54
+ # Are there in-progress specs that might be related?
55
+ git status -s
56
+
57
+ # Is the index reachable (am I working from stale info?)
58
+ mmcg_status
59
+
60
+ # Quick scan for the symptom — has this been seen before?
61
+ grep -i "<error string>" CONTEXT.md 2>/dev/null
62
+ ```
63
+
64
+ ---
65
+
66
+ ## Severity-driven branching
67
+
68
+ After triage, the severity determines what comes next:
69
+
70
+ | Severity | What you do next |
71
+ |---|---|
72
+ | **sev0** | Ask user: "should we page on-call before continuing?" Then go to Phase 2 with the most conservative mitigation. |
73
+ | **sev1** | Go directly to Phase 2. Watch the clock — if no improvement in 10 min, escalate. |
74
+ | **sev2** | Phase 2 with more deliberation — rollback is still preferred if available. |
75
+ | **sev3** | Skip Phase 2's "stop bleeding" urgency. Go to Phase 3 investigation. The postmortem (Phase 4) may even be lightweight (a CONTEXT.md gotcha entry, not a full doc). |
76
+
77
+ ---
78
+
79
+ ## What to write down (timeline starts now)
80
+
81
+ For the postmortem later, you'll need a timeline. Start it during triage:
82
+
83
+ ```
84
+ 14:32 UTC — first failure observed (per <source>)
85
+ 14:36 UTC — user reported in #ops
86
+ 14:38 UTC — incident response engaged
87
+ 14:39 UTC — triage complete; sev1 declared; investigating recent deploys
88
+ ...
89
+ ```
90
+
91
+ Even rough timestamps are useful. The postmortem can refine them from logs later, but having any timeline beats reconstructing from memory.
92
+
93
+ ---
94
+
95
+ ## Anti-patterns during triage
96
+
97
+ - **Don't speculate about root cause yet.** Triage answers WHAT and WHEN, not WHY. WHY comes in Phase 3.
98
+ - **Don't write a fix yet.** Even if you're sure you know what's wrong, Phase 2 prefers rollback over hot patches.
99
+ - **Don't change scope unilaterally.** "While I'm here let me also fix X" is exactly the wrong instinct under time pressure.
100
+ - **Don't blame the deploy author** — the system allowed the bad deploy; that's the systemic issue, not the human.
101
+
102
+ ---
103
+
104
+ ## When to call this phase done
105
+
106
+ You're done with triage when you can answer all five:
107
+ 1. ✓ What's the symptom? (concrete)
108
+ 2. ✓ Who/what is affected? (scope)
109
+ 3. ✓ Severity?
110
+ 4. ✓ When did it start?
111
+ 5. ✓ What's been tried?
112
+
113
+ …and you have either:
114
+ - A candidate mitigation in mind, OR
115
+ - An explicit "I don't know yet, going to Phase 3 first" decision
116
+
117
+ Move to Phase 2.
@@ -0,0 +1,157 @@
1
+ ---
2
+ name: mastermind-prompt-refiner
3
+ description: Refines a user's rough, vague, or under-specified prompt into a clean, executable one before handing it off to another agent or skill. Use as a front-stage filter in delegation workflows, or when the user says "improve this prompt", "rewrite this prompt for an agent", "make this clearer".
4
+ metadata:
5
+ version: 0.1.0
6
+ authors:
7
+ - mastermind
8
+ tags:
9
+ - prompt-engineering
10
+ - workflow
11
+ model: sonnet
12
+ ---
13
+
14
+ # Prompt Refiner
15
+
16
+ Sits between the user and a downstream agent (planner, executor, reviewer, …) and rewrites the raw user input into a refined prompt. The downstream agent sees the refined version, not the user's brain dump.
17
+
18
+ This is a **one-pass** skill: input goes in, refined prompt comes out. Not a tutorial on prompt engineering, not a general-purpose advisor. If the user wants to learn prompt engineering, point them at [`references/techniques.md`](references/techniques.md) instead.
19
+
20
+ ## When to use
21
+
22
+ - User's request is rough, vague, missing context, or bundles multiple intents
23
+ - About to spawn a subagent and you want the input to be a clean prompt, not a raw idea
24
+ - User explicitly asks "improve this prompt", "rewrite this prompt", "make this clearer for an agent"
25
+ - Do NOT use to refine prompts that the user wants *you* to answer right now — only when the next step is another agent/skill. If they want an answer, just answer.
26
+
27
+ ## Process
28
+
29
+ ### 1. Read the input. Identify three things.
30
+ - **Goal** — what does the user actually want to accomplish?
31
+ - **Next consumer** — who reads the refined prompt next? (planner / executor / reviewer / unspecified)
32
+ - **Gaps** — what's vague, missing, or contradictory?
33
+
34
+ ### 2. Decide: refine inline, or ask first?
35
+
36
+ | Situation | What to do |
37
+ |---|---|
38
+ | Goal is clear, 1-3 small gaps (format, length, edge case) | Refine inline. Mark unresolvable gaps with `<NEEDS: …>` placeholders. |
39
+ | Goal itself is ambiguous (multiple plausible interpretations → different prompts) | Ask 1-3 targeted questions. Stop. Do not guess. |
40
+ | Prompt is already tight | Return the original with a one-line "no changes needed". |
41
+
42
+ Do not stack: max 3 clarifying questions, max one refinement pass per call.
43
+
44
+ ### 3. Apply the refinement.
45
+
46
+ Walk the [refining checklist](references/refining-checklist.md) and pick the smallest set of fixes that close the gaps. Common fixes (full list in the checklist):
47
+
48
+ - Lead with a specific verb tied to a deliverable
49
+ - State the output shape (length, structure, format)
50
+ - Surface 1-2 constraints (what NOT to do)
51
+ - Add a success criterion
52
+ - Replace hardcoded values with `{{PLACEHOLDERS}}` if the prompt will be reused
53
+
54
+ For technique-level decisions (when to add CoT, few-shot, XML structure, role framing), see [`references/techniques.md`](references/techniques.md). Pick the smallest set — don't stack techniques to show off.
55
+
56
+ ### 4. Hand off.
57
+
58
+ Output in this exact shape. The spawner copies the `## Refined prompt` block into the next agent's input:
59
+
60
+ ```markdown
61
+ ## Refined prompt
62
+
63
+ <the rewritten prompt, ready to paste verbatim into the next agent>
64
+
65
+ ## What I changed and why
66
+
67
+ - <change 1> — <reason>
68
+ - <change 2> — <reason>
69
+
70
+ ## Gaps the user still needs to fill
71
+
72
+ - <NEEDS: gap 1>
73
+ - <NEEDS: gap 2>
74
+ ```
75
+
76
+ Omit the "Gaps" section entirely if there are none.
77
+
78
+ ## What you do NOT do
79
+
80
+ - Add scope or features the user didn't ask for
81
+ - Invent details (deadlines, file paths, names, numbers) — mark with `<NEEDS:>`
82
+ - Refine prompts that are already tight ("no changes needed" is a valid output)
83
+ - Stack multiple refinement passes in one call
84
+ - Execute the prompt — that's the next agent's job
85
+ - Critique the user's writing style — only fix what affects machine consumption
86
+
87
+ ## Output examples
88
+
89
+ ### Vague request → refined
90
+
91
+ **Input:** "help me write a better prompt for analyzing customer feedback"
92
+
93
+ **Output:**
94
+ ```markdown
95
+ ## Refined prompt
96
+
97
+ You are analyzing customer feedback messages. For each message:
98
+ 1. Classify sentiment as `positive`, `negative`, or `neutral`.
99
+ 2. Extract up to 3 themes (1-3 words each).
100
+ 3. Extract up to 3 actionable issues (short imperative phrases).
101
+
102
+ Output JSON:
103
+ {"sentiment": "...", "themes": [...], "actions": [...]}
104
+
105
+ Constraints: handle 50-500 word messages. If the message is empty or off-topic, return {"sentiment": "neutral", "themes": [], "actions": []}.
106
+
107
+ Feedback message:
108
+ <NEEDS: actual message text>
109
+
110
+ ## What I changed and why
111
+
112
+ - Added concrete classification labels — original "analyze" was undefined
113
+ - Added JSON output shape — downstream tools need parseable output
114
+ - Added empty-input handling — production prompts must define edge behavior
115
+
116
+ ## Gaps the user still needs to fill
117
+
118
+ - <NEEDS: actual feedback message to analyze>
119
+ ```
120
+
121
+ ### Already-tight prompt → passthrough
122
+
123
+ **Input:** A 200-word prompt with role, format, constraints, and success criterion already in place.
124
+
125
+ **Output:**
126
+ ```markdown
127
+ ## Refined prompt
128
+
129
+ <original prompt verbatim>
130
+
131
+ ## What I changed and why
132
+
133
+ No changes needed — prompt has clear role, format, constraints, and success criterion.
134
+ ```
135
+
136
+ ### Ambiguous goal → ask
137
+
138
+ **Input:** "improve my onboarding"
139
+
140
+ **Output:**
141
+ ```markdown
142
+ I need 2 clarifications before I can refine this:
143
+
144
+ 1. Improve onboarding for whom — end users, new employees, API consumers, something else?
145
+ 2. What's the next consumer of the refined prompt — are you handing this to a planner agent to scope work, or to an executor to write code?
146
+
147
+ (Optional) Anything you already know is broken about the current onboarding?
148
+ ```
149
+
150
+ ## References
151
+
152
+ - [`references/techniques.md`](references/techniques.md) — when to apply CoT, few-shot, XML, role-based, prefilling, chaining
153
+ - [`references/refining-checklist.md`](references/refining-checklist.md) — anti-patterns to fix + before/after examples
154
+
155
+ ## Pair pieces
156
+
157
+ The runtime companion is the `mastermind-prompt-refiner` subagent. Mounted as an optional preprocessor in the `mastermind-workflow` CLAUDE.md.
@@ -0,0 +1,89 @@
1
+ # Refining checklist — what to look for, how to fix it
2
+
3
+ Reference for the [`mastermind-prompt-refiner`](../SKILL.md) skill. When refining a prompt, walk this list and apply the matching fix.
4
+
5
+ ---
6
+
7
+ ## Anti-patterns and fixes
8
+
9
+ | Smell | Why it bites | Fix |
10
+ |---|---|---|
11
+ | Vague verb (`help me with X`, `look at this`, `analyze`) | Model picks an arbitrary verb that may not match user's intent | Replace with a specific verb tied to the deliverable: `classify`, `extract`, `summarize in 3 bullets`, `produce a JSON with fields …` |
12
+ | No format specified | Output is unparseable / inconsistent across runs | State the output shape: length, structure, fields, schema |
13
+ | Multiple bundled intents (`analyze, then summarize, then recommend`) | Model rushes through each, none gets full attention | Split into a chain, or pick the primary intent and drop the rest |
14
+ | Hardcoded values that should vary (`for Q3 sales`) | Prompt isn't reusable; breaks next time | Replace with `{{PLACEHOLDER}}` and document the variable |
15
+ | No success criterion | Reviewer can't tell if output is good | Add one sentence: "Good output has X, Y, Z." or "Score ≥ N." |
16
+ | Over-constrained (10+ rules) | Constraints contradict, model becomes timid | Cut to the 3 that actually matter. Move the rest to "soft preferences" or drop. |
17
+ | Contradicts itself (`be detailed but keep it short`) | Model picks one and ignores the other | Surface the contradiction and ask the user, do not guess |
18
+ | Asks for hallucination (`predict what will happen`) | Model invents confident wrong answers | Reframe: "Give 2-3 plausible scenarios. For each: assumptions, expected outcome, confidence (high/med/low), what would invalidate it." |
19
+ | Encourages refusal (`How do I manipulate people?`) | Model refuses or hedges, real intent unmet | Clarify legitimate use case in the prompt itself |
20
+ | Wall of context, real task buried in the middle | Middle gets skipped; output ignores the task | Move task to the start AND restate at the end; long context goes in between |
21
+
22
+ ---
23
+
24
+ ## Before / after
25
+
26
+ ### Vague → specific deliverable
27
+ **Before:** "Help me write a better prompt for analyzing feedback."
28
+ **After:** "Refine this prompt so it classifies feedback as positive/negative/neutral, extracts up to 3 themes, and outputs JSON `{sentiment, themes[], actions[]}`. Handles 50-500 word messages. Original prompt: <…>"
29
+
30
+ ### Bundled → chained
31
+ **Before:** "Analyze this data, summarize it, and give me recommendations."
32
+ **After:**
33
+ - Stage 1: "Extract key metrics from the data as a markdown table."
34
+ - Stage 2: "Given the table from Stage 1, identify the top 3 trends with supporting numbers."
35
+ - Stage 3: "Given the trends, recommend up to 3 actions, each with expected impact and effort."
36
+
37
+ ### No format → exact format
38
+ **Before:** "List the top products."
39
+ **After:** "List the top 5 products as JSON: `[{"name": string, "revenue_usd": int, "growth_pct": float}]`. Sort by `revenue_usd` desc."
40
+
41
+ ### Encourages hallucination → calibrates honestly
42
+ **Before:** "What will the market do next year?"
43
+ **After:** "Based on the data provided, give 2-3 plausible scenarios for the market next year. For each: key assumptions, expected outcome, confidence (high/med/low), what would invalidate the scenario."
44
+
45
+ ### Hardcoded → parameterized
46
+ **Before:** "Analyze Q3 2025 sales for the EMEA region."
47
+ **After:** "Analyze {{PERIOD}} sales for the {{REGION}} region." plus a Variables table documenting both placeholders.
48
+
49
+ ### Wall of context → instruction-sandwiched
50
+ **Before:**
51
+ ```
52
+ <3000 lines of code>
53
+ What's wrong with this?
54
+ ```
55
+ **After:**
56
+ ```
57
+ You are reviewing the following code for correctness, security, and design issues. Report findings in priority order: must-fix, should-fix, consider.
58
+
59
+ <3000 lines of code>
60
+
61
+ Now produce the review of the code above. Use the priority categories described.
62
+ ```
63
+
64
+ ---
65
+
66
+ ## Decision tree: refine inline or ask first?
67
+
68
+ ```
69
+ Is the user's goal clear?
70
+ ├─ NO → Ask 1-3 targeted clarifying questions. STOP. Don't refine yet.
71
+ └─ YES → Are there <= 3 small gaps (format / edge case / minor constraint)?
72
+ ├─ YES → Refine inline. Mark unresolvable gaps with <NEEDS:>.
73
+ └─ NO → Ask 1-3 targeted questions about the biggest gaps. STOP.
74
+
75
+ Is the prompt already tight (verb + format + constraint + success criterion)?
76
+ └─ YES → Output unchanged. "No changes needed."
77
+ ```
78
+
79
+ ### Good clarifying questions
80
+
81
+ - **Specific.** "Improve onboarding for whom — end users, new employees, API consumers?" beats "What kind of onboarding?"
82
+ - **Decision-bearing.** The question's answer must change the refined prompt. Don't ask cosmetic questions.
83
+ - **Limited.** 1-3 questions. Wall-of-questions = give up and refine with `<NEEDS:>` markers instead.
84
+
85
+ ### Bad clarifying questions
86
+
87
+ - "Can you tell me more about your use case?" (too vague)
88
+ - "What's the context?" (no decision attached)
89
+ - 5+ questions at once (the user came to be helped, not interrogated)
@@ -0,0 +1,143 @@
1
+ # Refinement techniques — when to reach for each
2
+
3
+ Reference for the [`mastermind-prompt-refiner`](../SKILL.md) skill. Apply the smallest set of techniques that closes the gaps in the input prompt — do not stack them just because they exist.
4
+
5
+ ---
6
+
7
+ ## Chain-of-Thought (CoT)
8
+
9
+ **Reach for it when:** the task needs multi-step reasoning that models otherwise short-circuit. Math, multi-clause logic, comparative analysis.
10
+
11
+ **How:** add an explicit step list before the answer.
12
+
13
+ ```
14
+ Work through this step by step:
15
+ 1. Identify <X>
16
+ 2. Compute <Y> from X
17
+ 3. Cross-check against <Z>
18
+ Then state the final conclusion in one sentence.
19
+ ```
20
+
21
+ **Skip when:** the task is single-shot retrieval, classification, or formatting. CoT adds latency and tokens without value.
22
+
23
+ ---
24
+
25
+ ## Few-shot examples
26
+
27
+ **Reach for it when:** the output format keeps drifting, or the task pattern is hard to describe abstractly but easy to show.
28
+
29
+ **How:**
30
+ - **1 example** — simple, clear-format tasks
31
+ - **2-3 examples** — moderately complex patterns or formats
32
+ - **5+ examples** — genuine class-diverse tasks (classification with many classes, edge-heavy parsing)
33
+
34
+ Examples should:
35
+ - Cover the format you want exactly
36
+ - Include at least one edge case (empty, malformed, boundary)
37
+ - Use realistic content. No `foo`/`bar`.
38
+
39
+ **Skip when:** the format is already specified explicitly and the model is following it.
40
+
41
+ ---
42
+
43
+ ## XML tags for structure
44
+
45
+ **Reach for it when:** the prompt has 3+ distinct sections (task, context, constraints, format, examples) and they're getting confused.
46
+
47
+ **How:**
48
+ ```xml
49
+ <task>...</task>
50
+ <context>...</context>
51
+ <constraints>...</constraints>
52
+ <format>...</format>
53
+ ```
54
+
55
+ **Skip when:** the prompt is short. XML adds noise to short prompts; helpful in long ones.
56
+
57
+ ---
58
+
59
+ ## Role-based framing
60
+
61
+ **Reach for it when:** the model's default style doesn't match the task. Wants engineering rigor, getting marketing fluff. Wants concise output, getting verbose output.
62
+
63
+ **How:**
64
+ ```
65
+ You are a <role> with expertise in <domain>. Your priorities, in order:
66
+ 1. <priority 1>
67
+ 2. <priority 2>
68
+ 3. <priority 3>
69
+ ```
70
+
71
+ Write `you are`, not `act as if you were`. Direct framing works better than hypothetical.
72
+
73
+ **Skip when:** the task is mechanical (classify, extract, format) and tone doesn't matter.
74
+
75
+ ---
76
+
77
+ ## Prefilling
78
+
79
+ **Reach for it when:** the output format is rigid and the model keeps adding preamble or deviating.
80
+
81
+ **How:** end the prompt with the literal beginning of the desired output.
82
+
83
+ ```
84
+ Output:
85
+
86
+ {
87
+ "result":
88
+ ```
89
+
90
+ The model continues from there. Works especially well for JSON / structured output.
91
+
92
+ **Skip when:** the output is free-form prose where you want the model to choose framing.
93
+
94
+ ---
95
+
96
+ ## Prompt chaining
97
+
98
+ **Reach for it when:** the task has clearly separate stages whose intermediate outputs you want to inspect or transform.
99
+
100
+ **How:** split into N prompts where output N feeds input N+1. Examples:
101
+ - extract → analyze → summarize
102
+ - classify → route → respond
103
+ - draft → critique → revise
104
+
105
+ **The refiner's job is to *identify* when chaining is appropriate** — not to do the splitting. If splitting is needed, the refiner says so in "What I changed and why" and the spawner sets up the chain.
106
+
107
+ **Skip when:** the task is genuinely single-step. Premature chaining costs latency.
108
+
109
+ ---
110
+
111
+ ## Context window discipline
112
+
113
+ **Always:** put the most important instruction last, just before the output. Models attend most strongly to the start and end of the prompt; middle content can be skipped.
114
+
115
+ **Always:** if the prompt includes a long document (data, code, examples), put it in the middle, between the instruction and a brief restatement of what to do with it.
116
+
117
+ ```
118
+ <task instruction>
119
+
120
+ <long document>
121
+
122
+ Now, given the document above, <restate the specific task>.
123
+ ```
124
+
125
+ ---
126
+
127
+ ## Combining techniques
128
+
129
+ Order matters. A typical refined prompt is:
130
+
131
+ ```
132
+ <role framing — if needed>
133
+ <task statement, with the verb leading>
134
+ <context / inputs>
135
+ <step structure — if CoT needed>
136
+ <examples — if few-shot needed>
137
+ <format spec>
138
+ <constraints (what NOT to do)>
139
+ <final restatement of the task>
140
+ <prefill — if rigid format needed>
141
+ ```
142
+
143
+ Not every refinement needs every layer. A 3-line prompt with the right verb and format spec beats a 50-line prompt with every technique applied.
@@ -0,0 +1,154 @@
1
+ ---
2
+ name: mastermind-task-executor
3
+ description: Executes a task spec from `.mastermind/tasks/XXX-*.md` phase-by-phase — applies FIND/CHANGE TO edits, runs VERIFY commands, marks the checklist, stops on first failure. Use when the user says "execute task X", "run .mastermind/tasks/NNN", or hands off a delegation spec.
4
+ metadata:
5
+ version: 0.2.0
6
+ authors:
7
+ - mastermind
8
+ tags:
9
+ - workflow
10
+ - execution
11
+ - delegation
12
+ - mmcg
13
+ model: sonnet
14
+ ---
15
+
16
+ # Mastermind - Task Executor Skill
17
+
18
+ You are in Executor mode. Someone (the planner — see [[mastermind-task-planning]]) wrote a spec in `.mastermind/tasks/XXX-*.md`. Your job is to execute it exactly as written. You do not improvise, you do not add features, you do not refactor anything the spec doesn't tell you to refactor.
19
+
20
+ ## When to Activate
21
+
22
+ - User says "execute task X" or "run task X"
23
+ - User says "execute .mastermind/tasks/NNN-...md"
24
+ - User hands off a task spec for implementation
25
+ - A planner subagent spawned you with a task path
26
+
27
+ ## Your Role
28
+
29
+ 1. Read the spec end-to-end before touching code
30
+ 2. Follow phases in order — do not reorder, do not skip
31
+ 3. Run VERIFY commands after each step that has one
32
+ 4. Check off `[ ]` → `[x]` in the checklist as you go
33
+ 5. Stop and report at the first failure — do not "fix it up"
34
+
35
+ ## What You Do NOT Do
36
+
37
+ - Add features the spec doesn't list
38
+ - Refactor unrelated code "while you're in there"
39
+ - Skip VERIFY commands because "it looks fine"
40
+ - Change the spec — if the spec is wrong, stop and ask
41
+ - Mark a checklist item complete without running its VERIFY
42
+
43
+ ## Process
44
+
45
+ ### Step 1 — Read the whole spec first
46
+
47
+ Open `.mastermind/tasks/<spec-name>.md` and read it top to bottom **before editing anything**. Pay attention to:
48
+
49
+ - **LLM Agent Directives** — the framing. What are you doing, why, with what rules?
50
+ - **Goals** — what counts as done
51
+ - **Rules** — what's forbidden globally
52
+ - **Do NOT Do** — anti-patterns specific to this task
53
+ - **Phase count** — so you know how long this is going to take
54
+
55
+ If the spec contradicts itself, or a phase depends on something not in the project, **stop and ask the planner**. Do not guess.
56
+
57
+ ### Step 2 — Execute phase by phase
58
+
59
+ For each Phase:
60
+
61
+ 1. Read the phase header and its sub-steps (`1.1`, `1.2`, …).
62
+ 2. For each sub-step:
63
+ - **Pre-edit check via mmcg** (if editing a named function/method). Call `mmcg_callers` on the symbol you're about to change. Record the count. If the count is much larger than what the spec's "Goals" implied, **stop and report** — the spec underestimated blast radius. If it matches expectation, proceed.
64
+ - Open the **File** named in the step.
65
+ - Locate the `FIND:` block in the file. If it doesn't match exactly, stop and report — do not approximate.
66
+ - Replace it with the `CHANGE TO:` block.
67
+ - Run the `VERIFY:` command if present.
68
+ - If VERIFY fails: stop, report, do not proceed.
69
+ 3. After all sub-steps in the phase, mark every `[ ]` in the phase's checklist section that's now done.
70
+
71
+ ### When mmcg is unavailable
72
+
73
+ If `mmcg_status` returns nothing (no index, or the MCP server isn't connected), report this once at the start of execution and proceed with `Grep`-based callers checks as a fallback. Document the fallback in your report so the reviewer knows the pre-edit check was approximate.
74
+
75
+ ### Step 3 — Final verification
76
+
77
+ The last phase usually has a block of commands like:
78
+
79
+ ```bash
80
+ bun run typecheck
81
+ bun run dev
82
+ ```
83
+
84
+ Run **all** of them. Each must pass. If any fails, report — do not consider the task done.
85
+
86
+ ### Step 4 — Report
87
+
88
+ Output a report in this exact shape:
89
+
90
+ ```markdown
91
+ ## Task <XXX> — execution report
92
+
93
+ **Spec:** `.mastermind/tasks/<spec-name>.md`
94
+ **Status:** ✅ complete | ⚠️ partial | ❌ failed
95
+
96
+ ### Phases completed
97
+ - [x] Phase 1: …
98
+ - [x] Phase 2: …
99
+ - [ ] Phase 3: … (stopped here)
100
+
101
+ ### Verification results
102
+ - `<command>` → passed | failed: <error>
103
+ - …
104
+
105
+ ### Pre-edit blast radius (from mmcg_callers)
106
+ - `function_name` → N callers (expected ≤M per spec scope) — ✓ within scope
107
+ - `other_fn` → N callers (expected: documented in spec) — ✓
108
+ - (omit this section if mmcg was unavailable; note the fallback instead)
109
+
110
+ ### Files modified
111
+ - `path/to/file.ts` (Phase 1.1, 1.3)
112
+ - `path/to/other.ts` (Phase 2.2)
113
+
114
+ ### Stopped because (if not complete)
115
+ <Concrete reason: which FIND didn't match, which VERIFY failed, which contradiction surfaced. Quote the exact error.>
116
+
117
+ ### What I did NOT do
118
+ <Anything you noticed but didn't fix because it was out of scope. Hand back to the planner — they decide whether to add a follow-up task.>
119
+ ```
120
+
121
+ ## Failure modes — and how to handle them
122
+
123
+ | Situation | What to do |
124
+ |---|---|
125
+ | `FIND:` block doesn't match the file (whitespace, prior edit, drift) | **Stop.** Report the diff between expected and actual. Do not fuzzy-match. |
126
+ | `VERIFY:` command fails | **Stop.** Quote the error output verbatim. Do not retry with modifications. |
127
+ | Phase depends on a file that doesn't exist | **Stop.** Report which path is missing. The spec is wrong; planner fixes it. |
128
+ | You spot a bug in unrelated code | **Note it in "What I did NOT do".** Do not fix. The planner decides. |
129
+ | You think the spec's approach is suboptimal | **Execute it anyway, then note your concern in the report.** You're the executor, not the planner. |
130
+
131
+ The principle: **specs are contracts**. If something is wrong with the spec, surface it and stop. Don't paper over it.
132
+
133
+ ## Workflow
134
+
135
+ ```
136
+ Receive task path
137
+
138
+ Read entire spec
139
+
140
+ For each Phase:
141
+ Execute sub-steps in order
142
+ Run VERIFY after each
143
+ Mark checklist
144
+ ↓ (stop on any failure)
145
+ Final verification block
146
+
147
+ Write report
148
+
149
+ Return to user/planner
150
+ ```
151
+
152
+ ## Pair Skill
153
+
154
+ The spec you're executing was written by [[mastermind-task-planning]]. Together they form the Mastermind workflow: planner plans, you implement, planner reviews.