@xcraftmind/mastermind 0.24.0 → 0.26.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27) hide show
  1. package/README.md +6 -4
  2. package/bin/mastermind.js +4 -0
  3. package/package.json +9 -8
  4. package/share/agents/mastermind-auditor.md +205 -0
  5. package/share/agents/mastermind-critic.md +222 -0
  6. package/share/agents/mastermind-prompt-refiner.md +70 -0
  7. package/share/agents/mastermind-release.md +442 -0
  8. package/share/agents/mastermind-researcher.md +167 -0
  9. package/share/agents/mastermind-task-executor.md +86 -0
  10. package/share/commands/api-shape-explorer.md +107 -0
  11. package/share/skills/doc-stub-sync/SKILL.md +187 -0
  12. package/share/skills/doc-stub-sync/references/error-handling.md +79 -0
  13. package/share/skills/doc-stub-sync/references/url-patterns.md +83 -0
  14. package/share/skills/doc-stub-sync/scripts/doc_update.py +285 -0
  15. package/share/skills/doc-stub-sync/scripts/requirements.txt +2 -0
  16. package/share/skills/flaky-finder/SKILL.md +75 -0
  17. package/share/skills/mastermind-incident-response/SKILL.md +157 -0
  18. package/share/skills/mastermind-incident-response/references/investigation-playbook.md +173 -0
  19. package/share/skills/mastermind-incident-response/references/postmortem-template.md +184 -0
  20. package/share/skills/mastermind-incident-response/references/triage-checklist.md +117 -0
  21. package/share/skills/mastermind-prompt-refiner/SKILL.md +157 -0
  22. package/share/skills/mastermind-prompt-refiner/references/refining-checklist.md +89 -0
  23. package/share/skills/mastermind-prompt-refiner/references/techniques.md +143 -0
  24. package/share/skills/mastermind-task-executor/SKILL.md +154 -0
  25. package/share/skills/mastermind-task-planning/SKILL.md +337 -0
  26. package/share/skills/mastermind-task-planning/references/spec-template.md +286 -0
  27. package/share/skills/pr-review/SKILL.md +89 -0
@@ -0,0 +1,173 @@
1
+ # Investigation playbook — find root cause via mmcg + git + .mastermind/tasks/
2
+
3
+ Reference for the [`mastermind-incident-response`](../SKILL.md) skill, Phase 3. After symptoms have stopped, find what actually broke.
4
+
5
+ The patterns below are concrete recipes. Use them when the corresponding question comes up — don't run all of them speculatively (wastes context).
6
+
7
+ ---
8
+
9
+ ## Question 1 — "What changed recently?"
10
+
11
+ Most production incidents trace to a recent change. Start here.
12
+
13
+ ```bash
14
+ # What was committed in the window when the incident started?
15
+ git log --since='2 hours ago' --until='now' --oneline
16
+
17
+ # What was committed in the file/dir we suspect?
18
+ git log -20 --oneline -- <suspected/path/>
19
+
20
+ # What did the most recent commits actually change?
21
+ git log -5 -p --stat -- <suspected/path/>
22
+ ```
23
+
24
+ Then for any candidate commit:
25
+ ```bash
26
+ git show <commit-sha> --stat # what files
27
+ git show <commit-sha> # full diff
28
+ ```
29
+
30
+ **Heuristic for ranking suspect commits:**
31
+ - Most recent first
32
+ - Bigger diffs first (more surface to have bugs)
33
+ - Commits to "interesting" paths (hot paths, recently-incident-prone dirs)
34
+ - Commits that touch the symptom's component (grep error message in commit diff)
35
+
36
+ ---
37
+
38
+ ## Question 2 — "What's the blast radius of the change?"
39
+
40
+ If you have a suspect commit, what does it touch that could explain the symptom?
41
+
42
+ ```bash
43
+ # What symbols changed in the suspect commit?
44
+ git show <commit-sha> --name-only
45
+
46
+ # For each changed function/method, what calls it?
47
+ mmcg_callers <symbol> --language <lang>
48
+
49
+ # Transitive — what else depends on it?
50
+ mmcg_impact <symbol> --depth 3 --language <lang>
51
+ ```
52
+
53
+ If the blast radius doesn't include the symptom's component → the suspect commit probably isn't the cause. Move to the next candidate.
54
+
55
+ If the blast radius DOES include the symptom's component → strong candidate. Read the code change.
56
+
57
+ ---
58
+
59
+ ## Question 3 — "Were the relevant specs supposed to catch this?"
60
+
61
+ Spec for any recent change should have had Tests Plan + Observability Plan + Performance Considerations sections (per spec template).
62
+
63
+ ```bash
64
+ # Find the spec for this work (look in .mastermind/tasks/ for matching name or timestamp)
65
+ ls -lt .mastermind/tasks/
66
+
67
+ # Read its Tests Plan
68
+ grep -A 20 "Tests Plan" .mastermind/tasks/<NNN>-<name>.md
69
+
70
+ # Read its Observability Plan
71
+ grep -A 10 "Observability Plan" .mastermind/tasks/<NNN>-<name>.md
72
+
73
+ # Read its Performance Considerations
74
+ grep -A 10 "Performance Considerations" .mastermind/tasks/<NNN>-<name>.md
75
+ ```
76
+
77
+ Then ask:
78
+
79
+ - **Did the Tests Plan cover this failure mode?** If no, that's a gap in spec quality. Action item: "harden Tests Plan for similar work."
80
+ - **Did Observability fire for this failure?** If no, was it instrumented? If yes, did it page? Detection time is its own failure to root-cause.
81
+ - **Did Performance Considerations anticipate this load?** If the issue is scale-related, the spec should have predicted it.
82
+
83
+ These answers feed the postmortem's "Why detection took N minutes" and "Workflow improvements" sections.
84
+
85
+ ---
86
+
87
+ ## Question 4 — "Has this happened before?"
88
+
89
+ ```bash
90
+ # Check known gotchas
91
+ grep -B 2 -A 4 -i "<symptom keywords>" CONTEXT.md
92
+
93
+ # Check don't-touch list
94
+ grep -B 2 -A 4 "<file or path>" CONTEXT.md
95
+
96
+ # Check past postmortems (if you keep them)
97
+ ls postmortems/ 2>/dev/null || ls docs/postmortems/ 2>/dev/null
98
+ grep -ri "<symptom>" postmortems/ 2>/dev/null
99
+ ```
100
+
101
+ If yes → this is a **recurrence**. That's a MUCH bigger finding than a first-time incident:
102
+ - The previous fix didn't stick
103
+ - The prevention didn't transfer
104
+ - The CONTEXT.md entry was either missing or ignored
105
+
106
+ A recurrence postmortem should focus heavily on Phase 5 (feed forward) and propose a STRUCTURAL fix, not just a code fix.
107
+
108
+ ---
109
+
110
+ ## Question 5 — "What's the failure mode?"
111
+
112
+ Classify the failure into a category. This guides both immediate response and what kind of structural improvement to propose:
113
+
114
+ | Category | Examples | Typical fix |
115
+ |---|---|---|
116
+ | **Code bug** | null-pointer, off-by-one, wrong condition | Code change + test |
117
+ | **Configuration bug** | wrong env var, bad timeout, missing flag | Config change + config validation in CI |
118
+ | **Schema / migration** | column missing, type mismatch, FK violation | Migration + pre-flight schema check |
119
+ | **Capacity / scale** | OOM, connection pool exhausted, CPU pegged | Scaling + capacity model in spec template |
120
+ | **External dependency** | upstream API down, vendor SLA breach | Degraded-mode fallback + dep health probe |
121
+ | **Race / concurrency** | lost write, deadlock, double-spend | Concurrency model documented + tests |
122
+ | **Data quality** | bad input from upstream, encoding issue | Validation at boundary + schema for inputs |
123
+ | **Process** | bad deploy, wrong branch, missed code review | Pipeline / process improvement |
124
+
125
+ The category determines whether the postmortem proposes a CODE fix, a PROCESS fix, or a SYSTEM fix (architectural).
126
+
127
+ ---
128
+
129
+ ## Question 6 — "What's the smallest reproducer?"
130
+
131
+ Before declaring root cause found, get a reproducer:
132
+
133
+ - **Unit test** — fastest; if you can write one that fails, you understand the bug
134
+ - **Integration test** — if behavior depends on multiple components
135
+ - **Manual reproduction** — if neither possible, document the exact steps
136
+
137
+ A bug without a reproducer is a bug not understood. Don't ship the fix until you can reproduce.
138
+
139
+ The reproducer becomes Test 1 in the fix spec's Tests Plan.
140
+
141
+ ---
142
+
143
+ ## Five-whys discipline
144
+
145
+ For the postmortem's "What went wrong" section, apply five-whys:
146
+
147
+ 1. **Why** did the symptom happen? → because <proximate cause>
148
+ 2. **Why** did that happen? → because <one level deeper>
149
+ 3. **Why** did THAT happen? → because <one more>
150
+ 4. **Why**?
151
+ 5. **Why**?
152
+
153
+ Stop when:
154
+ - The answer is a system property (e.g., "the deploy pipeline is asynchronous, so we ran without rollback safety")
155
+ - The answer would require changing fundamental architecture (escalate, don't propose in postmortem)
156
+ - You've gone 5 levels — at some point further whys are speculation
157
+
158
+ Document the chain in the postmortem. The deepest "why" you can credibly answer is the **systemic** root cause; that's what the action items should address.
159
+
160
+ ---
161
+
162
+ ## When to stop investigating
163
+
164
+ You have enough when:
165
+
166
+ 1. ✓ You can name the proximate cause (the code/config/data thing that broke)
167
+ 2. ✓ You can name a systemic cause (the why that goes deeper than "engineer X did Y")
168
+ 3. ✓ You have a reproducer (or explicit decision that one is not feasible)
169
+ 4. ✓ You know what would have prevented this (a test? a check? a config? a process?)
170
+
171
+ Then go to Phase 4 — write the postmortem.
172
+
173
+ If after 1-2 hours you can't answer #1-2 → escalate or accept "we couldn't determine root cause" honestly in the postmortem. **Don't fabricate a cause to look complete.** Unknown root causes are themselves a finding.
@@ -0,0 +1,184 @@
1
+ <!--
2
+ Mastermind blameless postmortem template.
3
+
4
+ HOW TO USE
5
+ - Copy this file to postmortems/<YYYY-MM-DD>-<short-name>.md (or docs/postmortems/, whichever convention your repo uses)
6
+ - Fill in every <placeholder> with concrete content
7
+ - Delete sections that genuinely don't apply (e.g., if a sev3 had no user impact)
8
+ - Keep the file short — a postmortem nobody reads is worse than no postmortem
9
+ - Action items get linked back here from the .mastermind/tasks/ specs they spawn
10
+
11
+ BLAMELESS PRINCIPLE
12
+ Write about systems, not people. If a person made a judgment call, frame it as:
13
+ "given the information available, the action was reasonable; the lesson is that
14
+ information X needs to be more accessible / surfaced earlier."
15
+
16
+ Anti-pattern: "Engineer X deployed without testing."
17
+ Better: "The deploy pipeline allowed merging with failing tests because the
18
+ test job was marked non-blocking three weeks ago."
19
+
20
+ TONE
21
+ - Past tense (this happened, this was tried)
22
+ - Concrete (timestamps, error messages, file paths)
23
+ - Honest about unknowns ("we don't yet know why X" is better than fabricating)
24
+ -->
25
+
26
+ # Postmortem: <short title>
27
+
28
+ ## Summary
29
+
30
+ **Date:** <YYYY-MM-DD>
31
+ **Severity:** <sev0 | sev1 | sev2 | sev3>
32
+ **Duration:** <N minutes from start to mitigation, M minutes to full resolution>
33
+ **Impact:** <one sentence — what users / systems were affected, magnitude>
34
+
35
+ <One to three sentences: what happened, what was the user impact, what was the resolution. Anyone reading should know what this is about in 30 seconds.>
36
+
37
+ ---
38
+
39
+ ## Timeline (UTC)
40
+
41
+ | Time | Event |
42
+ |---|---|
43
+ | <HH:MM> | First failure observed (per <source — log, monitor, user report>) |
44
+ | <HH:MM> | Detection (paged / reported in #ops / noticed by …) |
45
+ | <HH:MM> | Incident response engaged |
46
+ | <HH:MM> | Triage complete; severity declared as <sev>; <initial hypothesis> |
47
+ | <HH:MM> | Mitigation: <what was done> |
48
+ | <HH:MM> | Symptoms stopped |
49
+ | <HH:MM> | Root cause identified |
50
+ | <HH:MM> | Full resolution (fix deployed / patch applied / dependency restored) |
51
+ | <HH:MM> | Postmortem started |
52
+
53
+ ---
54
+
55
+ ## What happened
56
+
57
+ <2-4 paragraphs of narrative. Lead with the proximate cause. Then walk through how the symptom manifested, what was tried, what worked, what didn't.>
58
+
59
+ <If multiple things went wrong in sequence (cascading failure), name each component and how they interacted.>
60
+
61
+ ---
62
+
63
+ ## Root cause analysis
64
+
65
+ ### Proximate cause
66
+ <The specific code change / config / dependency that triggered the symptom. Cite file:line, commit SHA, or external system.>
67
+
68
+ ### Systemic causes (five-whys chain)
69
+ 1. **Why** did the symptom happen? <because…>
70
+ 2. **Why** did that happen? <because…>
71
+ 3. **Why** did that happen? <because…>
72
+ 4. **Why** did that happen? <because…>
73
+ 5. **Why** did that happen? <because…>
74
+
75
+ The deepest "why" we can credibly answer is the **systemic root cause** — that's what the action items should address.
76
+
77
+ ### Failure category
78
+
79
+ <Pick one from the investigation-playbook.md table: code bug / configuration bug / schema / capacity / external dependency / race or concurrency / data quality / process.>
80
+
81
+ ---
82
+
83
+ ## Detection
84
+
85
+ **Why did detection take <N> minutes?**
86
+ <Why didn't this fire earlier? Was there a monitor for this failure mode? Did the monitor fire but not page? Did it page someone who couldn't act?>
87
+
88
+ **What detection improvements would have caught this <K> minutes sooner?**
89
+ <Specific: "a P99 latency alert on /api/messages would have fired at 14:33 instead of 14:38 when users reported.">
90
+
91
+ ---
92
+
93
+ ## Mitigation
94
+
95
+ **Why did mitigation take <M> minutes?**
96
+ <Was the rollback path clear? Was someone with deploy access available? Was the on-call runbook accurate?>
97
+
98
+ **What mitigation improvements would have stopped the bleed sooner?**
99
+ <Specific: "a feature flag for the new code path would have let us disable in seconds instead of waiting for a rollback deploy.">
100
+
101
+ ---
102
+
103
+ ## What went well
104
+
105
+ <Yes, name what worked. Reinforces good patterns and supports psychological safety. Be specific.>
106
+
107
+ - <Thing 1 — e.g., "The Datadog dashboard for /api/messages clearly showed the regression once someone looked at it">
108
+ - <Thing 2 — e.g., "On-call rotation was clear; <person> was immediately available">
109
+ - <Thing 3>
110
+
111
+ ---
112
+
113
+ ## What didn't go well
114
+
115
+ <The hard part — be honest, but blameless. Focus on systems, gaps, processes.>
116
+
117
+ - <Thing 1 — e.g., "The spec for this change didn't include an Observability Plan, so the new code path had no metrics">
118
+ - <Thing 2 — e.g., "The deploy pipeline reported success even though the smoke test failed">
119
+ - <Thing 3>
120
+
121
+ ---
122
+
123
+ ## Where the mastermind workflow gates failed (if applicable)
124
+
125
+ *Only relevant if this incident came from a change shipped through the mastermind workflow. If it came from a hot-fix, manual ops, or pre-mastermind code, skip this section.*
126
+
127
+ - **Critic dimension(s) that should have caught this:** <e.g., "dimension #3 Observability — the design didn't include any metric / log on the failure path, and the critic missed flagging it">
128
+ - **Spec template section(s) that were empty or weak:** <e.g., "Observability Plan was 'n/a — no production runtime' but this code DOES run in production">
129
+ - **Auditor checks that passed when they shouldn't have:** <e.g., "auditor verified tests ran, but spec didn't include a load test, so capacity issue wasn't tested for">
130
+
131
+ → Each of these maps to a workflow-improvement action item (see Action Items below).
132
+
133
+ ---
134
+
135
+ ## Action items
136
+
137
+ Each action item gets owned, dated, and either (a) becomes a `.mastermind/tasks/` spec or (b) becomes a CONTEXT.md update.
138
+
139
+ | # | Action | Type | Owner | Due | Spec / CONTEXT entry |
140
+ |---|---|---|---|---|---|
141
+ | 1 | <Specific change — code, config, process, doc> | <code-fix \| context-md \| workflow-improvement \| process> | <person> | <YYYY-MM-DD> | <`.mastermind/tasks/NNN-…md` or "CONTEXT.md → Known gotchas">|
142
+ | 2 | <…> | <…> | <…> | <…> | <…> |
143
+
144
+ **Avoid action items like:**
145
+ - ❌ "Be more careful when deploying" — not actionable
146
+ - ❌ "Add monitoring" — too vague
147
+ - ❌ "Train the team on X" — training without process change rarely sticks
148
+
149
+ **Prefer action items like:**
150
+ - ✓ "Add P99 latency alert on /api/messages at 200ms threshold via Datadog monitor — `.mastermind/tasks/NNN-add-messages-latency-alert.md`"
151
+ - ✓ "Add 'capacity test' as mandatory line in Performance Considerations section of spec-template — `.mastermind/tasks/NNN-spec-template-capacity.md`"
152
+ - ✓ "Add CONTEXT.md known-gotcha: 'Redis cluster mode silently drops MULTI on key migrations during rebalance'"
153
+
154
+ ---
155
+
156
+ ## Feed forward to CONTEXT.md
157
+
158
+ The following entries get appended to project `CONTEXT.md`:
159
+
160
+ ### Known gotchas (append)
161
+ - **<one-line summary of the failure pattern>** — <bite scenario>. See `postmortems/<this file>`.
162
+
163
+ ### Don't-touch list (if applicable, append)
164
+ - **`<path or symbol>`** — <constraint that emerged from this incident>
165
+
166
+ ### Decision log (if architecture changed, append)
167
+ - **<YYYY-MM-DD> — <decision name>** — <one-sentence decision, why, alternatives rejected, source: postmortems/<this file>>
168
+
169
+ ---
170
+
171
+ ## Unknowns
172
+
173
+ *If root cause is partially or fully unknown, name what's unknown. This is honest — fabricating a cause is worse than admitting uncertainty.*
174
+
175
+ - <Unknown 1 — e.g., "We don't yet know why the Redis client closed the connection at 14:32 specifically; logs are too sparse to tell">
176
+ - <What would we need to know it: a debug log, a packet capture, a repro environment>
177
+
178
+ ---
179
+
180
+ ## Sign-off
181
+
182
+ - **Author:** <name>
183
+ - **Reviewers:** <names who read this before publishing>
184
+ - **Distribution:** <team / org / wider>
@@ -0,0 +1,117 @@
1
+ # Triage checklist — first 5 minutes
2
+
3
+ Reference for the [`mastermind-incident-response`](../SKILL.md) skill, Phase 1. Run through these questions / commands in parallel with talking to the user. Goal: enough information to choose a mitigation in Phase 2.
4
+
5
+ ---
6
+
7
+ ## What to ask the user (priority order)
8
+
9
+ 1. **Symptom — what's observable?**
10
+ - "What error are you / users seeing?"
11
+ - Paste actual error message, not a paraphrase
12
+ - Single sentence, ideally
13
+ - Example: `❌ "the app is slow"` → `✓ "/api/messages returning 500 with 'connection refused' since 14:32"`
14
+
15
+ 2. **Scope**
16
+ - "How many users / what % of traffic / which surfaces?"
17
+ - Distinguish: 1 user vs 1 region vs everyone
18
+ - If unclear → ask: do we know yet?
19
+
20
+ 3. **Severity classification** (pick one; if unclear, default to one level higher):
21
+ - **sev0** — total outage / safety / data loss; immediate
22
+ - **sev1** — major surface broken; act within minutes
23
+ - **sev2** — partial degradation; act within hours
24
+ - **sev3** — cosmetic / single user; act within days
25
+ - Severity drives whether to ask "should we page?" before doing anything else
26
+
27
+ 4. **Timeline — when did this start?**
28
+ - "What's the first timestamp you have?"
29
+ - Convert to UTC immediately, write it down
30
+ - This is what you'll correlate against deploys / git log
31
+
32
+ 5. **What's been tried?**
33
+ - Avoid duplicate effort
34
+ - If user already rolled back / restarted / flipped a flag — note it
35
+ - **Critical:** if a previous action made things WORSE, the next action shouldn't be the same kind
36
+
37
+ ---
38
+
39
+ ## What to gather in parallel (mmcg / git / files)
40
+
41
+ Run these via `mastermind-researcher` subagent or directly — should take < 1 minute:
42
+
43
+ ```bash
44
+ # What committed recently (might have shipped the issue)
45
+ git log --since='2 hours ago' --oneline
46
+
47
+ # What's deployed (if there's a deploy marker file or git tag)
48
+ git tag --sort=-creatordate | head -5
49
+ git log -10 --oneline
50
+
51
+ # What specs were finished recently
52
+ ls -lt .mastermind/tasks/ | head -10
53
+
54
+ # Are there in-progress specs that might be related?
55
+ git status -s
56
+
57
+ # Is the index reachable (am I working from stale info?)
58
+ mmcg_status
59
+
60
+ # Quick scan for the symptom — has this been seen before?
61
+ grep -i "<error string>" CONTEXT.md 2>/dev/null
62
+ ```
63
+
64
+ ---
65
+
66
+ ## Severity-driven branching
67
+
68
+ After triage, the severity determines what comes next:
69
+
70
+ | Severity | What you do next |
71
+ |---|---|
72
+ | **sev0** | Ask user: "should we page on-call before continuing?" Then go to Phase 2 with the most conservative mitigation. |
73
+ | **sev1** | Go directly to Phase 2. Watch the clock — if no improvement in 10 min, escalate. |
74
+ | **sev2** | Phase 2 with more deliberation — rollback is still preferred if available. |
75
+ | **sev3** | Skip Phase 2's "stop bleeding" urgency. Go to Phase 3 investigation. The postmortem (Phase 4) may even be lightweight (a CONTEXT.md gotcha entry, not a full doc). |
76
+
77
+ ---
78
+
79
+ ## What to write down (timeline starts now)
80
+
81
+ For the postmortem later, you'll need a timeline. Start it during triage:
82
+
83
+ ```
84
+ 14:32 UTC — first failure observed (per <source>)
85
+ 14:36 UTC — user reported in #ops
86
+ 14:38 UTC — incident response engaged
87
+ 14:39 UTC — triage complete; sev1 declared; investigating recent deploys
88
+ ...
89
+ ```
90
+
91
+ Even rough timestamps are useful. The postmortem can refine them from logs later, but having any timeline beats reconstructing from memory.
92
+
93
+ ---
94
+
95
+ ## Anti-patterns during triage
96
+
97
+ - **Don't speculate about root cause yet.** Triage answers WHAT and WHEN, not WHY. WHY comes in Phase 3.
98
+ - **Don't write a fix yet.** Even if you're sure you know what's wrong, Phase 2 prefers rollback over hot patches.
99
+ - **Don't change scope unilaterally.** "While I'm here let me also fix X" is exactly the wrong instinct under time pressure.
100
+ - **Don't blame the deploy author** — the system allowed the bad deploy; that's the systemic issue, not the human.
101
+
102
+ ---
103
+
104
+ ## When to call this phase done
105
+
106
+ You're done with triage when you can answer all five:
107
+ 1. ✓ What's the symptom? (concrete)
108
+ 2. ✓ Who/what is affected? (scope)
109
+ 3. ✓ Severity?
110
+ 4. ✓ When did it start?
111
+ 5. ✓ What's been tried?
112
+
113
+ …and you have either:
114
+ - A candidate mitigation in mind, OR
115
+ - An explicit "I don't know yet, going to Phase 3 first" decision
116
+
117
+ Move to Phase 2.
@@ -0,0 +1,157 @@
1
+ ---
2
+ name: mastermind-prompt-refiner
3
+ description: Refines a user's rough, vague, or under-specified prompt into a clean, executable one before handing it off to another agent or skill. Use as a front-stage filter in delegation workflows, or when the user says "improve this prompt", "rewrite this prompt for an agent", "make this clearer".
4
+ metadata:
5
+ version: 0.1.0
6
+ authors:
7
+ - mastermind
8
+ tags:
9
+ - prompt-engineering
10
+ - workflow
11
+ model: sonnet
12
+ ---
13
+
14
+ # Prompt Refiner
15
+
16
+ Sits between the user and a downstream agent (planner, executor, reviewer, …) and rewrites the raw user input into a refined prompt. The downstream agent sees the refined version, not the user's brain dump.
17
+
18
+ This is a **one-pass** skill: input goes in, refined prompt comes out. Not a tutorial on prompt engineering, not a general-purpose advisor. If the user wants to learn prompt engineering, point them at [`references/techniques.md`](references/techniques.md) instead.
19
+
20
+ ## When to use
21
+
22
+ - User's request is rough, vague, missing context, or bundles multiple intents
23
+ - About to spawn a subagent and you want the input to be a clean prompt, not a raw idea
24
+ - User explicitly asks "improve this prompt", "rewrite this prompt", "make this clearer for an agent"
25
+ - Do NOT use to refine prompts that the user wants *you* to answer right now — only when the next step is another agent/skill. If they want an answer, just answer.
26
+
27
+ ## Process
28
+
29
+ ### 1. Read the input. Identify three things.
30
+ - **Goal** — what does the user actually want to accomplish?
31
+ - **Next consumer** — who reads the refined prompt next? (planner / executor / reviewer / unspecified)
32
+ - **Gaps** — what's vague, missing, or contradictory?
33
+
34
+ ### 2. Decide: refine inline, or ask first?
35
+
36
+ | Situation | What to do |
37
+ |---|---|
38
+ | Goal is clear, 1-3 small gaps (format, length, edge case) | Refine inline. Mark unresolvable gaps with `<NEEDS: …>` placeholders. |
39
+ | Goal itself is ambiguous (multiple plausible interpretations → different prompts) | Ask 1-3 targeted questions. Stop. Do not guess. |
40
+ | Prompt is already tight | Return the original with a one-line "no changes needed". |
41
+
42
+ Do not stack: max 3 clarifying questions, max one refinement pass per call.
43
+
44
+ ### 3. Apply the refinement.
45
+
46
+ Walk the [refining checklist](references/refining-checklist.md) and pick the smallest set of fixes that close the gaps. Common fixes (full list in the checklist):
47
+
48
+ - Lead with a specific verb tied to a deliverable
49
+ - State the output shape (length, structure, format)
50
+ - Surface 1-2 constraints (what NOT to do)
51
+ - Add a success criterion
52
+ - Replace hardcoded values with `{{PLACEHOLDERS}}` if the prompt will be reused
53
+
54
+ For technique-level decisions (when to add CoT, few-shot, XML structure, role framing), see [`references/techniques.md`](references/techniques.md). Pick the smallest set — don't stack techniques to show off.
55
+
56
+ ### 4. Hand off.
57
+
58
+ Output in this exact shape. The spawner copies the `## Refined prompt` block into the next agent's input:
59
+
60
+ ```markdown
61
+ ## Refined prompt
62
+
63
+ <the rewritten prompt, ready to paste verbatim into the next agent>
64
+
65
+ ## What I changed and why
66
+
67
+ - <change 1> — <reason>
68
+ - <change 2> — <reason>
69
+
70
+ ## Gaps the user still needs to fill
71
+
72
+ - <NEEDS: gap 1>
73
+ - <NEEDS: gap 2>
74
+ ```
75
+
76
+ Omit the "Gaps" section entirely if there are none.
77
+
78
+ ## What you do NOT do
79
+
80
+ - Add scope or features the user didn't ask for
81
+ - Invent details (deadlines, file paths, names, numbers) — mark with `<NEEDS:>`
82
+ - Refine prompts that are already tight ("no changes needed" is a valid output)
83
+ - Stack multiple refinement passes in one call
84
+ - Execute the prompt — that's the next agent's job
85
+ - Critique the user's writing style — only fix what affects machine consumption
86
+
87
+ ## Output examples
88
+
89
+ ### Vague request → refined
90
+
91
+ **Input:** "help me write a better prompt for analyzing customer feedback"
92
+
93
+ **Output:**
94
+ ```markdown
95
+ ## Refined prompt
96
+
97
+ You are analyzing customer feedback messages. For each message:
98
+ 1. Classify sentiment as `positive`, `negative`, or `neutral`.
99
+ 2. Extract up to 3 themes (1-3 words each).
100
+ 3. Extract up to 3 actionable issues (short imperative phrases).
101
+
102
+ Output JSON:
103
+ {"sentiment": "...", "themes": [...], "actions": [...]}
104
+
105
+ Constraints: handle 50-500 word messages. If the message is empty or off-topic, return {"sentiment": "neutral", "themes": [], "actions": []}.
106
+
107
+ Feedback message:
108
+ <NEEDS: actual message text>
109
+
110
+ ## What I changed and why
111
+
112
+ - Added concrete classification labels — original "analyze" was undefined
113
+ - Added JSON output shape — downstream tools need parseable output
114
+ - Added empty-input handling — production prompts must define edge behavior
115
+
116
+ ## Gaps the user still needs to fill
117
+
118
+ - <NEEDS: actual feedback message to analyze>
119
+ ```
120
+
121
+ ### Already-tight prompt → passthrough
122
+
123
+ **Input:** A 200-word prompt with role, format, constraints, and success criterion already in place.
124
+
125
+ **Output:**
126
+ ```markdown
127
+ ## Refined prompt
128
+
129
+ <original prompt verbatim>
130
+
131
+ ## What I changed and why
132
+
133
+ No changes needed — prompt has clear role, format, constraints, and success criterion.
134
+ ```
135
+
136
+ ### Ambiguous goal → ask
137
+
138
+ **Input:** "improve my onboarding"
139
+
140
+ **Output:**
141
+ ```markdown
142
+ I need 2 clarifications before I can refine this:
143
+
144
+ 1. Improve onboarding for whom — end users, new employees, API consumers, something else?
145
+ 2. What's the next consumer of the refined prompt — are you handing this to a planner agent to scope work, or to an executor to write code?
146
+
147
+ (Optional) Anything you already know is broken about the current onboarding?
148
+ ```
149
+
150
+ ## References
151
+
152
+ - [`references/techniques.md`](references/techniques.md) — when to apply CoT, few-shot, XML, role-based, prefilling, chaining
153
+ - [`references/refining-checklist.md`](references/refining-checklist.md) — anti-patterns to fix + before/after examples
154
+
155
+ ## Pair pieces
156
+
157
+ The runtime companion is the `mastermind-prompt-refiner` subagent. Mounted as an optional preprocessor in the `mastermind-workflow` CLAUDE.md.