@xcraftmind/mastermind 0.24.0 → 0.26.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +6 -4
- package/bin/mastermind.js +4 -0
- package/package.json +9 -8
- package/share/agents/mastermind-auditor.md +205 -0
- package/share/agents/mastermind-critic.md +222 -0
- package/share/agents/mastermind-prompt-refiner.md +70 -0
- package/share/agents/mastermind-release.md +442 -0
- package/share/agents/mastermind-researcher.md +167 -0
- package/share/agents/mastermind-task-executor.md +86 -0
- package/share/commands/api-shape-explorer.md +107 -0
- package/share/skills/doc-stub-sync/SKILL.md +187 -0
- package/share/skills/doc-stub-sync/references/error-handling.md +79 -0
- package/share/skills/doc-stub-sync/references/url-patterns.md +83 -0
- package/share/skills/doc-stub-sync/scripts/doc_update.py +285 -0
- package/share/skills/doc-stub-sync/scripts/requirements.txt +2 -0
- package/share/skills/flaky-finder/SKILL.md +75 -0
- package/share/skills/mastermind-incident-response/SKILL.md +157 -0
- package/share/skills/mastermind-incident-response/references/investigation-playbook.md +173 -0
- package/share/skills/mastermind-incident-response/references/postmortem-template.md +184 -0
- package/share/skills/mastermind-incident-response/references/triage-checklist.md +117 -0
- package/share/skills/mastermind-prompt-refiner/SKILL.md +157 -0
- package/share/skills/mastermind-prompt-refiner/references/refining-checklist.md +89 -0
- package/share/skills/mastermind-prompt-refiner/references/techniques.md +143 -0
- package/share/skills/mastermind-task-executor/SKILL.md +154 -0
- package/share/skills/mastermind-task-planning/SKILL.md +337 -0
- package/share/skills/mastermind-task-planning/references/spec-template.md +286 -0
- package/share/skills/pr-review/SKILL.md +89 -0
|
@@ -0,0 +1,173 @@
|
|
|
1
|
+
# Investigation playbook — find root cause via mmcg + git + .mastermind/tasks/
|
|
2
|
+
|
|
3
|
+
Reference for the [`mastermind-incident-response`](../SKILL.md) skill, Phase 3. After symptoms have stopped, find what actually broke.
|
|
4
|
+
|
|
5
|
+
The patterns below are concrete recipes. Use them when the corresponding question comes up — don't run all of them speculatively (wastes context).
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Question 1 — "What changed recently?"
|
|
10
|
+
|
|
11
|
+
Most production incidents trace to a recent change. Start here.
|
|
12
|
+
|
|
13
|
+
```bash
|
|
14
|
+
# What was committed in the window when the incident started?
|
|
15
|
+
git log --since='2 hours ago' --until='now' --oneline
|
|
16
|
+
|
|
17
|
+
# What was committed in the file/dir we suspect?
|
|
18
|
+
git log -20 --oneline -- <suspected/path/>
|
|
19
|
+
|
|
20
|
+
# What did the most recent commits actually change?
|
|
21
|
+
git log -5 -p --stat -- <suspected/path/>
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
Then for any candidate commit:
|
|
25
|
+
```bash
|
|
26
|
+
git show <commit-sha> --stat # what files
|
|
27
|
+
git show <commit-sha> # full diff
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
**Heuristic for ranking suspect commits:**
|
|
31
|
+
- Most recent first
|
|
32
|
+
- Bigger diffs first (more surface to have bugs)
|
|
33
|
+
- Commits to "interesting" paths (hot paths, recently-incident-prone dirs)
|
|
34
|
+
- Commits that touch the symptom's component (grep error message in commit diff)
|
|
35
|
+
|
|
36
|
+
---
|
|
37
|
+
|
|
38
|
+
## Question 2 — "What's the blast radius of the change?"
|
|
39
|
+
|
|
40
|
+
If you have a suspect commit, what does it touch that could explain the symptom?
|
|
41
|
+
|
|
42
|
+
```bash
|
|
43
|
+
# What symbols changed in the suspect commit?
|
|
44
|
+
git show <commit-sha> --name-only
|
|
45
|
+
|
|
46
|
+
# For each changed function/method, what calls it?
|
|
47
|
+
mmcg_callers <symbol> --language <lang>
|
|
48
|
+
|
|
49
|
+
# Transitive — what else depends on it?
|
|
50
|
+
mmcg_impact <symbol> --depth 3 --language <lang>
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
If the blast radius doesn't include the symptom's component → the suspect commit probably isn't the cause. Move to the next candidate.
|
|
54
|
+
|
|
55
|
+
If the blast radius DOES include the symptom's component → strong candidate. Read the code change.
|
|
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
## Question 3 — "Were the relevant specs supposed to catch this?"
|
|
60
|
+
|
|
61
|
+
Spec for any recent change should have had Tests Plan + Observability Plan + Performance Considerations sections (per spec template).
|
|
62
|
+
|
|
63
|
+
```bash
|
|
64
|
+
# Find the spec for this work (look in .mastermind/tasks/ for matching name or timestamp)
|
|
65
|
+
ls -lt .mastermind/tasks/
|
|
66
|
+
|
|
67
|
+
# Read its Tests Plan
|
|
68
|
+
grep -A 20 "Tests Plan" .mastermind/tasks/<NNN>-<name>.md
|
|
69
|
+
|
|
70
|
+
# Read its Observability Plan
|
|
71
|
+
grep -A 10 "Observability Plan" .mastermind/tasks/<NNN>-<name>.md
|
|
72
|
+
|
|
73
|
+
# Read its Performance Considerations
|
|
74
|
+
grep -A 10 "Performance Considerations" .mastermind/tasks/<NNN>-<name>.md
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
Then ask:
|
|
78
|
+
|
|
79
|
+
- **Did the Tests Plan cover this failure mode?** If no, that's a gap in spec quality. Action item: "harden Tests Plan for similar work."
|
|
80
|
+
- **Did Observability fire for this failure?** If no, was it instrumented? If yes, did it page? Detection time is its own failure to root-cause.
|
|
81
|
+
- **Did Performance Considerations anticipate this load?** If the issue is scale-related, the spec should have predicted it.
|
|
82
|
+
|
|
83
|
+
These answers feed the postmortem's "Why detection took N minutes" and "Workflow improvements" sections.
|
|
84
|
+
|
|
85
|
+
---
|
|
86
|
+
|
|
87
|
+
## Question 4 — "Has this happened before?"
|
|
88
|
+
|
|
89
|
+
```bash
|
|
90
|
+
# Check known gotchas
|
|
91
|
+
grep -B 2 -A 4 -i "<symptom keywords>" CONTEXT.md
|
|
92
|
+
|
|
93
|
+
# Check don't-touch list
|
|
94
|
+
grep -B 2 -A 4 "<file or path>" CONTEXT.md
|
|
95
|
+
|
|
96
|
+
# Check past postmortems (if you keep them)
|
|
97
|
+
ls postmortems/ 2>/dev/null || ls docs/postmortems/ 2>/dev/null
|
|
98
|
+
grep -ri "<symptom>" postmortems/ 2>/dev/null
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
If yes → this is a **recurrence**. That's a MUCH bigger finding than a first-time incident:
|
|
102
|
+
- The previous fix didn't stick
|
|
103
|
+
- The prevention didn't transfer
|
|
104
|
+
- The CONTEXT.md entry was either missing or ignored
|
|
105
|
+
|
|
106
|
+
A recurrence postmortem should focus heavily on Phase 5 (feed forward) and propose a STRUCTURAL fix, not just a code fix.
|
|
107
|
+
|
|
108
|
+
---
|
|
109
|
+
|
|
110
|
+
## Question 5 — "What's the failure mode?"
|
|
111
|
+
|
|
112
|
+
Classify the failure into a category. This guides both immediate response and what kind of structural improvement to propose:
|
|
113
|
+
|
|
114
|
+
| Category | Examples | Typical fix |
|
|
115
|
+
|---|---|---|
|
|
116
|
+
| **Code bug** | null-pointer, off-by-one, wrong condition | Code change + test |
|
|
117
|
+
| **Configuration bug** | wrong env var, bad timeout, missing flag | Config change + config validation in CI |
|
|
118
|
+
| **Schema / migration** | column missing, type mismatch, FK violation | Migration + pre-flight schema check |
|
|
119
|
+
| **Capacity / scale** | OOM, connection pool exhausted, CPU pegged | Scaling + capacity model in spec template |
|
|
120
|
+
| **External dependency** | upstream API down, vendor SLA breach | Degraded-mode fallback + dep health probe |
|
|
121
|
+
| **Race / concurrency** | lost write, deadlock, double-spend | Concurrency model documented + tests |
|
|
122
|
+
| **Data quality** | bad input from upstream, encoding issue | Validation at boundary + schema for inputs |
|
|
123
|
+
| **Process** | bad deploy, wrong branch, missed code review | Pipeline / process improvement |
|
|
124
|
+
|
|
125
|
+
The category determines whether the postmortem proposes a CODE fix, a PROCESS fix, or a SYSTEM fix (architectural).
|
|
126
|
+
|
|
127
|
+
---
|
|
128
|
+
|
|
129
|
+
## Question 6 — "What's the smallest reproducer?"
|
|
130
|
+
|
|
131
|
+
Before declaring root cause found, get a reproducer:
|
|
132
|
+
|
|
133
|
+
- **Unit test** — fastest; if you can write one that fails, you understand the bug
|
|
134
|
+
- **Integration test** — if behavior depends on multiple components
|
|
135
|
+
- **Manual reproduction** — if neither possible, document the exact steps
|
|
136
|
+
|
|
137
|
+
A bug without a reproducer is a bug not understood. Don't ship the fix until you can reproduce.
|
|
138
|
+
|
|
139
|
+
The reproducer becomes Test 1 in the fix spec's Tests Plan.
|
|
140
|
+
|
|
141
|
+
---
|
|
142
|
+
|
|
143
|
+
## Five-whys discipline
|
|
144
|
+
|
|
145
|
+
For the postmortem's "What went wrong" section, apply five-whys:
|
|
146
|
+
|
|
147
|
+
1. **Why** did the symptom happen? → because <proximate cause>
|
|
148
|
+
2. **Why** did that happen? → because <one level deeper>
|
|
149
|
+
3. **Why** did THAT happen? → because <one more>
|
|
150
|
+
4. **Why**?
|
|
151
|
+
5. **Why**?
|
|
152
|
+
|
|
153
|
+
Stop when:
|
|
154
|
+
- The answer is a system property (e.g., "the deploy pipeline is asynchronous, so we ran without rollback safety")
|
|
155
|
+
- The answer would require changing fundamental architecture (escalate, don't propose in postmortem)
|
|
156
|
+
- You've gone 5 levels — at some point further whys are speculation
|
|
157
|
+
|
|
158
|
+
Document the chain in the postmortem. The deepest "why" you can credibly answer is the **systemic** root cause; that's what the action items should address.
|
|
159
|
+
|
|
160
|
+
---
|
|
161
|
+
|
|
162
|
+
## When to stop investigating
|
|
163
|
+
|
|
164
|
+
You have enough when:
|
|
165
|
+
|
|
166
|
+
1. ✓ You can name the proximate cause (the code/config/data thing that broke)
|
|
167
|
+
2. ✓ You can name a systemic cause (the why that goes deeper than "engineer X did Y")
|
|
168
|
+
3. ✓ You have a reproducer (or explicit decision that one is not feasible)
|
|
169
|
+
4. ✓ You know what would have prevented this (a test? a check? a config? a process?)
|
|
170
|
+
|
|
171
|
+
Then go to Phase 4 — write the postmortem.
|
|
172
|
+
|
|
173
|
+
If after 1-2 hours you can't answer #1-2 → escalate or accept "we couldn't determine root cause" honestly in the postmortem. **Don't fabricate a cause to look complete.** Unknown root causes are themselves a finding.
|
|
@@ -0,0 +1,184 @@
|
|
|
1
|
+
<!--
|
|
2
|
+
Mastermind blameless postmortem template.
|
|
3
|
+
|
|
4
|
+
HOW TO USE
|
|
5
|
+
- Copy this file to postmortems/<YYYY-MM-DD>-<short-name>.md (or docs/postmortems/, whichever convention your repo uses)
|
|
6
|
+
- Fill in every <placeholder> with concrete content
|
|
7
|
+
- Delete sections that genuinely don't apply (e.g., if a sev3 had no user impact)
|
|
8
|
+
- Keep the file short — a postmortem nobody reads is worse than no postmortem
|
|
9
|
+
- Action items get linked back here from the .mastermind/tasks/ specs they spawn
|
|
10
|
+
|
|
11
|
+
BLAMELESS PRINCIPLE
|
|
12
|
+
Write about systems, not people. If a person made a judgment call, frame it as:
|
|
13
|
+
"given the information available, the action was reasonable; the lesson is that
|
|
14
|
+
information X needs to be more accessible / surfaced earlier."
|
|
15
|
+
|
|
16
|
+
Anti-pattern: "Engineer X deployed without testing."
|
|
17
|
+
Better: "The deploy pipeline allowed merging with failing tests because the
|
|
18
|
+
test job was marked non-blocking three weeks ago."
|
|
19
|
+
|
|
20
|
+
TONE
|
|
21
|
+
- Past tense (this happened, this was tried)
|
|
22
|
+
- Concrete (timestamps, error messages, file paths)
|
|
23
|
+
- Honest about unknowns ("we don't yet know why X" is better than fabricating)
|
|
24
|
+
-->
|
|
25
|
+
|
|
26
|
+
# Postmortem: <short title>
|
|
27
|
+
|
|
28
|
+
## Summary
|
|
29
|
+
|
|
30
|
+
**Date:** <YYYY-MM-DD>
|
|
31
|
+
**Severity:** <sev0 | sev1 | sev2 | sev3>
|
|
32
|
+
**Duration:** <N minutes from start to mitigation, M minutes to full resolution>
|
|
33
|
+
**Impact:** <one sentence — what users / systems were affected, magnitude>
|
|
34
|
+
|
|
35
|
+
<One to three sentences: what happened, what was the user impact, what was the resolution. Anyone reading should know what this is about in 30 seconds.>
|
|
36
|
+
|
|
37
|
+
---
|
|
38
|
+
|
|
39
|
+
## Timeline (UTC)
|
|
40
|
+
|
|
41
|
+
| Time | Event |
|
|
42
|
+
|---|---|
|
|
43
|
+
| <HH:MM> | First failure observed (per <source — log, monitor, user report>) |
|
|
44
|
+
| <HH:MM> | Detection (paged / reported in #ops / noticed by …) |
|
|
45
|
+
| <HH:MM> | Incident response engaged |
|
|
46
|
+
| <HH:MM> | Triage complete; severity declared as <sev>; <initial hypothesis> |
|
|
47
|
+
| <HH:MM> | Mitigation: <what was done> |
|
|
48
|
+
| <HH:MM> | Symptoms stopped |
|
|
49
|
+
| <HH:MM> | Root cause identified |
|
|
50
|
+
| <HH:MM> | Full resolution (fix deployed / patch applied / dependency restored) |
|
|
51
|
+
| <HH:MM> | Postmortem started |
|
|
52
|
+
|
|
53
|
+
---
|
|
54
|
+
|
|
55
|
+
## What happened
|
|
56
|
+
|
|
57
|
+
<2-4 paragraphs of narrative. Lead with the proximate cause. Then walk through how the symptom manifested, what was tried, what worked, what didn't.>
|
|
58
|
+
|
|
59
|
+
<If multiple things went wrong in sequence (cascading failure), name each component and how they interacted.>
|
|
60
|
+
|
|
61
|
+
---
|
|
62
|
+
|
|
63
|
+
## Root cause analysis
|
|
64
|
+
|
|
65
|
+
### Proximate cause
|
|
66
|
+
<The specific code change / config / dependency that triggered the symptom. Cite file:line, commit SHA, or external system.>
|
|
67
|
+
|
|
68
|
+
### Systemic causes (five-whys chain)
|
|
69
|
+
1. **Why** did the symptom happen? <because…>
|
|
70
|
+
2. **Why** did that happen? <because…>
|
|
71
|
+
3. **Why** did that happen? <because…>
|
|
72
|
+
4. **Why** did that happen? <because…>
|
|
73
|
+
5. **Why** did that happen? <because…>
|
|
74
|
+
|
|
75
|
+
The deepest "why" we can credibly answer is the **systemic root cause** — that's what the action items should address.
|
|
76
|
+
|
|
77
|
+
### Failure category
|
|
78
|
+
|
|
79
|
+
<Pick one from the investigation-playbook.md table: code bug / configuration bug / schema / capacity / external dependency / race or concurrency / data quality / process.>
|
|
80
|
+
|
|
81
|
+
---
|
|
82
|
+
|
|
83
|
+
## Detection
|
|
84
|
+
|
|
85
|
+
**Why did detection take <N> minutes?**
|
|
86
|
+
<Why didn't this fire earlier? Was there a monitor for this failure mode? Did the monitor fire but not page? Did it page someone who couldn't act?>
|
|
87
|
+
|
|
88
|
+
**What detection improvements would have caught this <K> minutes sooner?**
|
|
89
|
+
<Specific: "a P99 latency alert on /api/messages would have fired at 14:33 instead of 14:38 when users reported.">
|
|
90
|
+
|
|
91
|
+
---
|
|
92
|
+
|
|
93
|
+
## Mitigation
|
|
94
|
+
|
|
95
|
+
**Why did mitigation take <M> minutes?**
|
|
96
|
+
<Was the rollback path clear? Was someone with deploy access available? Was the on-call runbook accurate?>
|
|
97
|
+
|
|
98
|
+
**What mitigation improvements would have stopped the bleed sooner?**
|
|
99
|
+
<Specific: "a feature flag for the new code path would have let us disable in seconds instead of waiting for a rollback deploy.">
|
|
100
|
+
|
|
101
|
+
---
|
|
102
|
+
|
|
103
|
+
## What went well
|
|
104
|
+
|
|
105
|
+
<Yes, name what worked. Reinforces good patterns and supports psychological safety. Be specific.>
|
|
106
|
+
|
|
107
|
+
- <Thing 1 — e.g., "The Datadog dashboard for /api/messages clearly showed the regression once someone looked at it">
|
|
108
|
+
- <Thing 2 — e.g., "On-call rotation was clear; <person> was immediately available">
|
|
109
|
+
- <Thing 3>
|
|
110
|
+
|
|
111
|
+
---
|
|
112
|
+
|
|
113
|
+
## What didn't go well
|
|
114
|
+
|
|
115
|
+
<The hard part — be honest, but blameless. Focus on systems, gaps, processes.>
|
|
116
|
+
|
|
117
|
+
- <Thing 1 — e.g., "The spec for this change didn't include an Observability Plan, so the new code path had no metrics">
|
|
118
|
+
- <Thing 2 — e.g., "The deploy pipeline reported success even though the smoke test failed">
|
|
119
|
+
- <Thing 3>
|
|
120
|
+
|
|
121
|
+
---
|
|
122
|
+
|
|
123
|
+
## Where the mastermind workflow gates failed (if applicable)
|
|
124
|
+
|
|
125
|
+
*Only relevant if this incident came from a change shipped through the mastermind workflow. If it came from a hot-fix, manual ops, or pre-mastermind code, skip this section.*
|
|
126
|
+
|
|
127
|
+
- **Critic dimension(s) that should have caught this:** <e.g., "dimension #3 Observability — the design didn't include any metric / log on the failure path, and the critic missed flagging it">
|
|
128
|
+
- **Spec template section(s) that were empty or weak:** <e.g., "Observability Plan was 'n/a — no production runtime' but this code DOES run in production">
|
|
129
|
+
- **Auditor checks that passed when they shouldn't have:** <e.g., "auditor verified tests ran, but spec didn't include a load test, so capacity issue wasn't tested for">
|
|
130
|
+
|
|
131
|
+
→ Each of these maps to a workflow-improvement action item (see Action Items below).
|
|
132
|
+
|
|
133
|
+
---
|
|
134
|
+
|
|
135
|
+
## Action items
|
|
136
|
+
|
|
137
|
+
Each action item gets owned, dated, and either (a) becomes a `.mastermind/tasks/` spec or (b) becomes a CONTEXT.md update.
|
|
138
|
+
|
|
139
|
+
| # | Action | Type | Owner | Due | Spec / CONTEXT entry |
|
|
140
|
+
|---|---|---|---|---|---|
|
|
141
|
+
| 1 | <Specific change — code, config, process, doc> | <code-fix \| context-md \| workflow-improvement \| process> | <person> | <YYYY-MM-DD> | <`.mastermind/tasks/NNN-…md` or "CONTEXT.md → Known gotchas">|
|
|
142
|
+
| 2 | <…> | <…> | <…> | <…> | <…> |
|
|
143
|
+
|
|
144
|
+
**Avoid action items like:**
|
|
145
|
+
- ❌ "Be more careful when deploying" — not actionable
|
|
146
|
+
- ❌ "Add monitoring" — too vague
|
|
147
|
+
- ❌ "Train the team on X" — training without process change rarely sticks
|
|
148
|
+
|
|
149
|
+
**Prefer action items like:**
|
|
150
|
+
- ✓ "Add P99 latency alert on /api/messages at 200ms threshold via Datadog monitor — `.mastermind/tasks/NNN-add-messages-latency-alert.md`"
|
|
151
|
+
- ✓ "Add 'capacity test' as mandatory line in Performance Considerations section of spec-template — `.mastermind/tasks/NNN-spec-template-capacity.md`"
|
|
152
|
+
- ✓ "Add CONTEXT.md known-gotcha: 'Redis cluster mode silently drops MULTI on key migrations during rebalance'"
|
|
153
|
+
|
|
154
|
+
---
|
|
155
|
+
|
|
156
|
+
## Feed forward to CONTEXT.md
|
|
157
|
+
|
|
158
|
+
The following entries get appended to project `CONTEXT.md`:
|
|
159
|
+
|
|
160
|
+
### Known gotchas (append)
|
|
161
|
+
- **<one-line summary of the failure pattern>** — <bite scenario>. See `postmortems/<this file>`.
|
|
162
|
+
|
|
163
|
+
### Don't-touch list (if applicable, append)
|
|
164
|
+
- **`<path or symbol>`** — <constraint that emerged from this incident>
|
|
165
|
+
|
|
166
|
+
### Decision log (if architecture changed, append)
|
|
167
|
+
- **<YYYY-MM-DD> — <decision name>** — <one-sentence decision, why, alternatives rejected, source: postmortems/<this file>>
|
|
168
|
+
|
|
169
|
+
---
|
|
170
|
+
|
|
171
|
+
## Unknowns
|
|
172
|
+
|
|
173
|
+
*If root cause is partially or fully unknown, name what's unknown. This is honest — fabricating a cause is worse than admitting uncertainty.*
|
|
174
|
+
|
|
175
|
+
- <Unknown 1 — e.g., "We don't yet know why the Redis client closed the connection at 14:32 specifically; logs are too sparse to tell">
|
|
176
|
+
- <What would we need to know it: a debug log, a packet capture, a repro environment>
|
|
177
|
+
|
|
178
|
+
---
|
|
179
|
+
|
|
180
|
+
## Sign-off
|
|
181
|
+
|
|
182
|
+
- **Author:** <name>
|
|
183
|
+
- **Reviewers:** <names who read this before publishing>
|
|
184
|
+
- **Distribution:** <team / org / wider>
|
|
@@ -0,0 +1,117 @@
|
|
|
1
|
+
# Triage checklist — first 5 minutes
|
|
2
|
+
|
|
3
|
+
Reference for the [`mastermind-incident-response`](../SKILL.md) skill, Phase 1. Run through these questions / commands in parallel with talking to the user. Goal: enough information to choose a mitigation in Phase 2.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## What to ask the user (priority order)
|
|
8
|
+
|
|
9
|
+
1. **Symptom — what's observable?**
|
|
10
|
+
- "What error are you / users seeing?"
|
|
11
|
+
- Paste actual error message, not a paraphrase
|
|
12
|
+
- Single sentence, ideally
|
|
13
|
+
- Example: `❌ "the app is slow"` → `✓ "/api/messages returning 500 with 'connection refused' since 14:32"`
|
|
14
|
+
|
|
15
|
+
2. **Scope**
|
|
16
|
+
- "How many users / what % of traffic / which surfaces?"
|
|
17
|
+
- Distinguish: 1 user vs 1 region vs everyone
|
|
18
|
+
- If unclear → ask: do we know yet?
|
|
19
|
+
|
|
20
|
+
3. **Severity classification** (pick one; if unclear, default to one level higher):
|
|
21
|
+
- **sev0** — total outage / safety / data loss; immediate
|
|
22
|
+
- **sev1** — major surface broken; act within minutes
|
|
23
|
+
- **sev2** — partial degradation; act within hours
|
|
24
|
+
- **sev3** — cosmetic / single user; act within days
|
|
25
|
+
- Severity drives whether to ask "should we page?" before doing anything else
|
|
26
|
+
|
|
27
|
+
4. **Timeline — when did this start?**
|
|
28
|
+
- "What's the first timestamp you have?"
|
|
29
|
+
- Convert to UTC immediately, write it down
|
|
30
|
+
- This is what you'll correlate against deploys / git log
|
|
31
|
+
|
|
32
|
+
5. **What's been tried?**
|
|
33
|
+
- Avoid duplicate effort
|
|
34
|
+
- If user already rolled back / restarted / flipped a flag — note it
|
|
35
|
+
- **Critical:** if a previous action made things WORSE, the next action shouldn't be the same kind
|
|
36
|
+
|
|
37
|
+
---
|
|
38
|
+
|
|
39
|
+
## What to gather in parallel (mmcg / git / files)
|
|
40
|
+
|
|
41
|
+
Run these via `mastermind-researcher` subagent or directly — should take < 1 minute:
|
|
42
|
+
|
|
43
|
+
```bash
|
|
44
|
+
# What committed recently (might have shipped the issue)
|
|
45
|
+
git log --since='2 hours ago' --oneline
|
|
46
|
+
|
|
47
|
+
# What's deployed (if there's a deploy marker file or git tag)
|
|
48
|
+
git tag --sort=-creatordate | head -5
|
|
49
|
+
git log -10 --oneline
|
|
50
|
+
|
|
51
|
+
# What specs were finished recently
|
|
52
|
+
ls -lt .mastermind/tasks/ | head -10
|
|
53
|
+
|
|
54
|
+
# Are there in-progress specs that might be related?
|
|
55
|
+
git status -s
|
|
56
|
+
|
|
57
|
+
# Is the index reachable (am I working from stale info?)
|
|
58
|
+
mmcg_status
|
|
59
|
+
|
|
60
|
+
# Quick scan for the symptom — has this been seen before?
|
|
61
|
+
grep -i "<error string>" CONTEXT.md 2>/dev/null
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
---
|
|
65
|
+
|
|
66
|
+
## Severity-driven branching
|
|
67
|
+
|
|
68
|
+
After triage, the severity determines what comes next:
|
|
69
|
+
|
|
70
|
+
| Severity | What you do next |
|
|
71
|
+
|---|---|
|
|
72
|
+
| **sev0** | Ask user: "should we page on-call before continuing?" Then go to Phase 2 with the most conservative mitigation. |
|
|
73
|
+
| **sev1** | Go directly to Phase 2. Watch the clock — if no improvement in 10 min, escalate. |
|
|
74
|
+
| **sev2** | Phase 2 with more deliberation — rollback is still preferred if available. |
|
|
75
|
+
| **sev3** | Skip Phase 2's "stop bleeding" urgency. Go to Phase 3 investigation. The postmortem (Phase 4) may even be lightweight (a CONTEXT.md gotcha entry, not a full doc). |
|
|
76
|
+
|
|
77
|
+
---
|
|
78
|
+
|
|
79
|
+
## What to write down (timeline starts now)
|
|
80
|
+
|
|
81
|
+
For the postmortem later, you'll need a timeline. Start it during triage:
|
|
82
|
+
|
|
83
|
+
```
|
|
84
|
+
14:32 UTC — first failure observed (per <source>)
|
|
85
|
+
14:36 UTC — user reported in #ops
|
|
86
|
+
14:38 UTC — incident response engaged
|
|
87
|
+
14:39 UTC — triage complete; sev1 declared; investigating recent deploys
|
|
88
|
+
...
|
|
89
|
+
```
|
|
90
|
+
|
|
91
|
+
Even rough timestamps are useful. The postmortem can refine them from logs later, but having any timeline beats reconstructing from memory.
|
|
92
|
+
|
|
93
|
+
---
|
|
94
|
+
|
|
95
|
+
## Anti-patterns during triage
|
|
96
|
+
|
|
97
|
+
- **Don't speculate about root cause yet.** Triage answers WHAT and WHEN, not WHY. WHY comes in Phase 3.
|
|
98
|
+
- **Don't write a fix yet.** Even if you're sure you know what's wrong, Phase 2 prefers rollback over hot patches.
|
|
99
|
+
- **Don't change scope unilaterally.** "While I'm here let me also fix X" is exactly the wrong instinct under time pressure.
|
|
100
|
+
- **Don't blame the deploy author** — the system allowed the bad deploy; that's the systemic issue, not the human.
|
|
101
|
+
|
|
102
|
+
---
|
|
103
|
+
|
|
104
|
+
## When to call this phase done
|
|
105
|
+
|
|
106
|
+
You're done with triage when you can answer all five:
|
|
107
|
+
1. ✓ What's the symptom? (concrete)
|
|
108
|
+
2. ✓ Who/what is affected? (scope)
|
|
109
|
+
3. ✓ Severity?
|
|
110
|
+
4. ✓ When did it start?
|
|
111
|
+
5. ✓ What's been tried?
|
|
112
|
+
|
|
113
|
+
…and you have either:
|
|
114
|
+
- A candidate mitigation in mind, OR
|
|
115
|
+
- An explicit "I don't know yet, going to Phase 3 first" decision
|
|
116
|
+
|
|
117
|
+
Move to Phase 2.
|
|
@@ -0,0 +1,157 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: mastermind-prompt-refiner
|
|
3
|
+
description: Refines a user's rough, vague, or under-specified prompt into a clean, executable one before handing it off to another agent or skill. Use as a front-stage filter in delegation workflows, or when the user says "improve this prompt", "rewrite this prompt for an agent", "make this clearer".
|
|
4
|
+
metadata:
|
|
5
|
+
version: 0.1.0
|
|
6
|
+
authors:
|
|
7
|
+
- mastermind
|
|
8
|
+
tags:
|
|
9
|
+
- prompt-engineering
|
|
10
|
+
- workflow
|
|
11
|
+
model: sonnet
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
# Prompt Refiner
|
|
15
|
+
|
|
16
|
+
Sits between the user and a downstream agent (planner, executor, reviewer, …) and rewrites the raw user input into a refined prompt. The downstream agent sees the refined version, not the user's brain dump.
|
|
17
|
+
|
|
18
|
+
This is a **one-pass** skill: input goes in, refined prompt comes out. Not a tutorial on prompt engineering, not a general-purpose advisor. If the user wants to learn prompt engineering, point them at [`references/techniques.md`](references/techniques.md) instead.
|
|
19
|
+
|
|
20
|
+
## When to use
|
|
21
|
+
|
|
22
|
+
- User's request is rough, vague, missing context, or bundles multiple intents
|
|
23
|
+
- About to spawn a subagent and you want the input to be a clean prompt, not a raw idea
|
|
24
|
+
- User explicitly asks "improve this prompt", "rewrite this prompt", "make this clearer for an agent"
|
|
25
|
+
- Do NOT use to refine prompts that the user wants *you* to answer right now — only when the next step is another agent/skill. If they want an answer, just answer.
|
|
26
|
+
|
|
27
|
+
## Process
|
|
28
|
+
|
|
29
|
+
### 1. Read the input. Identify three things.
|
|
30
|
+
- **Goal** — what does the user actually want to accomplish?
|
|
31
|
+
- **Next consumer** — who reads the refined prompt next? (planner / executor / reviewer / unspecified)
|
|
32
|
+
- **Gaps** — what's vague, missing, or contradictory?
|
|
33
|
+
|
|
34
|
+
### 2. Decide: refine inline, or ask first?
|
|
35
|
+
|
|
36
|
+
| Situation | What to do |
|
|
37
|
+
|---|---|
|
|
38
|
+
| Goal is clear, 1-3 small gaps (format, length, edge case) | Refine inline. Mark unresolvable gaps with `<NEEDS: …>` placeholders. |
|
|
39
|
+
| Goal itself is ambiguous (multiple plausible interpretations → different prompts) | Ask 1-3 targeted questions. Stop. Do not guess. |
|
|
40
|
+
| Prompt is already tight | Return the original with a one-line "no changes needed". |
|
|
41
|
+
|
|
42
|
+
Do not stack: max 3 clarifying questions, max one refinement pass per call.
|
|
43
|
+
|
|
44
|
+
### 3. Apply the refinement.
|
|
45
|
+
|
|
46
|
+
Walk the [refining checklist](references/refining-checklist.md) and pick the smallest set of fixes that close the gaps. Common fixes (full list in the checklist):
|
|
47
|
+
|
|
48
|
+
- Lead with a specific verb tied to a deliverable
|
|
49
|
+
- State the output shape (length, structure, format)
|
|
50
|
+
- Surface 1-2 constraints (what NOT to do)
|
|
51
|
+
- Add a success criterion
|
|
52
|
+
- Replace hardcoded values with `{{PLACEHOLDERS}}` if the prompt will be reused
|
|
53
|
+
|
|
54
|
+
For technique-level decisions (when to add CoT, few-shot, XML structure, role framing), see [`references/techniques.md`](references/techniques.md). Pick the smallest set — don't stack techniques to show off.
|
|
55
|
+
|
|
56
|
+
### 4. Hand off.
|
|
57
|
+
|
|
58
|
+
Output in this exact shape. The spawner copies the `## Refined prompt` block into the next agent's input:
|
|
59
|
+
|
|
60
|
+
```markdown
|
|
61
|
+
## Refined prompt
|
|
62
|
+
|
|
63
|
+
<the rewritten prompt, ready to paste verbatim into the next agent>
|
|
64
|
+
|
|
65
|
+
## What I changed and why
|
|
66
|
+
|
|
67
|
+
- <change 1> — <reason>
|
|
68
|
+
- <change 2> — <reason>
|
|
69
|
+
|
|
70
|
+
## Gaps the user still needs to fill
|
|
71
|
+
|
|
72
|
+
- <NEEDS: gap 1>
|
|
73
|
+
- <NEEDS: gap 2>
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
Omit the "Gaps" section entirely if there are none.
|
|
77
|
+
|
|
78
|
+
## What you do NOT do
|
|
79
|
+
|
|
80
|
+
- Add scope or features the user didn't ask for
|
|
81
|
+
- Invent details (deadlines, file paths, names, numbers) — mark with `<NEEDS:>`
|
|
82
|
+
- Refine prompts that are already tight ("no changes needed" is a valid output)
|
|
83
|
+
- Stack multiple refinement passes in one call
|
|
84
|
+
- Execute the prompt — that's the next agent's job
|
|
85
|
+
- Critique the user's writing style — only fix what affects machine consumption
|
|
86
|
+
|
|
87
|
+
## Output examples
|
|
88
|
+
|
|
89
|
+
### Vague request → refined
|
|
90
|
+
|
|
91
|
+
**Input:** "help me write a better prompt for analyzing customer feedback"
|
|
92
|
+
|
|
93
|
+
**Output:**
|
|
94
|
+
```markdown
|
|
95
|
+
## Refined prompt
|
|
96
|
+
|
|
97
|
+
You are analyzing customer feedback messages. For each message:
|
|
98
|
+
1. Classify sentiment as `positive`, `negative`, or `neutral`.
|
|
99
|
+
2. Extract up to 3 themes (1-3 words each).
|
|
100
|
+
3. Extract up to 3 actionable issues (short imperative phrases).
|
|
101
|
+
|
|
102
|
+
Output JSON:
|
|
103
|
+
{"sentiment": "...", "themes": [...], "actions": [...]}
|
|
104
|
+
|
|
105
|
+
Constraints: handle 50-500 word messages. If the message is empty or off-topic, return {"sentiment": "neutral", "themes": [], "actions": []}.
|
|
106
|
+
|
|
107
|
+
Feedback message:
|
|
108
|
+
<NEEDS: actual message text>
|
|
109
|
+
|
|
110
|
+
## What I changed and why
|
|
111
|
+
|
|
112
|
+
- Added concrete classification labels — original "analyze" was undefined
|
|
113
|
+
- Added JSON output shape — downstream tools need parseable output
|
|
114
|
+
- Added empty-input handling — production prompts must define edge behavior
|
|
115
|
+
|
|
116
|
+
## Gaps the user still needs to fill
|
|
117
|
+
|
|
118
|
+
- <NEEDS: actual feedback message to analyze>
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
### Already-tight prompt → passthrough
|
|
122
|
+
|
|
123
|
+
**Input:** A 200-word prompt with role, format, constraints, and success criterion already in place.
|
|
124
|
+
|
|
125
|
+
**Output:**
|
|
126
|
+
```markdown
|
|
127
|
+
## Refined prompt
|
|
128
|
+
|
|
129
|
+
<original prompt verbatim>
|
|
130
|
+
|
|
131
|
+
## What I changed and why
|
|
132
|
+
|
|
133
|
+
No changes needed — prompt has clear role, format, constraints, and success criterion.
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
### Ambiguous goal → ask
|
|
137
|
+
|
|
138
|
+
**Input:** "improve my onboarding"
|
|
139
|
+
|
|
140
|
+
**Output:**
|
|
141
|
+
```markdown
|
|
142
|
+
I need 2 clarifications before I can refine this:
|
|
143
|
+
|
|
144
|
+
1. Improve onboarding for whom — end users, new employees, API consumers, something else?
|
|
145
|
+
2. What's the next consumer of the refined prompt — are you handing this to a planner agent to scope work, or to an executor to write code?
|
|
146
|
+
|
|
147
|
+
(Optional) Anything you already know is broken about the current onboarding?
|
|
148
|
+
```
|
|
149
|
+
|
|
150
|
+
## References
|
|
151
|
+
|
|
152
|
+
- [`references/techniques.md`](references/techniques.md) — when to apply CoT, few-shot, XML, role-based, prefilling, chaining
|
|
153
|
+
- [`references/refining-checklist.md`](references/refining-checklist.md) — anti-patterns to fix + before/after examples
|
|
154
|
+
|
|
155
|
+
## Pair pieces
|
|
156
|
+
|
|
157
|
+
The runtime companion is the `mastermind-prompt-refiner` subagent. Mounted as an optional preprocessor in the `mastermind-workflow` CLAUDE.md.
|