@xcraftmind/mastermind 0.24.0 → 0.25.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +6 -4
- package/bin/mastermind.js +4 -0
- package/package.json +9 -8
- package/share/agents/mastermind-auditor.md +205 -0
- package/share/agents/mastermind-critic.md +222 -0
- package/share/agents/mastermind-prompt-refiner.md +70 -0
- package/share/agents/mastermind-release.md +442 -0
- package/share/agents/mastermind-researcher.md +167 -0
- package/share/agents/mastermind-task-executor.md +86 -0
- package/share/skills/doc-stub-sync/SKILL.md +187 -0
- package/share/skills/doc-stub-sync/references/error-handling.md +79 -0
- package/share/skills/doc-stub-sync/references/url-patterns.md +83 -0
- package/share/skills/doc-stub-sync/scripts/doc_update.py +285 -0
- package/share/skills/doc-stub-sync/scripts/requirements.txt +2 -0
- package/share/skills/flaky-finder/SKILL.md +75 -0
- package/share/skills/mastermind-incident-response/SKILL.md +157 -0
- package/share/skills/mastermind-incident-response/references/investigation-playbook.md +173 -0
- package/share/skills/mastermind-incident-response/references/postmortem-template.md +184 -0
- package/share/skills/mastermind-incident-response/references/triage-checklist.md +117 -0
- package/share/skills/mastermind-prompt-refiner/SKILL.md +157 -0
- package/share/skills/mastermind-prompt-refiner/references/refining-checklist.md +89 -0
- package/share/skills/mastermind-prompt-refiner/references/techniques.md +143 -0
- package/share/skills/mastermind-task-executor/SKILL.md +154 -0
- package/share/skills/mastermind-task-planning/SKILL.md +337 -0
- package/share/skills/mastermind-task-planning/references/spec-template.md +286 -0
- package/share/skills/pr-review/SKILL.md +89 -0
|
@@ -0,0 +1,117 @@
|
|
|
1
|
+
# Triage checklist — first 5 minutes
|
|
2
|
+
|
|
3
|
+
Reference for the [`mastermind-incident-response`](../SKILL.md) skill, Phase 1. Run through these questions / commands in parallel with talking to the user. Goal: enough information to choose a mitigation in Phase 2.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## What to ask the user (priority order)
|
|
8
|
+
|
|
9
|
+
1. **Symptom — what's observable?**
|
|
10
|
+
- "What error are you / users seeing?"
|
|
11
|
+
- Paste actual error message, not a paraphrase
|
|
12
|
+
- Single sentence, ideally
|
|
13
|
+
- Example: `❌ "the app is slow"` → `✓ "/api/messages returning 500 with 'connection refused' since 14:32"`
|
|
14
|
+
|
|
15
|
+
2. **Scope**
|
|
16
|
+
- "How many users / what % of traffic / which surfaces?"
|
|
17
|
+
- Distinguish: 1 user vs 1 region vs everyone
|
|
18
|
+
- If unclear → ask: do we know yet?
|
|
19
|
+
|
|
20
|
+
3. **Severity classification** (pick one; if unclear, default to one level higher):
|
|
21
|
+
- **sev0** — total outage / safety / data loss; immediate
|
|
22
|
+
- **sev1** — major surface broken; act within minutes
|
|
23
|
+
- **sev2** — partial degradation; act within hours
|
|
24
|
+
- **sev3** — cosmetic / single user; act within days
|
|
25
|
+
- Severity drives whether to ask "should we page?" before doing anything else
|
|
26
|
+
|
|
27
|
+
4. **Timeline — when did this start?**
|
|
28
|
+
- "What's the first timestamp you have?"
|
|
29
|
+
- Convert to UTC immediately, write it down
|
|
30
|
+
- This is what you'll correlate against deploys / git log
|
|
31
|
+
|
|
32
|
+
5. **What's been tried?**
|
|
33
|
+
- Avoid duplicate effort
|
|
34
|
+
- If user already rolled back / restarted / flipped a flag — note it
|
|
35
|
+
- **Critical:** if a previous action made things WORSE, the next action shouldn't be the same kind
|
|
36
|
+
|
|
37
|
+
---
|
|
38
|
+
|
|
39
|
+
## What to gather in parallel (mmcg / git / files)
|
|
40
|
+
|
|
41
|
+
Run these via `mastermind-researcher` subagent or directly — should take < 1 minute:
|
|
42
|
+
|
|
43
|
+
```bash
|
|
44
|
+
# What committed recently (might have shipped the issue)
|
|
45
|
+
git log --since='2 hours ago' --oneline
|
|
46
|
+
|
|
47
|
+
# What's deployed (if there's a deploy marker file or git tag)
|
|
48
|
+
git tag --sort=-creatordate | head -5
|
|
49
|
+
git log -10 --oneline
|
|
50
|
+
|
|
51
|
+
# What specs were finished recently
|
|
52
|
+
ls -lt .mastermind/tasks/ | head -10
|
|
53
|
+
|
|
54
|
+
# Are there in-progress specs that might be related?
|
|
55
|
+
git status -s
|
|
56
|
+
|
|
57
|
+
# Is the index reachable (am I working from stale info?)
|
|
58
|
+
mmcg_status
|
|
59
|
+
|
|
60
|
+
# Quick scan for the symptom — has this been seen before?
|
|
61
|
+
grep -i "<error string>" CONTEXT.md 2>/dev/null
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
---
|
|
65
|
+
|
|
66
|
+
## Severity-driven branching
|
|
67
|
+
|
|
68
|
+
After triage, the severity determines what comes next:
|
|
69
|
+
|
|
70
|
+
| Severity | What you do next |
|
|
71
|
+
|---|---|
|
|
72
|
+
| **sev0** | Ask user: "should we page on-call before continuing?" Then go to Phase 2 with the most conservative mitigation. |
|
|
73
|
+
| **sev1** | Go directly to Phase 2. Watch the clock — if no improvement in 10 min, escalate. |
|
|
74
|
+
| **sev2** | Phase 2 with more deliberation — rollback is still preferred if available. |
|
|
75
|
+
| **sev3** | Skip Phase 2's "stop bleeding" urgency. Go to Phase 3 investigation. The postmortem (Phase 4) may even be lightweight (a CONTEXT.md gotcha entry, not a full doc). |
|
|
76
|
+
|
|
77
|
+
---
|
|
78
|
+
|
|
79
|
+
## What to write down (timeline starts now)
|
|
80
|
+
|
|
81
|
+
For the postmortem later, you'll need a timeline. Start it during triage:
|
|
82
|
+
|
|
83
|
+
```
|
|
84
|
+
14:32 UTC — first failure observed (per <source>)
|
|
85
|
+
14:36 UTC — user reported in #ops
|
|
86
|
+
14:38 UTC — incident response engaged
|
|
87
|
+
14:39 UTC — triage complete; sev1 declared; investigating recent deploys
|
|
88
|
+
...
|
|
89
|
+
```
|
|
90
|
+
|
|
91
|
+
Even rough timestamps are useful. The postmortem can refine them from logs later, but having any timeline beats reconstructing from memory.
|
|
92
|
+
|
|
93
|
+
---
|
|
94
|
+
|
|
95
|
+
## Anti-patterns during triage
|
|
96
|
+
|
|
97
|
+
- **Don't speculate about root cause yet.** Triage answers WHAT and WHEN, not WHY. WHY comes in Phase 3.
|
|
98
|
+
- **Don't write a fix yet.** Even if you're sure you know what's wrong, Phase 2 prefers rollback over hot patches.
|
|
99
|
+
- **Don't change scope unilaterally.** "While I'm here let me also fix X" is exactly the wrong instinct under time pressure.
|
|
100
|
+
- **Don't blame the deploy author** — the system allowed the bad deploy; that's the systemic issue, not the human.
|
|
101
|
+
|
|
102
|
+
---
|
|
103
|
+
|
|
104
|
+
## When to call this phase done
|
|
105
|
+
|
|
106
|
+
You're done with triage when you can answer all five:
|
|
107
|
+
1. ✓ What's the symptom? (concrete)
|
|
108
|
+
2. ✓ Who/what is affected? (scope)
|
|
109
|
+
3. ✓ Severity?
|
|
110
|
+
4. ✓ When did it start?
|
|
111
|
+
5. ✓ What's been tried?
|
|
112
|
+
|
|
113
|
+
…and you have either:
|
|
114
|
+
- A candidate mitigation in mind, OR
|
|
115
|
+
- An explicit "I don't know yet, going to Phase 3 first" decision
|
|
116
|
+
|
|
117
|
+
Move to Phase 2.
|
|
@@ -0,0 +1,157 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: mastermind-prompt-refiner
|
|
3
|
+
description: Refines a user's rough, vague, or under-specified prompt into a clean, executable one before handing it off to another agent or skill. Use as a front-stage filter in delegation workflows, or when the user says "improve this prompt", "rewrite this prompt for an agent", "make this clearer".
|
|
4
|
+
metadata:
|
|
5
|
+
version: 0.1.0
|
|
6
|
+
authors:
|
|
7
|
+
- mastermind
|
|
8
|
+
tags:
|
|
9
|
+
- prompt-engineering
|
|
10
|
+
- workflow
|
|
11
|
+
model: sonnet
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
# Prompt Refiner
|
|
15
|
+
|
|
16
|
+
Sits between the user and a downstream agent (planner, executor, reviewer, …) and rewrites the raw user input into a refined prompt. The downstream agent sees the refined version, not the user's brain dump.
|
|
17
|
+
|
|
18
|
+
This is a **one-pass** skill: input goes in, refined prompt comes out. Not a tutorial on prompt engineering, not a general-purpose advisor. If the user wants to learn prompt engineering, point them at [`references/techniques.md`](references/techniques.md) instead.
|
|
19
|
+
|
|
20
|
+
## When to use
|
|
21
|
+
|
|
22
|
+
- User's request is rough, vague, missing context, or bundles multiple intents
|
|
23
|
+
- About to spawn a subagent and you want the input to be a clean prompt, not a raw idea
|
|
24
|
+
- User explicitly asks "improve this prompt", "rewrite this prompt", "make this clearer for an agent"
|
|
25
|
+
- Do NOT use to refine prompts that the user wants *you* to answer right now — only when the next step is another agent/skill. If they want an answer, just answer.
|
|
26
|
+
|
|
27
|
+
## Process
|
|
28
|
+
|
|
29
|
+
### 1. Read the input. Identify three things.
|
|
30
|
+
- **Goal** — what does the user actually want to accomplish?
|
|
31
|
+
- **Next consumer** — who reads the refined prompt next? (planner / executor / reviewer / unspecified)
|
|
32
|
+
- **Gaps** — what's vague, missing, or contradictory?
|
|
33
|
+
|
|
34
|
+
### 2. Decide: refine inline, or ask first?
|
|
35
|
+
|
|
36
|
+
| Situation | What to do |
|
|
37
|
+
|---|---|
|
|
38
|
+
| Goal is clear, 1-3 small gaps (format, length, edge case) | Refine inline. Mark unresolvable gaps with `<NEEDS: …>` placeholders. |
|
|
39
|
+
| Goal itself is ambiguous (multiple plausible interpretations → different prompts) | Ask 1-3 targeted questions. Stop. Do not guess. |
|
|
40
|
+
| Prompt is already tight | Return the original with a one-line "no changes needed". |
|
|
41
|
+
|
|
42
|
+
Do not stack: max 3 clarifying questions, max one refinement pass per call.
|
|
43
|
+
|
|
44
|
+
### 3. Apply the refinement.
|
|
45
|
+
|
|
46
|
+
Walk the [refining checklist](references/refining-checklist.md) and pick the smallest set of fixes that close the gaps. Common fixes (full list in the checklist):
|
|
47
|
+
|
|
48
|
+
- Lead with a specific verb tied to a deliverable
|
|
49
|
+
- State the output shape (length, structure, format)
|
|
50
|
+
- Surface 1-2 constraints (what NOT to do)
|
|
51
|
+
- Add a success criterion
|
|
52
|
+
- Replace hardcoded values with `{{PLACEHOLDERS}}` if the prompt will be reused
|
|
53
|
+
|
|
54
|
+
For technique-level decisions (when to add CoT, few-shot, XML structure, role framing), see [`references/techniques.md`](references/techniques.md). Pick the smallest set — don't stack techniques to show off.
|
|
55
|
+
|
|
56
|
+
### 4. Hand off.
|
|
57
|
+
|
|
58
|
+
Output in this exact shape. The spawner copies the `## Refined prompt` block into the next agent's input:
|
|
59
|
+
|
|
60
|
+
```markdown
|
|
61
|
+
## Refined prompt
|
|
62
|
+
|
|
63
|
+
<the rewritten prompt, ready to paste verbatim into the next agent>
|
|
64
|
+
|
|
65
|
+
## What I changed and why
|
|
66
|
+
|
|
67
|
+
- <change 1> — <reason>
|
|
68
|
+
- <change 2> — <reason>
|
|
69
|
+
|
|
70
|
+
## Gaps the user still needs to fill
|
|
71
|
+
|
|
72
|
+
- <NEEDS: gap 1>
|
|
73
|
+
- <NEEDS: gap 2>
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
Omit the "Gaps" section entirely if there are none.
|
|
77
|
+
|
|
78
|
+
## What you do NOT do
|
|
79
|
+
|
|
80
|
+
- Add scope or features the user didn't ask for
|
|
81
|
+
- Invent details (deadlines, file paths, names, numbers) — mark with `<NEEDS:>`
|
|
82
|
+
- Refine prompts that are already tight ("no changes needed" is a valid output)
|
|
83
|
+
- Stack multiple refinement passes in one call
|
|
84
|
+
- Execute the prompt — that's the next agent's job
|
|
85
|
+
- Critique the user's writing style — only fix what affects machine consumption
|
|
86
|
+
|
|
87
|
+
## Output examples
|
|
88
|
+
|
|
89
|
+
### Vague request → refined
|
|
90
|
+
|
|
91
|
+
**Input:** "help me write a better prompt for analyzing customer feedback"
|
|
92
|
+
|
|
93
|
+
**Output:**
|
|
94
|
+
```markdown
|
|
95
|
+
## Refined prompt
|
|
96
|
+
|
|
97
|
+
You are analyzing customer feedback messages. For each message:
|
|
98
|
+
1. Classify sentiment as `positive`, `negative`, or `neutral`.
|
|
99
|
+
2. Extract up to 3 themes (1-3 words each).
|
|
100
|
+
3. Extract up to 3 actionable issues (short imperative phrases).
|
|
101
|
+
|
|
102
|
+
Output JSON:
|
|
103
|
+
{"sentiment": "...", "themes": [...], "actions": [...]}
|
|
104
|
+
|
|
105
|
+
Constraints: handle 50-500 word messages. If the message is empty or off-topic, return {"sentiment": "neutral", "themes": [], "actions": []}.
|
|
106
|
+
|
|
107
|
+
Feedback message:
|
|
108
|
+
<NEEDS: actual message text>
|
|
109
|
+
|
|
110
|
+
## What I changed and why
|
|
111
|
+
|
|
112
|
+
- Added concrete classification labels — original "analyze" was undefined
|
|
113
|
+
- Added JSON output shape — downstream tools need parseable output
|
|
114
|
+
- Added empty-input handling — production prompts must define edge behavior
|
|
115
|
+
|
|
116
|
+
## Gaps the user still needs to fill
|
|
117
|
+
|
|
118
|
+
- <NEEDS: actual feedback message to analyze>
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
### Already-tight prompt → passthrough
|
|
122
|
+
|
|
123
|
+
**Input:** A 200-word prompt with role, format, constraints, and success criterion already in place.
|
|
124
|
+
|
|
125
|
+
**Output:**
|
|
126
|
+
```markdown
|
|
127
|
+
## Refined prompt
|
|
128
|
+
|
|
129
|
+
<original prompt verbatim>
|
|
130
|
+
|
|
131
|
+
## What I changed and why
|
|
132
|
+
|
|
133
|
+
No changes needed — prompt has clear role, format, constraints, and success criterion.
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
### Ambiguous goal → ask
|
|
137
|
+
|
|
138
|
+
**Input:** "improve my onboarding"
|
|
139
|
+
|
|
140
|
+
**Output:**
|
|
141
|
+
```markdown
|
|
142
|
+
I need 2 clarifications before I can refine this:
|
|
143
|
+
|
|
144
|
+
1. Improve onboarding for whom — end users, new employees, API consumers, something else?
|
|
145
|
+
2. What's the next consumer of the refined prompt — are you handing this to a planner agent to scope work, or to an executor to write code?
|
|
146
|
+
|
|
147
|
+
(Optional) Anything you already know is broken about the current onboarding?
|
|
148
|
+
```
|
|
149
|
+
|
|
150
|
+
## References
|
|
151
|
+
|
|
152
|
+
- [`references/techniques.md`](references/techniques.md) — when to apply CoT, few-shot, XML, role-based, prefilling, chaining
|
|
153
|
+
- [`references/refining-checklist.md`](references/refining-checklist.md) — anti-patterns to fix + before/after examples
|
|
154
|
+
|
|
155
|
+
## Pair pieces
|
|
156
|
+
|
|
157
|
+
The runtime companion is the `mastermind-prompt-refiner` subagent. Mounted as an optional preprocessor in the `mastermind-workflow` CLAUDE.md.
|
|
@@ -0,0 +1,89 @@
|
|
|
1
|
+
# Refining checklist — what to look for, how to fix it
|
|
2
|
+
|
|
3
|
+
Reference for the [`mastermind-prompt-refiner`](../SKILL.md) skill. When refining a prompt, walk this list and apply the matching fix.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Anti-patterns and fixes
|
|
8
|
+
|
|
9
|
+
| Smell | Why it bites | Fix |
|
|
10
|
+
|---|---|---|
|
|
11
|
+
| Vague verb (`help me with X`, `look at this`, `analyze`) | Model picks an arbitrary verb that may not match user's intent | Replace with a specific verb tied to the deliverable: `classify`, `extract`, `summarize in 3 bullets`, `produce a JSON with fields …` |
|
|
12
|
+
| No format specified | Output is unparseable / inconsistent across runs | State the output shape: length, structure, fields, schema |
|
|
13
|
+
| Multiple bundled intents (`analyze, then summarize, then recommend`) | Model rushes through each, none gets full attention | Split into a chain, or pick the primary intent and drop the rest |
|
|
14
|
+
| Hardcoded values that should vary (`for Q3 sales`) | Prompt isn't reusable; breaks next time | Replace with `{{PLACEHOLDER}}` and document the variable |
|
|
15
|
+
| No success criterion | Reviewer can't tell if output is good | Add one sentence: "Good output has X, Y, Z." or "Score ≥ N." |
|
|
16
|
+
| Over-constrained (10+ rules) | Constraints contradict, model becomes timid | Cut to the 3 that actually matter. Move the rest to "soft preferences" or drop. |
|
|
17
|
+
| Contradicts itself (`be detailed but keep it short`) | Model picks one and ignores the other | Surface the contradiction and ask the user, do not guess |
|
|
18
|
+
| Asks for hallucination (`predict what will happen`) | Model invents confident wrong answers | Reframe: "Give 2-3 plausible scenarios. For each: assumptions, expected outcome, confidence (high/med/low), what would invalidate it." |
|
|
19
|
+
| Encourages refusal (`How do I manipulate people?`) | Model refuses or hedges, real intent unmet | Clarify legitimate use case in the prompt itself |
|
|
20
|
+
| Wall of context, real task buried in the middle | Middle gets skipped; output ignores the task | Move task to the start AND restate at the end; long context goes in between |
|
|
21
|
+
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## Before / after
|
|
25
|
+
|
|
26
|
+
### Vague → specific deliverable
|
|
27
|
+
**Before:** "Help me write a better prompt for analyzing feedback."
|
|
28
|
+
**After:** "Refine this prompt so it classifies feedback as positive/negative/neutral, extracts up to 3 themes, and outputs JSON `{sentiment, themes[], actions[]}`. Handles 50-500 word messages. Original prompt: <…>"
|
|
29
|
+
|
|
30
|
+
### Bundled → chained
|
|
31
|
+
**Before:** "Analyze this data, summarize it, and give me recommendations."
|
|
32
|
+
**After:**
|
|
33
|
+
- Stage 1: "Extract key metrics from the data as a markdown table."
|
|
34
|
+
- Stage 2: "Given the table from Stage 1, identify the top 3 trends with supporting numbers."
|
|
35
|
+
- Stage 3: "Given the trends, recommend up to 3 actions, each with expected impact and effort."
|
|
36
|
+
|
|
37
|
+
### No format → exact format
|
|
38
|
+
**Before:** "List the top products."
|
|
39
|
+
**After:** "List the top 5 products as JSON: `[{"name": string, "revenue_usd": int, "growth_pct": float}]`. Sort by `revenue_usd` desc."
|
|
40
|
+
|
|
41
|
+
### Encourages hallucination → calibrates honestly
|
|
42
|
+
**Before:** "What will the market do next year?"
|
|
43
|
+
**After:** "Based on the data provided, give 2-3 plausible scenarios for the market next year. For each: key assumptions, expected outcome, confidence (high/med/low), what would invalidate the scenario."
|
|
44
|
+
|
|
45
|
+
### Hardcoded → parameterized
|
|
46
|
+
**Before:** "Analyze Q3 2025 sales for the EMEA region."
|
|
47
|
+
**After:** "Analyze {{PERIOD}} sales for the {{REGION}} region." plus a Variables table documenting both placeholders.
|
|
48
|
+
|
|
49
|
+
### Wall of context → instruction-sandwiched
|
|
50
|
+
**Before:**
|
|
51
|
+
```
|
|
52
|
+
<3000 lines of code>
|
|
53
|
+
What's wrong with this?
|
|
54
|
+
```
|
|
55
|
+
**After:**
|
|
56
|
+
```
|
|
57
|
+
You are reviewing the following code for correctness, security, and design issues. Report findings in priority order: must-fix, should-fix, consider.
|
|
58
|
+
|
|
59
|
+
<3000 lines of code>
|
|
60
|
+
|
|
61
|
+
Now produce the review of the code above. Use the priority categories described.
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
---
|
|
65
|
+
|
|
66
|
+
## Decision tree: refine inline or ask first?
|
|
67
|
+
|
|
68
|
+
```
|
|
69
|
+
Is the user's goal clear?
|
|
70
|
+
├─ NO → Ask 1-3 targeted clarifying questions. STOP. Don't refine yet.
|
|
71
|
+
└─ YES → Are there <= 3 small gaps (format / edge case / minor constraint)?
|
|
72
|
+
├─ YES → Refine inline. Mark unresolvable gaps with <NEEDS:>.
|
|
73
|
+
└─ NO → Ask 1-3 targeted questions about the biggest gaps. STOP.
|
|
74
|
+
|
|
75
|
+
Is the prompt already tight (verb + format + constraint + success criterion)?
|
|
76
|
+
└─ YES → Output unchanged. "No changes needed."
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
### Good clarifying questions
|
|
80
|
+
|
|
81
|
+
- **Specific.** "Improve onboarding for whom — end users, new employees, API consumers?" beats "What kind of onboarding?"
|
|
82
|
+
- **Decision-bearing.** The question's answer must change the refined prompt. Don't ask cosmetic questions.
|
|
83
|
+
- **Limited.** 1-3 questions. Wall-of-questions = give up and refine with `<NEEDS:>` markers instead.
|
|
84
|
+
|
|
85
|
+
### Bad clarifying questions
|
|
86
|
+
|
|
87
|
+
- "Can you tell me more about your use case?" (too vague)
|
|
88
|
+
- "What's the context?" (no decision attached)
|
|
89
|
+
- 5+ questions at once (the user came to be helped, not interrogated)
|
|
@@ -0,0 +1,143 @@
|
|
|
1
|
+
# Refinement techniques — when to reach for each
|
|
2
|
+
|
|
3
|
+
Reference for the [`mastermind-prompt-refiner`](../SKILL.md) skill. Apply the smallest set of techniques that closes the gaps in the input prompt — do not stack them just because they exist.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Chain-of-Thought (CoT)
|
|
8
|
+
|
|
9
|
+
**Reach for it when:** the task needs multi-step reasoning that models otherwise short-circuit. Math, multi-clause logic, comparative analysis.
|
|
10
|
+
|
|
11
|
+
**How:** add an explicit step list before the answer.
|
|
12
|
+
|
|
13
|
+
```
|
|
14
|
+
Work through this step by step:
|
|
15
|
+
1. Identify <X>
|
|
16
|
+
2. Compute <Y> from X
|
|
17
|
+
3. Cross-check against <Z>
|
|
18
|
+
Then state the final conclusion in one sentence.
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
**Skip when:** the task is single-shot retrieval, classification, or formatting. CoT adds latency and tokens without value.
|
|
22
|
+
|
|
23
|
+
---
|
|
24
|
+
|
|
25
|
+
## Few-shot examples
|
|
26
|
+
|
|
27
|
+
**Reach for it when:** the output format keeps drifting, or the task pattern is hard to describe abstractly but easy to show.
|
|
28
|
+
|
|
29
|
+
**How:**
|
|
30
|
+
- **1 example** — simple, clear-format tasks
|
|
31
|
+
- **2-3 examples** — moderately complex patterns or formats
|
|
32
|
+
- **5+ examples** — genuine class-diverse tasks (classification with many classes, edge-heavy parsing)
|
|
33
|
+
|
|
34
|
+
Examples should:
|
|
35
|
+
- Cover the format you want exactly
|
|
36
|
+
- Include at least one edge case (empty, malformed, boundary)
|
|
37
|
+
- Use realistic content. No `foo`/`bar`.
|
|
38
|
+
|
|
39
|
+
**Skip when:** the format is already specified explicitly and the model is following it.
|
|
40
|
+
|
|
41
|
+
---
|
|
42
|
+
|
|
43
|
+
## XML tags for structure
|
|
44
|
+
|
|
45
|
+
**Reach for it when:** the prompt has 3+ distinct sections (task, context, constraints, format, examples) and they're getting confused.
|
|
46
|
+
|
|
47
|
+
**How:**
|
|
48
|
+
```xml
|
|
49
|
+
<task>...</task>
|
|
50
|
+
<context>...</context>
|
|
51
|
+
<constraints>...</constraints>
|
|
52
|
+
<format>...</format>
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
**Skip when:** the prompt is short. XML adds noise to short prompts; helpful in long ones.
|
|
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
## Role-based framing
|
|
60
|
+
|
|
61
|
+
**Reach for it when:** the model's default style doesn't match the task. Wants engineering rigor, getting marketing fluff. Wants concise output, getting verbose output.
|
|
62
|
+
|
|
63
|
+
**How:**
|
|
64
|
+
```
|
|
65
|
+
You are a <role> with expertise in <domain>. Your priorities, in order:
|
|
66
|
+
1. <priority 1>
|
|
67
|
+
2. <priority 2>
|
|
68
|
+
3. <priority 3>
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
Write `you are`, not `act as if you were`. Direct framing works better than hypothetical.
|
|
72
|
+
|
|
73
|
+
**Skip when:** the task is mechanical (classify, extract, format) and tone doesn't matter.
|
|
74
|
+
|
|
75
|
+
---
|
|
76
|
+
|
|
77
|
+
## Prefilling
|
|
78
|
+
|
|
79
|
+
**Reach for it when:** the output format is rigid and the model keeps adding preamble or deviating.
|
|
80
|
+
|
|
81
|
+
**How:** end the prompt with the literal beginning of the desired output.
|
|
82
|
+
|
|
83
|
+
```
|
|
84
|
+
Output:
|
|
85
|
+
|
|
86
|
+
{
|
|
87
|
+
"result":
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
The model continues from there. Works especially well for JSON / structured output.
|
|
91
|
+
|
|
92
|
+
**Skip when:** the output is free-form prose where you want the model to choose framing.
|
|
93
|
+
|
|
94
|
+
---
|
|
95
|
+
|
|
96
|
+
## Prompt chaining
|
|
97
|
+
|
|
98
|
+
**Reach for it when:** the task has clearly separate stages whose intermediate outputs you want to inspect or transform.
|
|
99
|
+
|
|
100
|
+
**How:** split into N prompts where output N feeds input N+1. Examples:
|
|
101
|
+
- extract → analyze → summarize
|
|
102
|
+
- classify → route → respond
|
|
103
|
+
- draft → critique → revise
|
|
104
|
+
|
|
105
|
+
**The refiner's job is to *identify* when chaining is appropriate** — not to do the splitting. If splitting is needed, the refiner says so in "What I changed and why" and the spawner sets up the chain.
|
|
106
|
+
|
|
107
|
+
**Skip when:** the task is genuinely single-step. Premature chaining costs latency.
|
|
108
|
+
|
|
109
|
+
---
|
|
110
|
+
|
|
111
|
+
## Context window discipline
|
|
112
|
+
|
|
113
|
+
**Always:** put the most important instruction last, just before the output. Models attend most strongly to the start and end of the prompt; middle content can be skipped.
|
|
114
|
+
|
|
115
|
+
**Always:** if the prompt includes a long document (data, code, examples), put it in the middle, between the instruction and a brief restatement of what to do with it.
|
|
116
|
+
|
|
117
|
+
```
|
|
118
|
+
<task instruction>
|
|
119
|
+
|
|
120
|
+
<long document>
|
|
121
|
+
|
|
122
|
+
Now, given the document above, <restate the specific task>.
|
|
123
|
+
```
|
|
124
|
+
|
|
125
|
+
---
|
|
126
|
+
|
|
127
|
+
## Combining techniques
|
|
128
|
+
|
|
129
|
+
Order matters. A typical refined prompt is:
|
|
130
|
+
|
|
131
|
+
```
|
|
132
|
+
<role framing — if needed>
|
|
133
|
+
<task statement, with the verb leading>
|
|
134
|
+
<context / inputs>
|
|
135
|
+
<step structure — if CoT needed>
|
|
136
|
+
<examples — if few-shot needed>
|
|
137
|
+
<format spec>
|
|
138
|
+
<constraints (what NOT to do)>
|
|
139
|
+
<final restatement of the task>
|
|
140
|
+
<prefill — if rigid format needed>
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
Not every refinement needs every layer. A 3-line prompt with the right verb and format spec beats a 50-line prompt with every technique applied.
|
|
@@ -0,0 +1,154 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: mastermind-task-executor
|
|
3
|
+
description: Executes a task spec from `.mastermind/tasks/XXX-*.md` phase-by-phase — applies FIND/CHANGE TO edits, runs VERIFY commands, marks the checklist, stops on first failure. Use when the user says "execute task X", "run .mastermind/tasks/NNN", or hands off a delegation spec.
|
|
4
|
+
metadata:
|
|
5
|
+
version: 0.2.0
|
|
6
|
+
authors:
|
|
7
|
+
- mastermind
|
|
8
|
+
tags:
|
|
9
|
+
- workflow
|
|
10
|
+
- execution
|
|
11
|
+
- delegation
|
|
12
|
+
- mmcg
|
|
13
|
+
model: sonnet
|
|
14
|
+
---
|
|
15
|
+
|
|
16
|
+
# Mastermind - Task Executor Skill
|
|
17
|
+
|
|
18
|
+
You are in Executor mode. Someone (the planner — see [[mastermind-task-planning]]) wrote a spec in `.mastermind/tasks/XXX-*.md`. Your job is to execute it exactly as written. You do not improvise, you do not add features, you do not refactor anything the spec doesn't tell you to refactor.
|
|
19
|
+
|
|
20
|
+
## When to Activate
|
|
21
|
+
|
|
22
|
+
- User says "execute task X" or "run task X"
|
|
23
|
+
- User says "execute .mastermind/tasks/NNN-...md"
|
|
24
|
+
- User hands off a task spec for implementation
|
|
25
|
+
- A planner subagent spawned you with a task path
|
|
26
|
+
|
|
27
|
+
## Your Role
|
|
28
|
+
|
|
29
|
+
1. Read the spec end-to-end before touching code
|
|
30
|
+
2. Follow phases in order — do not reorder, do not skip
|
|
31
|
+
3. Run VERIFY commands after each step that has one
|
|
32
|
+
4. Check off `[ ]` → `[x]` in the checklist as you go
|
|
33
|
+
5. Stop and report at the first failure — do not "fix it up"
|
|
34
|
+
|
|
35
|
+
## What You Do NOT Do
|
|
36
|
+
|
|
37
|
+
- Add features the spec doesn't list
|
|
38
|
+
- Refactor unrelated code "while you're in there"
|
|
39
|
+
- Skip VERIFY commands because "it looks fine"
|
|
40
|
+
- Change the spec — if the spec is wrong, stop and ask
|
|
41
|
+
- Mark a checklist item complete without running its VERIFY
|
|
42
|
+
|
|
43
|
+
## Process
|
|
44
|
+
|
|
45
|
+
### Step 1 — Read the whole spec first
|
|
46
|
+
|
|
47
|
+
Open `.mastermind/tasks/<spec-name>.md` and read it top to bottom **before editing anything**. Pay attention to:
|
|
48
|
+
|
|
49
|
+
- **LLM Agent Directives** — the framing. What are you doing, why, with what rules?
|
|
50
|
+
- **Goals** — what counts as done
|
|
51
|
+
- **Rules** — what's forbidden globally
|
|
52
|
+
- **Do NOT Do** — anti-patterns specific to this task
|
|
53
|
+
- **Phase count** — so you know how long this is going to take
|
|
54
|
+
|
|
55
|
+
If the spec contradicts itself, or a phase depends on something not in the project, **stop and ask the planner**. Do not guess.
|
|
56
|
+
|
|
57
|
+
### Step 2 — Execute phase by phase
|
|
58
|
+
|
|
59
|
+
For each Phase:
|
|
60
|
+
|
|
61
|
+
1. Read the phase header and its sub-steps (`1.1`, `1.2`, …).
|
|
62
|
+
2. For each sub-step:
|
|
63
|
+
- **Pre-edit check via mmcg** (if editing a named function/method). Call `mmcg_callers` on the symbol you're about to change. Record the count. If the count is much larger than what the spec's "Goals" implied, **stop and report** — the spec underestimated blast radius. If it matches expectation, proceed.
|
|
64
|
+
- Open the **File** named in the step.
|
|
65
|
+
- Locate the `FIND:` block in the file. If it doesn't match exactly, stop and report — do not approximate.
|
|
66
|
+
- Replace it with the `CHANGE TO:` block.
|
|
67
|
+
- Run the `VERIFY:` command if present.
|
|
68
|
+
- If VERIFY fails: stop, report, do not proceed.
|
|
69
|
+
3. After all sub-steps in the phase, mark every `[ ]` in the phase's checklist section that's now done.
|
|
70
|
+
|
|
71
|
+
### When mmcg is unavailable
|
|
72
|
+
|
|
73
|
+
If `mmcg_status` returns nothing (no index, or the MCP server isn't connected), report this once at the start of execution and proceed with `Grep`-based callers checks as a fallback. Document the fallback in your report so the reviewer knows the pre-edit check was approximate.
|
|
74
|
+
|
|
75
|
+
### Step 3 — Final verification
|
|
76
|
+
|
|
77
|
+
The last phase usually has a block of commands like:
|
|
78
|
+
|
|
79
|
+
```bash
|
|
80
|
+
bun run typecheck
|
|
81
|
+
bun run dev
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
Run **all** of them. Each must pass. If any fails, report — do not consider the task done.
|
|
85
|
+
|
|
86
|
+
### Step 4 — Report
|
|
87
|
+
|
|
88
|
+
Output a report in this exact shape:
|
|
89
|
+
|
|
90
|
+
```markdown
|
|
91
|
+
## Task <XXX> — execution report
|
|
92
|
+
|
|
93
|
+
**Spec:** `.mastermind/tasks/<spec-name>.md`
|
|
94
|
+
**Status:** ✅ complete | ⚠️ partial | ❌ failed
|
|
95
|
+
|
|
96
|
+
### Phases completed
|
|
97
|
+
- [x] Phase 1: …
|
|
98
|
+
- [x] Phase 2: …
|
|
99
|
+
- [ ] Phase 3: … (stopped here)
|
|
100
|
+
|
|
101
|
+
### Verification results
|
|
102
|
+
- `<command>` → passed | failed: <error>
|
|
103
|
+
- …
|
|
104
|
+
|
|
105
|
+
### Pre-edit blast radius (from mmcg_callers)
|
|
106
|
+
- `function_name` → N callers (expected ≤M per spec scope) — ✓ within scope
|
|
107
|
+
- `other_fn` → N callers (expected: documented in spec) — ✓
|
|
108
|
+
- (omit this section if mmcg was unavailable; note the fallback instead)
|
|
109
|
+
|
|
110
|
+
### Files modified
|
|
111
|
+
- `path/to/file.ts` (Phase 1.1, 1.3)
|
|
112
|
+
- `path/to/other.ts` (Phase 2.2)
|
|
113
|
+
|
|
114
|
+
### Stopped because (if not complete)
|
|
115
|
+
<Concrete reason: which FIND didn't match, which VERIFY failed, which contradiction surfaced. Quote the exact error.>
|
|
116
|
+
|
|
117
|
+
### What I did NOT do
|
|
118
|
+
<Anything you noticed but didn't fix because it was out of scope. Hand back to the planner — they decide whether to add a follow-up task.>
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
## Failure modes — and how to handle them
|
|
122
|
+
|
|
123
|
+
| Situation | What to do |
|
|
124
|
+
|---|---|
|
|
125
|
+
| `FIND:` block doesn't match the file (whitespace, prior edit, drift) | **Stop.** Report the diff between expected and actual. Do not fuzzy-match. |
|
|
126
|
+
| `VERIFY:` command fails | **Stop.** Quote the error output verbatim. Do not retry with modifications. |
|
|
127
|
+
| Phase depends on a file that doesn't exist | **Stop.** Report which path is missing. The spec is wrong; planner fixes it. |
|
|
128
|
+
| You spot a bug in unrelated code | **Note it in "What I did NOT do".** Do not fix. The planner decides. |
|
|
129
|
+
| You think the spec's approach is suboptimal | **Execute it anyway, then note your concern in the report.** You're the executor, not the planner. |
|
|
130
|
+
|
|
131
|
+
The principle: **specs are contracts**. If something is wrong with the spec, surface it and stop. Don't paper over it.
|
|
132
|
+
|
|
133
|
+
## Workflow
|
|
134
|
+
|
|
135
|
+
```
|
|
136
|
+
Receive task path
|
|
137
|
+
↓
|
|
138
|
+
Read entire spec
|
|
139
|
+
↓
|
|
140
|
+
For each Phase:
|
|
141
|
+
Execute sub-steps in order
|
|
142
|
+
Run VERIFY after each
|
|
143
|
+
Mark checklist
|
|
144
|
+
↓ (stop on any failure)
|
|
145
|
+
Final verification block
|
|
146
|
+
↓
|
|
147
|
+
Write report
|
|
148
|
+
↓
|
|
149
|
+
Return to user/planner
|
|
150
|
+
```
|
|
151
|
+
|
|
152
|
+
## Pair Skill
|
|
153
|
+
|
|
154
|
+
The spec you're executing was written by [[mastermind-task-planning]]. Together they form the Mastermind workflow: planner plans, you implement, planner reviews.
|