claude-agent-skills 1.3.0
- package/README.md +65 -0
- package/bundled-skills/ask-matt/SKILL.md +61 -0
- package/bundled-skills/brainstorming/SKILL.md +159 -0
- package/bundled-skills/brainstorming/scripts/frame-template.html +213 -0
- package/bundled-skills/brainstorming/scripts/helper.js +167 -0
- package/bundled-skills/brainstorming/scripts/server.cjs +723 -0
- package/bundled-skills/brainstorming/scripts/start-server.sh +209 -0
- package/bundled-skills/brainstorming/scripts/stop-server.sh +120 -0
- package/bundled-skills/brainstorming/spec-document-reviewer-prompt.md +49 -0
- package/bundled-skills/brainstorming/visual-companion.md +298 -0
- package/bundled-skills/cavecrew/README.md +41 -0
- package/bundled-skills/cavecrew/SKILL.md +82 -0
- package/bundled-skills/caveman/README.md +48 -0
- package/bundled-skills/caveman/SKILL.md +78 -0
- package/bundled-skills/caveman-commit/README.md +44 -0
- package/bundled-skills/caveman-commit/SKILL.md +65 -0
- package/bundled-skills/caveman-compress/README.md +163 -0
- package/bundled-skills/caveman-compress/SECURITY.md +31 -0
- package/bundled-skills/caveman-compress/SKILL.md +111 -0
- package/bundled-skills/caveman-compress/scripts/__init__.py +9 -0
- package/bundled-skills/caveman-compress/scripts/__main__.py +3 -0
- package/bundled-skills/caveman-compress/scripts/benchmark.py +80 -0
- package/bundled-skills/caveman-compress/scripts/cli.py +85 -0
- package/bundled-skills/caveman-compress/scripts/compress.py +342 -0
- package/bundled-skills/caveman-compress/scripts/detect.py +121 -0
- package/bundled-skills/caveman-compress/scripts/validate.py +213 -0
- package/bundled-skills/caveman-help/README.md +38 -0
- package/bundled-skills/caveman-help/SKILL.md +63 -0
- package/bundled-skills/caveman-review/README.md +33 -0
- package/bundled-skills/caveman-review/SKILL.md +55 -0
- package/bundled-skills/caveman-stats/README.md +30 -0
- package/bundled-skills/caveman-stats/SKILL.md +10 -0
- package/bundled-skills/codebase-design/DEEPENING.md +37 -0
- package/bundled-skills/codebase-design/DESIGN-IT-TWICE.md +44 -0
- package/bundled-skills/codebase-design/SKILL.md +114 -0
- package/bundled-skills/council/SKILL.md +77 -0
- package/bundled-skills/diagnosing-bugs/SKILL.md +134 -0
- package/bundled-skills/diagnosing-bugs/scripts/hitl-loop.template.sh +41 -0
- package/bundled-skills/dispatching-parallel-agents/SKILL.md +185 -0
- package/bundled-skills/domain-modeling/ADR-FORMAT.md +47 -0
- package/bundled-skills/domain-modeling/CONTEXT-FORMAT.md +60 -0
- package/bundled-skills/domain-modeling/SKILL.md +74 -0
- package/bundled-skills/edit-article/SKILL.md +15 -0
- package/bundled-skills/executing-plans/SKILL.md +70 -0
- package/bundled-skills/finishing-a-development-branch/SKILL.md +241 -0
- package/bundled-skills/git-guardrails-claude-code/SKILL.md +95 -0
- package/bundled-skills/git-guardrails-claude-code/scripts/block-dangerous-git.sh +25 -0
- package/bundled-skills/grill-me/SKILL.md +7 -0
- package/bundled-skills/grill-with-docs/SKILL.md +7 -0
- package/bundled-skills/grilling/SKILL.md +10 -0
- package/bundled-skills/handoff/SKILL.md +16 -0
- package/bundled-skills/i-am-dumb/SKILL.md +57 -0
- package/bundled-skills/implement/SKILL.md +15 -0
- package/bundled-skills/improve-codebase-architecture/HTML-REPORT.md +123 -0
- package/bundled-skills/improve-codebase-architecture/SKILL.md +66 -0
- package/bundled-skills/migrate-to-shoehorn/SKILL.md +118 -0
- package/bundled-skills/obsidian-vault/SKILL.md +59 -0
- package/bundled-skills/ponytail/SKILL.md +117 -0
- package/bundled-skills/ponytail-audit/SKILL.md +50 -0
- package/bundled-skills/ponytail-debt/SKILL.md +59 -0
- package/bundled-skills/ponytail-gain/SKILL.md +51 -0
- package/bundled-skills/ponytail-help/SKILL.md +43 -0
- package/bundled-skills/ponytail-review/SKILL.md +51 -0
- package/bundled-skills/prototype/LOGIC.md +79 -0
- package/bundled-skills/prototype/SKILL.md +31 -0
- package/bundled-skills/prototype/UI.md +112 -0
- package/bundled-skills/receiving-code-review/SKILL.md +213 -0
- package/bundled-skills/requesting-code-review/SKILL.md +103 -0
- package/bundled-skills/requesting-code-review/code-reviewer.md +172 -0
- package/bundled-skills/resolving-merge-conflicts/SKILL.md +14 -0
- package/bundled-skills/scaffold-exercises/SKILL.md +106 -0
- package/bundled-skills/setup-matt-pocock-skills/SKILL.md +127 -0
- package/bundled-skills/setup-matt-pocock-skills/domain.md +51 -0
- package/bundled-skills/setup-matt-pocock-skills/issue-tracker-github.md +34 -0
- package/bundled-skills/setup-matt-pocock-skills/issue-tracker-gitlab.md +35 -0
- package/bundled-skills/setup-matt-pocock-skills/issue-tracker-local.md +19 -0
- package/bundled-skills/setup-matt-pocock-skills/triage-labels.md +15 -0
- package/bundled-skills/setup-pre-commit/SKILL.md +91 -0
- package/bundled-skills/subagent-driven-development/SKILL.md +418 -0
- package/bundled-skills/subagent-driven-development/implementer-prompt.md +139 -0
- package/bundled-skills/subagent-driven-development/scripts/review-package +44 -0
- package/bundled-skills/subagent-driven-development/scripts/sdd-workspace +22 -0
- package/bundled-skills/subagent-driven-development/scripts/task-brief +40 -0
- package/bundled-skills/subagent-driven-development/task-reviewer-prompt.md +188 -0
- package/bundled-skills/systematic-debugging/CREATION-LOG.md +119 -0
- package/bundled-skills/systematic-debugging/SKILL.md +296 -0
- package/bundled-skills/systematic-debugging/condition-based-waiting-example.ts +158 -0
- package/bundled-skills/systematic-debugging/condition-based-waiting.md +115 -0
- package/bundled-skills/systematic-debugging/defense-in-depth.md +122 -0
- package/bundled-skills/systematic-debugging/find-polluter.sh +63 -0
- package/bundled-skills/systematic-debugging/root-cause-tracing.md +169 -0
- package/bundled-skills/systematic-debugging/test-academic.md +14 -0
- package/bundled-skills/systematic-debugging/test-pressure-1.md +58 -0
- package/bundled-skills/systematic-debugging/test-pressure-2.md +68 -0
- package/bundled-skills/systematic-debugging/test-pressure-3.md +69 -0
- package/bundled-skills/tdd/SKILL.md +108 -0
- package/bundled-skills/tdd/mocking.md +59 -0
- package/bundled-skills/tdd/refactoring.md +10 -0
- package/bundled-skills/tdd/tests.md +61 -0
- package/bundled-skills/teach/GLOSSARY-FORMAT.md +35 -0
- package/bundled-skills/teach/LEARNING-RECORD-FORMAT.md +46 -0
- package/bundled-skills/teach/MISSION-FORMAT.md +31 -0
- package/bundled-skills/teach/RESOURCES-FORMAT.md +32 -0
- package/bundled-skills/teach/SKILL.md +140 -0
- package/bundled-skills/test-driven-development/SKILL.md +371 -0
- package/bundled-skills/test-driven-development/testing-anti-patterns.md +299 -0
- package/bundled-skills/to-issues/SKILL.md +84 -0
- package/bundled-skills/to-prd/SKILL.md +75 -0
- package/bundled-skills/triage/AGENT-BRIEF.md +207 -0
- package/bundled-skills/triage/OUT-OF-SCOPE.md +105 -0
- package/bundled-skills/triage/SKILL.md +112 -0
- package/bundled-skills/using-git-worktrees/SKILL.md +202 -0
- package/bundled-skills/using-superpowers/SKILL.md +121 -0
- package/bundled-skills/using-superpowers/references/antigravity-tools.md +96 -0
- package/bundled-skills/using-superpowers/references/claude-code-tools.md +50 -0
- package/bundled-skills/using-superpowers/references/codex-tools.md +72 -0
- package/bundled-skills/using-superpowers/references/copilot-tools.md +49 -0
- package/bundled-skills/using-superpowers/references/gemini-tools.md +63 -0
- package/bundled-skills/using-superpowers/references/pi-tools.md +28 -0
- package/bundled-skills/verification-before-completion/SKILL.md +139 -0
- package/bundled-skills/writing-great-skills/GLOSSARY.md +195 -0
- package/bundled-skills/writing-great-skills/SKILL.md +82 -0
- package/bundled-skills/writing-plans/SKILL.md +174 -0
- package/bundled-skills/writing-plans/plan-document-reviewer-prompt.md +49 -0
- package/bundled-skills/writing-skills/SKILL.md +689 -0
- package/bundled-skills/writing-skills/anthropic-best-practices.md +1150 -0
- package/bundled-skills/writing-skills/examples/CLAUDE_MD_TESTING.md +189 -0
- package/bundled-skills/writing-skills/graphviz-conventions.dot +172 -0
- package/bundled-skills/writing-skills/persuasion-principles.md +187 -0
- package/bundled-skills/writing-skills/render-graphs.js +168 -0
- package/bundled-skills/writing-skills/testing-skills-with-subagents.md +384 -0
- package/commands/add.js +97 -0
- package/commands/check.js +54 -0
- package/commands/exportSkills.js +30 -0
- package/commands/hub.js +52 -0
- package/commands/importSkills.js +68 -0
- package/commands/list.js +37 -0
- package/commands/remove.js +59 -0
- package/commands/sync.js +66 -0
- package/commands/update.js +70 -0
- package/index.js +100 -0
- package/lib/banner.js +108 -0
- package/lib/constants.js +10 -0
- package/lib/deps.js +51 -0
- package/lib/hash.js +26 -0
- package/lib/install.js +31 -0
- package/lib/lockfile.js +37 -0
- package/lib/prompts.js +50 -0
- package/lib/scope.js +19 -0
- package/lib/summary.js +108 -0
- package/lib/theme.js +11 -0
- package/package.json +43 -0
- package/skills.json +164 -0
|
@@ -0,0 +1,28 @@
|
|
|
1
|
+
# Pi Tool Mapping
|
|
2
|
+
|
|
3
|
+
Skills speak in actions ("dispatch a subagent", "create a todo", "read a file"). On Pi these resolve to the tools below.
|
|
4
|
+
|
|
5
|
+
| Action skills request | Pi equivalent |
| --- | --- |
| Invoke a skill | Pi native skills: load the relevant `SKILL.md` with `read`, or let the human use `/skill:name` |
| Read a file | `read` |
| Create a file | `write` |
| Edit a file | `edit` |
| Run a shell command | `bash` |
| Search file contents | `grep` when active; otherwise `bash` with `rg`/`grep` |
| Find files by name | `find` or `bash` with shell globs |
| List files and subdirectories | `ls` when active; otherwise `bash` with `ls` |
| Dispatch a subagent (`Subagent (general-purpose):` template) | Use an installed subagent tool such as `subagent` from `pi-subagents` if available |
| Task tracking ("create a todo", "mark complete") | Use an installed todo/task tool if available, otherwise track tasks in the plan or `TODO.md` |
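The fallback logic in the table can be sketched in a few lines. This is only an illustration: `ACTION_TOOLS` and `resolve_tool` are hypothetical names, and the set of "active" tools is assumed to be something the caller already knows; Pi itself may expose this differently.

```python
# Hypothetical sketch: map a skill's action to a Pi tool, with the
# "when active; otherwise" fallbacks from the table above.
# Each entry is (preferred tool, fallback tool or None).
ACTION_TOOLS = {
    "read a file": ("read", None),
    "create a file": ("write", None),
    "edit a file": ("edit", None),
    "run a shell command": ("bash", None),
    "search file contents": ("grep", "bash"),  # bash with rg/grep
    "find files by name": ("find", "bash"),    # bash with shell globs
    "list files": ("ls", "bash"),              # bash with ls
    "dispatch a subagent": ("subagent", None), # optional, from pi-subagents
}


def resolve_tool(action, active_tools):
    """Return the tool to use for an action, or None if nothing fits."""
    preferred, fallback = ACTION_TOOLS[action]
    if preferred in active_tools:
        return preferred
    if fallback is not None and fallback in active_tools:
        return fallback
    # No tool available: do the work sequentially or explain the gap,
    # rather than fabricating a tool call.
    return None
```

A `None` result corresponds to the "if available" rows: the agent proceeds without the optional tool instead of inventing one.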
|
|
17
|
+
|
|
18
|
+
## Skills
|
|
19
|
+
|
|
20
|
+
Pi discovers skills from configured skill directories and installed Pi packages. A Superpowers Pi package should expose `skills/` through its `pi.skills` manifest entry. Pi does not expose Claude Code's `Skill` tool, but the agent should still follow the Superpowers rule: when a skill applies, load and follow it before responding.
|
|
21
|
+
|
|
22
|
+
## Subagents
|
|
23
|
+
|
|
24
|
+
Pi core does not ship a standard subagent tool. The `pi-subagents` package is a strong optional companion and provides a `subagent` tool with single-agent, chain, parallel, async, forked-context, and resume/status workflows. If no subagent tool is available, do not fabricate `Task` calls; execute sequentially in the current session or explain that the optional subagent capability is not installed.
|
|
25
|
+
|
|
26
|
+
## Task lists
|
|
27
|
+
|
|
28
|
+
Pi core does not ship a standard task-list tool. If a todo/task extension is installed, use its documented tool. Otherwise use Superpowers plan files, checklists in Markdown, or a repo-local `TODO.md` for task tracking. Older Superpowers docs may refer to `TodoWrite`; treat that as the task-tracking action above.
|
|
@@ -0,0 +1,139 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: verification-before-completion
|
|
3
|
+
description: Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Verification Before Completion
|
|
7
|
+
|
|
8
|
+
## Overview
|
|
9
|
+
|
|
10
|
+
Claiming work is complete without verification is dishonesty, not efficiency.
|
|
11
|
+
|
|
12
|
+
**Core principle:** Evidence before claims, always.
|
|
13
|
+
|
|
14
|
+
**Violating the letter of this rule is violating the spirit of this rule.**
|
|
15
|
+
|
|
16
|
+
## The Iron Law
|
|
17
|
+
|
|
18
|
+
```
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
```
|
|
21
|
+
|
|
22
|
+
If you haven't run the verification command in this message, you cannot claim it passes.
|
|
23
|
+
|
|
24
|
+
## The Gate Function
|
|
25
|
+
|
|
26
|
+
```
BEFORE claiming any status or expressing satisfaction:

1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
3. READ: Full output, check exit code, count failures
4. VERIFY: Does output confirm the claim?
   - If NO: State actual status with evidence
   - If YES: State claim WITH evidence
5. ONLY THEN: Make the claim

Skip any step = lying, not verifying
```
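As a rough illustration of the gate, the sketch below runs a command fresh, reads its exit code and output, and only then produces a status line that carries the evidence. The function name `verified_claim` is hypothetical, and treating a zero exit code as sufficient is a simplifying assumption; a real check would also read the failure count in the output.

```python
import subprocess
import sys


def verified_claim(claim, command):
    """Run `command` now and return (ok, status line with evidence)."""
    # RUN: execute the full command fresh; never reuse an earlier result.
    result = subprocess.run(command, capture_output=True, text=True)

    # READ: keep the exit code and the last output line as evidence.
    output = (result.stdout + result.stderr).strip()
    last_line = output.splitlines()[-1] if output else "<no output>"
    evidence = f"exit {result.returncode}; last line: {last_line}"

    # VERIFY: in this sketch only a zero exit code supports the claim.
    if result.returncode == 0:
        return True, f"{claim} ({evidence})"
    return False, f"NOT verified: {claim} ({evidence})"


# Usage: the claim is only stated together with what was observed.
ok, message = verified_claim(
    "Self-check passes", [sys.executable, "-c", "print('1 passed')"]
)
print(message)
```

The point of returning the evidence alongside the boolean is that the claim and its proof cannot be separated when reporting.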
|
|
39
|
+
|
|
40
|
+
## Common Failures
|
|
41
|
+
|
|
42
|
+
| Claim | Requires | Not Sufficient |
|-------|----------|----------------|
| Tests pass | Test command output: 0 failures | Previous run, "should pass" |
| Linter clean | Linter output: 0 errors | Partial check, extrapolation |
| Build succeeds | Build command: exit 0 | Linter passing, logs look good |
| Bug fixed | Test original symptom: passes | Code changed, assumed fixed |
| Regression test works | Red-green cycle verified | Test passes once |
| Agent completed | VCS diff shows changes | Agent reports "success" |
| Requirements met | Line-by-line checklist | Tests passing |
|
|
51
|
+
|
|
52
|
+
## Red Flags - STOP
|
|
53
|
+
|
|
54
|
+
- Using "should", "probably", "seems to"
- Expressing satisfaction before verification ("Great!", "Perfect!", "Done!", etc.)
- About to commit/push/PR without verification
- Trusting agent success reports
- Relying on partial verification
- Thinking "just this once"
- Tired and wanting work over
- **ANY wording implying success without having run verification**
|
|
62
|
+
|
|
63
|
+
## Rationalization Prevention
|
|
64
|
+
|
|
65
|
+
| Excuse | Reality |
|--------|---------|
| "Should work now" | RUN the verification |
| "I'm confident" | Confidence ≠ evidence |
| "Just this once" | No exceptions |
| "Linter passed" | Linter ≠ compiler |
| "Agent said success" | Verify independently |
| "I'm tired" | Exhaustion ≠ excuse |
| "Partial check is enough" | Partial proves nothing |
| "Different words so rule doesn't apply" | Spirit over letter |
|
|
75
|
+
|
|
76
|
+
## Key Patterns
|
|
77
|
+
|
|
78
|
+
**Tests:**
|
|
79
|
+
```
✅ [Run test command] [See: 34/34 pass] "All tests pass"
❌ "Should pass now" / "Looks correct"
```
|
|
83
|
+
|
|
84
|
+
**Regression tests (TDD Red-Green):**
|
|
85
|
+
```
✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass)
❌ "I've written a regression test" (without red-green verification)
```
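The red-green sequence can be expressed as a small driver. This is a sketch under assumptions: `run_test`, `revert_fix`, and `restore_fix` are hypothetical callables supplied by the caller (for example, wrappers around a test runner and a VCS stash), not functions from any particular tool.

```python
def red_green_verified(run_test, revert_fix, restore_fix):
    """Return True only if the test passes with the fix and fails without it."""
    # Run (pass): the test must pass with the fix in place.
    if not run_test():
        return False

    # Revert fix, then Run (MUST FAIL): a test that still passes here
    # does not actually exercise the bug.
    revert_fix()
    try:
        failed_without_fix = not run_test()
    finally:
        # Restore the fix even if the test runner raised.
        restore_fix()

    # Run (pass) again to confirm the restored state is green.
    return failed_without_fix and run_test()
```

A test that passes in every state returns `False` here, which is exactly the "test passes once" case the table above marks as not sufficient.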
|
|
89
|
+
|
|
90
|
+
**Build:**
|
|
91
|
+
```
✅ [Run build] [See: exit 0] "Build passes"
❌ "Linter passed" (linter doesn't check compilation)
```
|
|
95
|
+
|
|
96
|
+
**Requirements:**
|
|
97
|
+
```
✅ Re-read plan → Create checklist → Verify each → Report gaps or completion
❌ "Tests pass, phase complete"
```
|
|
101
|
+
|
|
102
|
+
**Agent delegation:**
|
|
103
|
+
```
✅ Agent reports success → Check VCS diff → Verify changes → Report actual state
❌ Trust agent report
```
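One way to check an agent's report against the VCS is sketched below. It assumes a Git repository and uses `git diff --name-only`, which lists only unstaged changes to tracked files; committed or untracked work would need a different command. The helper names are hypothetical.

```python
import subprocess


def parse_changed_files(name_only_output):
    """Turn `git diff --name-only` output into a sorted list of paths."""
    return sorted(
        line.strip() for line in name_only_output.splitlines() if line.strip()
    )


def changed_files(repo_dir="."):
    """Ask git which tracked files changed, instead of trusting a summary."""
    result = subprocess.run(
        ["git", "diff", "--name-only"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_changed_files(result.stdout)


def missing_from_diff(claimed_paths, actual_paths):
    """Return the paths the agent claims to have changed but the diff lacks."""
    return sorted(set(claimed_paths) - set(actual_paths))
```

An empty result from `missing_from_diff` shows only that the files were touched; the changes themselves still need to be read before reporting the actual state.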
|
|
107
|
+
|
|
108
|
+
## Why This Matters
|
|
109
|
+
|
|
110
|
+
From 24 failure memories:
- Your human partner said "I don't believe you" - trust broken
- Undefined functions shipped - would crash
- Missing requirements shipped - incomplete features
- Time wasted on false completion → redirect → rework
- Violates: "Honesty is a core value. If you lie, you'll be replaced."
|
|
116
|
+
|
|
117
|
+
## When To Apply
|
|
118
|
+
|
|
119
|
+
**ALWAYS before:**
|
|
120
|
+
- ANY variation of success/completion claims
- ANY expression of satisfaction
- ANY positive statement about work state
- Committing, PR creation, task completion
- Moving to next task
- Delegating to agents
|
|
126
|
+
|
|
127
|
+
**Rule applies to:**
|
|
128
|
+
- Exact phrases
- Paraphrases and synonyms
- Implications of success
- ANY communication suggesting completion/correctness
|
|
132
|
+
|
|
133
|
+
## The Bottom Line
|
|
134
|
+
|
|
135
|
+
**No shortcuts for verification.**
|
|
136
|
+
|
|
137
|
+
Run the command. Read the output. THEN claim the result.
|
|
138
|
+
|
|
139
|
+
This is non-negotiable.
|
|
@@ -0,0 +1,195 @@
|
|
|
1
|
+
# Glossary — Building Great Skills
|
|
2
|
+
|
|
3
|
+
The domain model for what makes a skill great. A skill exists to wrangle determinism out of a stochastic system; the root virtue is **Predictability**, and every term below is a lever on it. This is the disclosed reference for [`writing-great-skills`](SKILL.md).
|
|
4
|
+
|
|
5
|
+
The terms are grouped by axis: **Invocation** (how a skill is reached), **Information Hierarchy** (how its content is arranged), **Steering** (how the agent's runtime behaviour is shaped), and **Pruning** (how it is kept lean). Each **failure mode** lives beside the lever that cures it, tagged _failure mode_.
|
|
6
|
+
|
|
7
|
+
**Bold terms** in any definition are themselves defined in this glossary; find them by their heading.
|
|
8
|
+
|
|
9
|
+
## Predictability
|
|
10
|
+
|
|
11
|
+
The degree to which a skill makes the agent behave the same _way_ on every run — the same process, not the same output (a brainstorming skill should _predictably_ diverge; its tokens vary, its behaviour doesn't). The root virtue every other term serves — cost and maintainability are symptoms of it, not rivals.
|
|
12
|
+
|
|
13
|
+
_Avoid_: consistency, reliability, robustness, output-determinism
|
|
14
|
+
|
|
15
|
+
## Invocation
|
|
16
|
+
|
|
17
|
+
How a skill is reached — and the two loads you pay for the choice.
|
|
18
|
+
|
|
19
|
+
### Model-Invoked
|
|
20
|
+
|
|
21
|
+
A skill that keeps its **description** field, so the agent can see it and fire it autonomously — and the human can still type its name, so model-invocation always _includes_ user reach. There is no model-only state: a description only ever _adds_ agent discovery, never removes the human's. Pays a permanent **context load** on every turn in exchange for that discoverability. Reachable by other skills, because the description that makes it agent-discoverable makes it invocable. A model-invoked skill whose content is all **reference** is also one home for shared reference: another skill can invoke it, so reference needed by several skills lives in one place. Pick model-invocation only when the agent must reach the skill on its own; if it never fires except by hand, drop the description and pay no context load.
|
|
22
|
+
|
|
23
|
+
_Avoid_: ability, tool, capability
|
|
24
|
+
|
|
25
|
+
### User-Invoked
|
|
26
|
+
|
|
27
|
+
A skill with its **description** stripped — invisible to the agent and reachable only by the human typing its name (user-_only_, where **model-invoked** is user-_and-agent_). Trades agent-discoverability for zero **context load**. Because it has no description, nothing but the human can reach it: no other skill can fire it.
|
|
28
|
+
|
|
29
|
+
_Avoid_: procedure, workflow, command
|
|
30
|
+
|
|
31
|
+
### Description
|
|
32
|
+
|
|
33
|
+
The skill's machine-readable trigger, and the one **context pointer** a **model-invoked** skill is forced to keep loaded at all times. Its mere presence _is_ the invocation axis: keep it and the skill is model-invoked (and reachable by other skills); delete it and the skill is **user-invoked**, reachable only by the human. The source of a model-invoked skill's **context load**.
|
|
34
|
+
|
|
35
|
+
_Avoid_: frontmatter, summary
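The rule that the description's presence decides the invocation axis can be sketched as a check on a skill file's frontmatter. This assumes flat `key: value` frontmatter between `---` lines, like the `name`/`description` block shown for `verification-before-completion` in this diff; real files may need a proper YAML parser. The function names are hypothetical.

```python
def read_frontmatter(skill_md):
    """Parse simple `key: value` pairs between the leading `---` lines."""
    lines = skill_md.splitlines()
    fields = {}
    if not lines or lines[0].strip() != "---":
        return fields  # no frontmatter at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def invocation(skill_md):
    """Classify a skill by whether it keeps a non-empty description."""
    description = read_frontmatter(skill_md).get("description", "")
    return "model-invoked" if description else "user-invoked"


# Hypothetical skill text used only for illustration.
example = "---\nname: example-skill\ndescription: Use when tests fail\n---\n# Body\n"
print(invocation(example))
```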
|
|
36
|
+
|
|
37
|
+
### Context Pointer
|
|
38
|
+
|
|
39
|
+
A reference held in the agent's context that names some out-of-context material and encodes the condition for reaching it. The **description** is the top-level context pointer (context window → skill); pointers to disclosed files are the same object one level down. Its wording, not the target, decides _when_ the agent reaches — and _how reliably_. A must-have target behind a weakly worded pointer is a variance bug: fix the wording first, and inline the material only if sharpening fails.
|
|
40
|
+
|
|
41
|
+
_Avoid_: link, reference, import
|
|
42
|
+
|
|
43
|
+
### Context Load
|
|
44
|
+
|
|
45
|
+
The cost a **model-invoked** skill imposes on the agent's context window — its **description**, always loaded, spending both tokens and attention. What **user-invoked** skills escape by having no description, and the brake on splitting into more model-invoked skills.
|
|
46
|
+
|
|
47
|
+
_Avoid_: token cost, context bloat
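A crude way to see context load grow as skills are split is to total the descriptions that stay loaded. The sketch below uses whitespace-separated word counts as a stand-in for tokens, which is an assumption made for simplicity; the `context_load` helper and the skill records are hypothetical.

```python
def context_load(skills):
    """Total words across the descriptions of model-invoked skills.

    Skills without a description (user-invoked) contribute nothing.
    """
    return sum(
        len(skill["description"].split())
        for skill in skills
        if skill.get("description")
    )


# Hypothetical skill records: only the first carries a description.
skills = [
    {"name": "a", "description": "Use when tests fail"},
    {"name": "b"},
]
print(context_load(skills))
```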
|
|
48
|
+
|
|
49
|
+
### Cognitive Load
|
|
50
|
+
|
|
51
|
+
The cost a **user-invoked** skill imposes on the human — what they must hold in their head: which skills exist and when to reach for each (the human is the index). What **model-invocation** removes by being agent-discoverable, and the brake on splitting into more user-invoked skills. Not a cost to minimise: it is the price of human agency, the reason some skills stay user-invoked. Spend it where human judgement matters; remove it where it does not.
|
|
52
|
+
|
|
53
|
+
_Avoid_: human index, burden, overhead
|
|
54
|
+
|
|
55
|
+
### Router Skill
|
|
56
|
+
|
|
57
|
+
A **user-invoked** skill whose job is to point at your other user-invoked skills — naming each and when to reach for it — so the human has one skill to remember instead of many. It can only hint, never fire them: user-invoked skills have no **description**, so nothing but the human can reach them. The cure for **cognitive load** when user-invoked skills multiply.
|
|
58
|
+
|
|
59
|
+
_Avoid_: dispatcher, menu, registry, index, router procedure
|
|
60
|
+
|
|
61
|
+
### Granularity
|
|
62
|
+
|
|
63
|
+
How finely you divide skills. Finer division spends one of the two loads: more **model-invoked** skills spend **context load** (more descriptions crowding the window and competing for attention); more **user-invoked** skills spend **cognitive load** (more for the human to remember and reach for). Two cuts guide the division. By **invocation**, split off a model-invoked skill where you have a distinct **leading word** to trigger it — a trigger word you actually use in your prompts. By **sequence**, split a run of **steps** where a step's **post-completion steps** need hiding, since isolating it in its own context clears what follows. Beware the reverse: merging sequences exposes each step's post-completion steps to what follows, inviting premature completion.
|
|
64
|
+
|
|
65
|
+
_Avoid_: chunking, modularity
|
|
66
|
+
|
|
67
|
+
## Information Hierarchy
|
|
68
|
+
|
|
69
|
+
How a skill's content is arranged, and how far down the ladder each piece sits.
|
|
70
|
+
|
|
71
|
+
### Information Hierarchy
|
|
72
|
+
|
|
73
|
+
A skill's content ranked by how immediately the agent needs it — a single ladder, produced by two cuts: in-file or behind a pointer, and step or reference. The rungs:
|
|
74
|
+
|
|
75
|
+
- **Steps** — in-file, primary
|
|
76
|
+
- **Reference**, in-file — secondary
|
|
77
|
+
- **Reference**, disclosed — behind a **context pointer**
|
|
78
|
+
|
|
79
|
+
A skill with no **steps** uses just the bottom two rungs — often a legitimately flat peer-set (e.g. every rule of a review on one rung), which is a fine arrangement, not a smell. The hierarchy is independent of invocation: a skill can be model- or user-invoked whether it is all steps, all reference, or both. When a skill has steps, in-file reference that should be disclosed buries them and turns attending to them into a coin-flip — a variance lever, not just a legibility one. Keep the top of the ladder legible; push down it whatever you can.
|
|
80
|
+
|
|
81
|
+
_Avoid_: structure, organization, layout
|
|
82
|
+
|
|
83
|
+
### Steps
|
|
84
|
+
|
|
85
|
+
The ordered actions the agent performs — when a skill has them, the primary tier of its content, and the part that earns its place in SKILL.md. Not every skill has steps: a skill can be all steps (`tdd`), all **reference** (a review), or both, independent of invocation. Every step ends on a **completion criterion**, clear or vague.
|
|
86
|
+
|
|
87
|
+
_Avoid_: workflow, instructions, choreography
|
|
88
|
+
|
|
89
|
+
### Reference
|
|
90
|
+
|
|
91
|
+
Material the agent refers to on demand — definitions, facts, parameters, examples, conditional instructions. When a skill has **steps** it is secondary to them; when a skill has none it is the entire content; or it lives outside any skill entirely — see **External Reference**. Reached via **context pointers**, and the prime candidate for **progressive disclosure**.
|
|
92
|
+
|
|
93
|
+
_Avoid_: supporting material, docs, background
|
|
94
|
+
|
|
95
|
+
### External Reference
|
|
96
|
+
|
|
97
|
+
**Reference** that lives outside the skill system — a plain file, no **description**, no **steps**, not invocable — that any skill can point at. The home for shared reference that needn't fire on its own, and the only shared home two **user-invoked** skills can use, since neither has a description and so neither can fire the other.
|
|
98
|
+
|
|
99
|
+
_Avoid_: doc, resource, knowledge base
|
|
100
|
+
|
|
101
|
+
### Progressive Disclosure
|
|
102
|
+
|
|
103
|
+
Moving **reference** down the ladder — out of SKILL.md and behind a **context pointer** — so the top stays legible. Not primarily a token optimisation; it is how the **information hierarchy** is protected. Licensed by **branching**: disclose what only some branches need, inline what every path needs, and if a pointer fires unreliably on must-have material, sharpen its wording, and pull it back inline only if that fails.
|
|
104
|
+
|
|
105
|
+
_Avoid_: lazy loading, chunking
|
|
106
|
+
|
|
107
|
+
### Co-location
|
|
108
|
+
|
|
109
|
+
Keeping the material an agent needs at once in one place — a concept's definition, rules, and caveats under a single heading, not scattered across the file — so reading one part brings its neighbours with it. The within-file companion to the **Information Hierarchy**: the hierarchy ranks _how far down_ a piece sits; co-location decides _what sits beside it_ once there. There is no formula for the right format of a body of **reference**; the test is that a skill should read like documentation written for the agent, and grouped material reads that way where scattered material does not. Distinct from **Duplication**: that repeats one meaning in two places, where scattering fragments a single meaning across many.
|
|
110
|
+
|
|
111
|
+
_Avoid_: grouping, clustering, cohesion
|
|
112
|
+
|
|
113
|
+
### Sprawl
|
|
114
|
+
|
|
115
|
+
_Failure mode._ A skill that is simply too long — too many lines in SKILL.md — independent of whether they are stale or repeated. Even an all-live, all-unique skill can sprawl. It costs readability (the agent wades through more before it can act, and attention thins across the excess), maintainability (every extra line is one more to keep **relevant**), and tokens. The cure is the **information hierarchy**: push **reference** down behind **context pointers**, and split by **branch** or sequence so each path carries only what it needs. Distinct from **sediment** (length from stale accumulation) and **duplication** (length from repeated meaning) — sprawl is length itself, whatever its cause.
|
|
116
|
+
|
|
117
|
+
_Avoid_: bloat, length, size, verbosity
|
|
118
|
+
|
|
119
|
+
## Steering
|
|
120
|
+
|
|
121
|
+
The levers that shape the agent's runtime behaviour toward **Predictability**.
|
|
122
|
+
|
|
123
|
+
### Branch
|
|
124
|
+
|
|
125
|
+
A distinct way a skill can be invoked — a case the skill handles — so different runs take different paths through it. A skill with many steps may carry many branches; a linear one has none.
|
|
126
|
+
|
|
127
|
+
_Avoid_: path, case, fork
|
|
128
|
+
|
|
129
|
+
### Leading Word
|
|
130
|
+
|
|
131
|
+
A compact concept (also called a _Leitwort_) already living in the model's pretraining, that the agent thinks with while running the skill. It encodes a behavioural principle in the fewest possible tokens by invoking priors the model already holds (e.g. _lesson_, _zone of proximal development_, _fog of war_, _tracer bullets_). Repeated as a token, never as a sentence, it accumulates a distributed definition across the skill and anchors a whole region of behaviour. Coining your own works if you define it clearly, but a made-up word recruits no priors: you pay in definition tokens what a pretrained word gives free. Reach for an existing word first.
|
|
132
|
+
|
|
133
|
+
A leading word serves **predictability** twice. In the body it anchors **execution** — the agent reaches for the same behaviour every time the concept appears, and inside flat reference it focuses attention on a class of thing to look for, recruiting the right checks each run. In the **description** it anchors **invocation** — and not only within the skill: when the same word lives in your prompts, your docs, and your codebase, the agent links that shared language to the skill and fires it more reliably. Word a description with the leading words you actually use when you want the skill.
|
|
134
|
+
|
|
135
|
+
_Avoid_: keyword, term, motif
|
|
136
|
+
|
|
137
|
+
### Completion Criterion
|
|
138
|
+
|
|
139
|
+
The condition that tells the agent a unit of work is done — the target it judges against. Two properties make it a lever, not just a quality. Its **clarity** (can the agent tell done from not-done?) resists **premature completion** — a vague bound ("understanding reached") lets the agent declare done and slip to the next step; this axis needs _steps_ to bite, since premature completion is a between-steps failure. Its **demand** (how much it requires) sets **legwork** — "every modified model accounted for" forces thorough work where "produce a change list" does not — and this axis is _not_ step-bound: it can bind a body of flat reference too, which is how a skill with no steps still carries an exhaustiveness bar ("every rule applied"). The strongest criteria are both checkable and exhaustive.
|
|
140
|
+
|
|
141
|
+
_Avoid_: done condition, exit condition, stopping rule
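The difference between a vague bound and one that is both checkable and exhaustive can be made concrete. The sketch below turns "every modified model accounted for" into a set comparison; the function names and the sample model names are hypothetical.

```python
def unaccounted(modified_models, change_list):
    """Return the modified models that the change list fails to mention."""
    return sorted(set(modified_models) - set(change_list))


def criterion_met(modified_models, change_list):
    """Checkable (yes/no answer) and exhaustive (every model must appear)."""
    return not unaccounted(modified_models, change_list)


# Usage with made-up model names: one model is still missing.
print(unaccounted(["User", "Order", "Invoice"], ["User", "Order"]))
```

By contrast, "produce a change list" would be satisfied by any non-empty list, so it gives the agent nothing to measure the remaining legwork against.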
|
|
142
|
+
|
|
143
|
+
### Legwork

The work an agent does behind the scenes within a single step — reading files, exploring the codebase, making changes, digging up what it needs rather than offloading to the user. It lives below the step structure: never written as its own step, latent in the wording, controlled by the agent rather than the skill. The within-step counterpart to **post-completion steps**' across-step pull. Raised by a **leading word** (_comprehensive_, _thorough_) or a **completion criterion** that demands the work be exhaustive — including the demand axis applied to flat reference, which is what drives a skill of flat reference to cover all its rungs. Goes thin either when that demand is missing or when **premature completion** cuts the step short.

_Avoid_: scope, effort, diligence, coverage

### Post-Completion Steps

The **steps** that follow the current step. Visible, they pull the agent forward into **premature completion** — the more it sees, the stronger the tug; the defence is to hide them by splitting the sequence of steps into two.

_Avoid_: horizon, fog of war, lookahead

### Premature Completion

_Failure mode._ Ending the current step before it is genuinely done, because the agent's attention slips to being done rather than to the work. A between-steps failure: it needs **steps** to occur — a skill with no steps that quits early isn't premature completion but thin **legwork** under an unmet demand. A tug-of-war between two forces: visible **post-completion steps** (the pull forward) and the **completion criterion**'s clarity (the resistance — a sharp, checkable bar holds; a vague one gives way). Fuzziness is the necessary condition: a sharp bound resists the pull no matter how many later steps are visible, so a step that never rushes needs no defending. Two levers hold a step that does, but reach for them in order: **sharpen the bound first** — it is local and cheap. Only when the criterion is irreducibly fuzzy _and_ you actually observe the rush do you **hide the later steps** — and hiding only works across a real context boundary (a user-invoked hand-off or a subagent dispatch; an inline model-invoked call leaves the later steps in context and clears nothing). One cause of thin legwork, but distinct from it: legwork can be thin even when a step runs to full completion.

_Avoid_: premature closure, the rush, rushing, shortcutting

## Pruning

Keeping a skill lean — each remedy paired with the failure it cures.

### Single Source of Truth

The desired state where each meaning lives in exactly one authoritative place, so a change to the skill's behaviour is a change in one place. **Duplication** is its violation.

_Avoid_: home, canonical location

### Duplication

_Failure mode._ The same meaning given more than one **single source of truth**. It costs maintenance (change one place, you must change the others), costs tokens, and inflates prominence — repeating a meaning weights it on the ladder past its real rank. The accidental inverse of a **leading word**, which raises attention on purpose by repeating a token, never the meaning.

_Avoid_: repetition, redundancy

### Relevance

Whether a line still bears on what the skill does — the lens for what to keep. A line loses relevance either by never bearing on the task (mere exposition, or a **branch** that should be disclosed) or by going stale: drifting out of date as the behaviour or world it describes changes. Shorter skills are easier to keep relevant, because each line is cheaper to check. Distinct from **no-op**: relevance asks whether a line bears on the task, not whether it changes behaviour.

_Avoid_: load-bearing, staleness, freshness

### Sediment

_Failure mode._ Layers of old content that settle in a skill and are never cleared, because adding feels safe and removing feels risky — so stale and irrelevant lines accumulate and you must core down through them to find what is still live. The default fate of any skill without a pruning discipline; the slow erosion of **relevance**, as opposed to **duplication**'s repeated meaning.

_Avoid_: accretion, bloat, cruft, rot

### No-Op

_Failure mode._ An instruction that changes nothing because the model already does it by default — you pay load to tell the agent what it would do anyway. The test: does a line change behaviour versus the default? A line can be perfectly **relevant** and still be a no-op. The same priors that make a **leading word** free make a no-op worthless.

A leading word is a _technique_; No-Op is a _verdict_ on a line — and they cross. A leading word too weak to beat the default is a no-op (_be thorough_ when the agent is already thorough-ish), and the fix is a stronger word that passes the verdict (_relentless_), not a different technique. So the No-Op test — does it change behaviour versus the default? — is also how you grade whether a leading word is earning its repetitions. This is model-relative, not reader-relative: two people disagreeing over whether a line is a no-op disagree about the default, and settle it by running the skill, not by debate.

_Avoid_: redundant instruction, restating the obvious, belaboring

@@ -0,0 +1,82 @@
---
name: writing-great-skills
description: Reference for writing and editing skills well — the vocabulary and principles that make a skill predictable.
disable-model-invocation: true
---

A skill exists to wrangle determinism out of a stochastic system. **Predictability** — the agent taking the same _process_ every run, not producing the same output — is the root virtue; every lever below serves it.

**Bold terms** are defined in [`GLOSSARY.md`](GLOSSARY.md); look them up there for the full meaning.

## Invocation

Two choices, trading different costs:

- A **model-invoked** skill keeps a **description**, so the agent can fire it autonomously _and_ other skills can reach it (you can still type its name too). It contributes to **context load** — the description sits in the window every turn. Mechanics: omit `disable-model-invocation`, and write a model-facing description with rich trigger phrasing ("Use when the user wants…, mentions…").
- A **user-invoked** skill strips the description from the agent's reach: only you, typing its name, can invoke it — and no other skill can. Zero context load, but it spends **cognitive load**: _you_ are the index that must remember it exists. Mechanics: set `disable-model-invocation: true`; the `description` becomes human-facing — a one-line summary, trigger lists stripped.

Pick model-invocation only when the agent must reach the skill on its own, or another skill must. If it only ever fires by hand, make it user-invoked and pay no context load.

When user-invoked skills multiply past what you can remember, that piled-up cognitive load is cured by a **router skill**: one user-invoked skill that names the others and when to reach for each.

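The mechanics above come down to one frontmatter field. The sketch below, assuming a simple `key: value` frontmatter between `---` lines, classifies a skill by reading `disable-model-invocation`. The helper names are hypothetical and the parser is deliberately naive (standard library only, not a full YAML reader); it is not how any particular agent runtime loads skills.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Read flat `key: value` pairs from a leading `---` block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def invocation_mode(skill_md: str) -> str:
    """Return "user-invoked" when model invocation is disabled, else "model-invoked"."""
    fields = parse_frontmatter(skill_md)
    disabled = fields.get("disable-model-invocation", "false").lower() == "true"
    return "user-invoked" if disabled else "model-invoked"


if __name__ == "__main__":
    example = "---\nname: writing-great-skills\ndisable-model-invocation: true\n---\n"
    print(invocation_mode(example))
```

Run over a folder of skills, a check like this could list which descriptions are always loaded and therefore worth the hardest pruning.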
## Writing the description

A model-invoked **description** does two jobs — state what the skill is, and list the **branches** that should trigger it. Every word increases **context load**, so a description earns even harder pruning than the body:

- **Front-load the skill's leading word** — the description is where it does its invocation work.
- **One trigger per branch.** Synonyms that rename a single branch are **duplication** — "build features using TDD … asks for test-first development" is one branch written twice. Collapse them; keep only genuinely distinct branches.
- **Cut identity that's already in the body.** Keep the description to triggers, plus any "when another skill needs…" reach clause.

## Information hierarchy

A skill is built from two content types — **steps** and **reference** — that mix freely: a skill can be all steps, all reference, or both. The core decision is which to use and where each sits on the **information hierarchy**, a ladder ranked by how immediately the agent needs the material:

1. **In-skill step** — an ordered action in `SKILL.md`, the primary tier: what the agent does, in order. Each step ends on a **completion criterion**, the condition that tells the agent the work is done. Make it _checkable_ (can the agent tell done from not-done?) and, where it matters, _exhaustive_ ("every modified model accounted for", not "produce a change list") — a vague criterion invites **premature completion**.
2. **In-skill reference** — a definition, rule, or fact in `SKILL.md`, consulted on demand. Often a legitimately flat peer-set (every rule of a review on one rung) — a fine arrangement, not a smell. _This skill is all reference._
3. **External reference** — reference pushed out of `SKILL.md` into a separate file, reached by a **context pointer**, loaded only when the pointer fires. (Spans _disclosed_ reference — a sibling file like `GLOSSARY.md`, still part of the skill — through fully **external reference** that lives outside the skill system and any skill can point at.)

A demanding completion criterion drives thorough **legwork** — the digging the agent does within the work — whether the skill has steps or not, since "every rule applied" binds flat reference just as "every step done" binds a sequence.

Push too little down and the top bloats; push too much and you hide material the agent actually needs. That tension is the whole decision.

**Progressive disclosure** is the move down the ladder — out of `SKILL.md` into a linked file — so the top stays legible. Mechanics: a linked `.md` file in the skill folder, named for what it holds (this skill discloses its full definitions to `GLOSSARY.md`). Some skills are used in more than one way, and each distinct way is a **branch** — different runs taking different paths through the skill. Branching is the cleanest disclosure test: inline what every branch needs, and push behind a pointer what only some branches reach. A **context pointer**'s _wording_, not its target, decides when and how reliably the agent reaches the material.

Where the ladder decides _how far down_ a piece sits, **co-location** decides _what sits beside it_ once there: keep a concept's definition, rules, and caveats under one heading rather than scattered, so reading one part brings its neighbours with it.

## When to split

**Granularity** is how finely you divide skills, and each cut spends one of the two loads, so split only when the cut earns it. Two cuts:

- **By invocation** — split off a **model-invoked** skill when you have a distinct **leading word** that should trigger it on its own, or another skill must reach it. You pay **context load** for the new always-loaded **description**, so that independent reach has to be worth it.
- **By sequence** — split a run of **steps** when the steps still ahead (a step's **post-completion steps**) tempt the agent to rush the one in front of it (**premature completion**). Keeping them out of view encourages the agent to do more **legwork** on the current task.

## Pruning

Keep each meaning in a **single source of truth**: one authoritative place, so changing the behaviour is a one-place edit.

Check every line for **relevance**: does it still bear on what the skill does?

Then hunt **no-ops** sentence by sentence, not just line by line: run the no-op test on each sentence in isolation, and when one fails, delete the whole sentence rather than trim words from it. Be aggressive — most prose that fails should go, not be rewritten.

## Leading words

A **leading word** is a compact concept already living in the model's pretraining that the agent thinks with while running the skill (e.g. _lesson_, _fog of war_, _tracer bullets_). Repeated throughout the text (though not necessarily: a strong leading word might only be needed once), it accumulates a distributed definition and anchors a whole region of behaviour in the fewest tokens, by recruiting priors the model already holds.

It serves predictability twice. In the body it anchors _execution_: the agent reaches for the same behaviour every time the word appears. In the description it anchors _invocation_: when the same word lives in your prompts, docs, and code, the agent links that shared language to the skill and fires it more reliably.

Hunt for opportunities to refactor skills to use leading words. A triad spelled out at three sites (**duplication**), a description spending a sentence to gesture at one idea — each is a passage begging to **collapse** into a single token. Examples include:

- "fast, deterministic, low-overhead" -> _tight_ — collapses one quality restated across a phase into a single pretrained word (a _tight_ loop).
- "a loop you believe in" -> _red_ — converts a fuzzy gate into a binary observable state (the loop goes _red_ on the bug, or it doesn't).

You win twice over: fewer tokens, _and_ a sharper hook for the agent to hang its thinking on. Assume every skill is carrying restatements that leading words retire — go find them.

## Failure modes

Use these to diagnose issues the user may be having with the skill.

- **Premature completion** — ending a step before it's genuinely done, attention slipping to _being done_. Defence, in order: sharpen the completion criterion first (cheap, local); only if it is irreducibly fuzzy _and_ you observe the rush, hide the post-completion steps by splitting (the sequence cut).
- **Duplication** — the same meaning in more than one place. Costs maintenance and tokens, and inflates a meaning's prominence on the ladder past its real rank.
- **Sediment** — stale layers that settle because adding feels safe and removing feels risky. The default fate of any skill without a pruning discipline.
- **Sprawl** — a skill simply too long, even when every line is live and unique. Hurts readability and maintainability and wastes tokens. The cure is the ladder: disclose **reference** behind pointers, and split by **branch** or sequence so each path carries only what it needs.
- **No-op** — a line the model already obeys by default, so you pay load to say nothing. The test: does it change behaviour versus the default? A weak leading word (_be thorough_ when the agent is already thorough-ish) is a no-op; the fix is a stronger word (_relentless_), not a different technique.
@@ -0,0 +1,174 @@
---
name: writing-plans
description: Use when you have a spec or requirements for a multi-step task, before touching code
---

# Writing Plans

## Overview

Write comprehensive implementation plans assuming the engineer has zero context for our codebase and questionable taste. Document everything they need to know: which files to touch for each task, code, testing, docs they might need to check, how to test it. Give them the whole plan as bite-sized tasks. DRY. YAGNI. TDD. Frequent commits.

Assume they are a skilled developer who knows almost nothing about our toolset or problem domain. Assume they don't know good test design very well.

**Announce at start:** "I'm using the writing-plans skill to create the implementation plan."

**Context:** If working in an isolated worktree, it should have been created via the `superpowers:using-git-worktrees` skill at execution time.

**Save plans to:** `docs/superpowers/plans/YYYY-MM-DD-<feature-name>.md`
- (User preferences for plan location override this default)

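The default save location above can be built mechanically. The following is a minimal sketch, assuming the feature name is lower-cased and non-alphanumeric runs become hyphens; that slug rule and the `plan_path` helper are assumptions for illustration, since the skill only fixes the `YYYY-MM-DD-<feature-name>.md` shape.

```python
import re
from datetime import date


def plan_path(feature_name: str, on: date, base: str = "docs/superpowers/plans") -> str:
    """Build `<base>/YYYY-MM-DD-<feature-name>.md` for a plan written on `on`."""
    # Assumed slug rule: lower-case, collapse anything non-alphanumeric to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", feature_name.lower()).strip("-")
    return f"{base}/{on.isoformat()}-{slug}.md"


if __name__ == "__main__":
    print(plan_path("User Login", date(2025, 1, 31)))
```

Passing a different `base` covers the case where a user preference overrides the default directory.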
## Scope Check

If the spec covers multiple independent subsystems, it should have been broken into sub-project specs during brainstorming. If it wasn't, suggest breaking this into separate plans — one per subsystem. Each plan should produce working, testable software on its own.

## File Structure

Before defining tasks, map out which files will be created or modified and what each one is responsible for. This is where decomposition decisions get locked in.

- Design units with clear boundaries and well-defined interfaces. Each file should have one clear responsibility.
- You reason best about code you can hold in context at once, and your edits are more reliable when files are focused. Prefer smaller, focused files over large ones that do too much.
- Files that change together should live together. Split by responsibility, not by technical layer.
- In existing codebases, follow established patterns. If the codebase uses large files, don't unilaterally restructure - but if a file you're modifying has grown unwieldy, including a split in the plan is reasonable.

This structure informs the task decomposition. Each task should produce self-contained changes that make sense independently.

## Task Right-Sizing

A task is the smallest unit that carries its own test cycle and is worth a
fresh reviewer's gate. When drawing task boundaries: fold setup,
configuration, scaffolding, and documentation steps into the task whose
deliverable needs them; split only where a reviewer could meaningfully
reject one task while approving its neighbor. Each task ends with an
independently testable deliverable.

## Bite-Sized Task Granularity

**Each step is one action (2-5 minutes):**
- "Write the failing test" - step
- "Run it to make sure it fails" - step
- "Implement the minimal code to make the test pass" - step
- "Run the tests and make sure they pass" - step
- "Commit" - step

## Plan Document Header

**Every plan MUST start with this header:**

```markdown
# [Feature Name] Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** [One sentence describing what this builds]

**Architecture:** [2-3 sentences about approach]

**Tech Stack:** [Key technologies/libraries]

## Global Constraints

[The spec's project-wide requirements — version floors, dependency limits,
naming and copy rules, platform requirements — one line each, with exact
values copied verbatim from the spec. Every task's requirements implicitly
include this section.]

---
```

## Task Structure

````markdown
### Task N: [Component Name]

**Files:**
- Create: `exact/path/to/file.py`
- Modify: `exact/path/to/existing.py:123-145`
- Test: `tests/exact/path/to/test.py`

**Interfaces:**
- Consumes: [what this task uses from earlier tasks — exact signatures]
- Produces: [what later tasks rely on — exact function names, parameter
  and return types. A task's implementer sees only their own task; this
  block is how they learn the names and types neighboring tasks use.]

- [ ] **Step 1: Write the failing test**

```python
def test_specific_behavior():
    result = function(input)
    assert result == expected
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/path/test.py::test_name -v`
Expected: FAIL with "function not defined"

- [ ] **Step 3: Write minimal implementation**

```python
def function(input):
    return expected
```

- [ ] **Step 4: Run test to verify it passes**

Run: `pytest tests/path/test.py::test_name -v`
Expected: PASS

- [ ] **Step 5: Commit**

```bash
git add tests/path/test.py src/path/file.py
git commit -m "feat: add specific feature"
```
````

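Because steps use checkbox syntax, progress through a plan can be read straight from the file. The sketch below is one possible way to do that, not part of the skill itself: `step_status` is a hypothetical helper, and treating `- [x]` as "done" is an assumption based on the common Markdown task-list convention (the template above only shows the unchecked `- [ ]` form).

```python
import re

# Matches "- [ ] text" (pending) and "- [x] text" / "- [X] text" (done).
STEP = re.compile(r"^\s*- \[( |x|X)\] (.*)$")


def step_status(plan: str) -> tuple[list[str], list[str]]:
    """Split a plan's checkbox steps into (done, pending) lists, in file order."""
    done: list[str] = []
    pending: list[str] = []
    for line in plan.splitlines():
        match = STEP.match(line)
        if match is None:
            continue  # Not a checkbox line: prose, code, or a plain bullet.
        target = pending if match.group(1) == " " else done
        target.append(match.group(2))
    return done, pending


if __name__ == "__main__":
    sample = (
        "- [x] **Step 1: Write the failing test**\n"
        "- [ ] **Step 2: Run test to verify it fails**\n"
    )
    done, pending = step_status(sample)
    print(len(done), "done;", len(pending), "pending")
```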
## No Placeholders

Every step must contain the actual content an engineer needs. These are **plan failures** — never write them:
- "TBD", "TODO", "implement later", "fill in details"
- "Add appropriate error handling" / "add validation" / "handle edge cases"
- "Write tests for the above" (without actual test code)
- "Similar to Task N" (repeat the code — the engineer may be reading tasks out of order)
- Steps that describe what to do without showing how (code blocks required for code steps)
- References to types, functions, or methods not defined in any task

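The phrase-level failures in the list above lend themselves to a mechanical scan. The following is a rough sketch under stated assumptions: the regular expressions are my own rendering of the quoted phrases, `find_placeholders` is a hypothetical helper, and the scan covers only the literal phrases, not the structural items (steps without code, undefined references), which still need a read-through.

```python
import re

# Assumed patterns, derived from the phrases quoted in the list above.
RED_FLAGS = [
    r"\bTBD\b",
    r"\bTODO\b",
    r"implement later",
    r"fill in details",
    r"add appropriate error handling",
    r"add validation",
    r"handle edge cases",
    r"write tests for the above",
    r"similar to task \w+",
]


def find_placeholders(plan: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line containing a red flag."""
    hits: list[tuple[int, str]] = []
    for number, line in enumerate(plan.splitlines(), start=1):
        if any(re.search(p, line, flags=re.IGNORECASE) for p in RED_FLAGS):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    plan = "- [ ] Step 1: Write the failing test\n- [ ] Step 2: TODO\n"
    for number, line in find_placeholders(plan):
        print(f"line {number}: {line}")
```

A hit is a prompt to look, not proof of a failure: a step that says "add validation" and then shows the validation code would still be flagged by this coarse match.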
## Remember
- Exact file paths always
- Complete code in every step — if a step changes code, show the code
- Exact commands with expected output
- DRY, YAGNI, TDD, frequent commits

## Self-Review

After writing the complete plan, look at the spec with fresh eyes and check the plan against it. This is a checklist you run yourself — not a subagent dispatch.

**1. Spec coverage:** Skim each section/requirement in the spec. Can you point to a task that implements it? List any gaps.

**2. Placeholder scan:** Search your plan for red flags — any of the patterns from the "No Placeholders" section above. Fix them.

**3. Type consistency:** Do the types, method signatures, and property names you used in later tasks match what you defined in earlier tasks? A function called `clearLayers()` in Task 3 but `clearFullLayers()` in Task 7 is a bug.

If you find issues, fix them inline. No need to re-review — just fix and move on. If you find a spec requirement with no task, add the task.

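The type-consistency check can be given a head start by flagging function names that are suspiciously close to each other. This is only a heuristic sketch: `near_miss_names`, the call-matching regular expression, and the 0.8 similarity cutoff are all assumptions made for illustration, and it uses `difflib.SequenceMatcher` from the Python standard library to score similarity.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Any identifier immediately followed by "(" is treated as a function name.
CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\(")


def near_miss_names(plan: str, cutoff: float = 0.8) -> list[tuple[str, str]]:
    """Return pairs of distinct function names whose similarity is at least `cutoff`."""
    names = sorted(set(CALL.findall(plan)))
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if SequenceMatcher(None, a, b).ratio() >= cutoff
    ]


if __name__ == "__main__":
    plan = (
        "Task 3 defines `clearLayers()`.\n"
        "Task 7 calls `clearFullLayers()` and `render()`.\n"
    )
    print(near_miss_names(plan))
```

Each reported pair still needs a human or agent judgment, since two similar names may be intentionally different functions.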
## Execution Handoff

After saving the plan, offer execution choice:

**"Plan complete and saved to `docs/superpowers/plans/<filename>.md`. Two execution options:**

**1. Subagent-Driven (recommended)** - I dispatch a fresh subagent per task, review between tasks, fast iteration

**2. Inline Execution** - Execute tasks in this session using executing-plans, batch execution with checkpoints

**Which approach?"**

**If Subagent-Driven chosen:**
- **REQUIRED SUB-SKILL:** Use superpowers:subagent-driven-development
- Fresh subagent per task + two-stage review

**If Inline Execution chosen:**
- **REQUIRED SUB-SKILL:** Use superpowers:executing-plans
- Batch execution with checkpoints for review