@super-pocock-ai/compose-workflow 2.0.0 → 2.0.6

package/package.json CHANGED
@@ -1,8 +1,15 @@
 {
   "name": "@super-pocock-ai/compose-workflow",
-  "version": "2.0.0",
+  "version": "2.0.6",
   "description": "Compose 工作流技能,用于编排开发流程",
-  "files": ["skills"],
-  "keywords": ["opencode", "skills", "compose", "workflow"],
+  "files": [
+    "skills"
+  ],
+  "keywords": [
+    "opencode",
+    "skills",
+    "compose",
+    "workflow"
+  ],
   "license": "MIT"
 }
@@ -0,0 +1,200 @@
1
+ ---
2
+ name: compose:brainstorm
3
+ hidden: true
4
+ description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation."
5
+ ---
6
+
7
+ # Brainstorming Ideas Into Designs
8
+
9
+ Help turn ideas into fully formed designs and specs through natural collaborative dialogue.
10
+
11
+ Start by understanding the current project context, then ask questions one at a time to refine the idea. Once you understand what you're building, present the design and get user approval.
12
+
13
+ <HARD-GATE>
14
+ Do NOT invoke any implementation skill, write any code, scaffold any project, or take any implementation action until you have presented a design and the user has approved it. This applies to EVERY project regardless of perceived simplicity.
15
+
16
+ **Autonomous override:** When no user is available to approve, this HARD-GATE does not apply: skip design presentation and approval, proceed directly to implementation, treating every approval gate as auto-approved.
17
+ </HARD-GATE>
18
+
19
+ ## Anti-Pattern: "This Is Too Simple To Need A Design"
20
+
21
+ Every project goes through this process. A todo list, a single-function utility, a config change — all of them. "Simple" projects are where unexamined assumptions cause the most wasted work. The design can be short (a few sentences for truly simple projects), but you MUST present it and get approval.
22
+
23
+ **Autonomous override:** When no user is available, skip this anti-pattern guidance — proceed directly to exploring context and implementing; do not present a design for approval.
24
+
25
+ ## Checklist
26
+
27
+ You MUST create a task for each of these items and complete them in order:
28
+
29
+ **Autonomous mode (no user available):** Skip steps 2–8. Only do: explore project context (step 1) → implement directly (invoke compose:plan or compose:execute). Do NOT present a design, ask questions, write a spec, or wait for approval.
30
+
31
+ 1. **Explore project context** — check files, docs, recent commits
32
+ 2. **Offer visual companion** (if topic will involve visual questions) — this is its own message, not combined with a clarifying question. See the Visual Companion section below.
33
+ 3. **Ask clarifying questions** — one at a time, understand purpose/constraints/success criteria
34
+ 4. **Propose 2-3 approaches** — with trade-offs and your recommendation
35
+ 5. **Present design** — in sections scaled to their complexity, get user approval after each section
36
+ 6. **Write design doc** (optional, multi-step features only) — save to the `specs/` directory given in the `<compose_docs_dir>` block of your prompt, as `YYYY-MM-DD-<topic>-design.md`, and commit. For single-step fixes or small changes, keep the design in conversation context only.
37
+ 7. **Spec self-review** (if doc written) — quick inline check for placeholders, contradictions, ambiguity, scope (see below)
38
+ 8. **User reviews written spec** (if doc written) — ask user to review the spec file before proceeding
39
+ 9. **Transition to implementation** — invoke compose:plan to create implementation plan
40
+
41
+ ## Process Flow
42
+
43
+ - **Explore project context**
44
+ - **Visual questions ahead?** — Yes → offer Visual Companion (own message, no other content)
45
+ - **Ask clarifying questions**
46
+ - **Propose 2-3 approaches**
47
+ - **Present design sections**
48
+ - **User approves design?** — No → revise, back to present design sections
49
+ - **Write design doc**
50
+ - **Spec self-review** (fix inline)
51
+ - **User reviews spec?** — Changes requested → back to write design doc / Approved → **invoke compose:plan**
52
+
53
+ **The terminal state is invoking compose:plan.** Do NOT invoke frontend-design, mcp-builder, or any other implementation skill. The ONLY skill you invoke after brainstorming is compose:plan.
54
+
55
+ ## The Process
56
+
57
+ **Understanding the idea:**
58
+
59
+ - Check out the current project state first (files, docs, recent commits)
60
+ - Before asking detailed questions, assess scope: if the request describes multiple independent subsystems (e.g., "build a platform with chat, file storage, billing, and analytics"), flag this immediately. Don't spend questions refining details of a project that needs to be decomposed first.
61
+ - If the project is too large for a single spec, help the user decompose into sub-projects: what are the independent pieces, how do they relate, what order should they be built? Then brainstorm the first sub-project through the normal design flow. Each sub-project gets its own spec → plan → implementation cycle.
62
+ - For appropriately-scoped projects, ask questions one at a time to refine the idea
63
+ - When the question has a known set of likely answers, use `compose:ask` with those answers as options
64
+ - For open-ended questions, use `compose:ask` with 2-3 suggested answers as options — the user can always type their own answer
65
+ - If no user is available, make reasonable assumptions from project context and proceed
66
+ - Only one question per tool call — if a topic needs more exploration, break it into multiple questions
67
+ - Focus on understanding: purpose, constraints, success criteria
68
+
69
+ **Exploring approaches:**
70
+
71
+ - Propose 2-3 different approaches with trade-offs
72
+ - Present options conversationally with your recommendation and reasoning
73
+ - Lead with your recommended option and explain why
74
+
75
+ **Presenting the design:**
76
+
77
+ - Once you believe you understand what you're building, present the design
78
+ - Scale each section to its complexity: a few sentences if straightforward, up to 200-300 words if nuanced
79
+ - After presenting each section, use `compose:ask`:
80
+ - header: `Design Review`
81
+ - question: `Does this <section-name> look right?`
82
+ - options:
83
+ - label: `Looks good`, description: `Approve and continue`
84
+ - label: `Needs changes`, description: `I have feedback`
85
+
86
+ If no user is available, treat as approved and continue.
87
+ - Cover: architecture, components, data flow, error handling, testing
88
+ - Be ready to go back and clarify if something doesn't make sense
89
+
90
+ **Design for isolation and clarity:**
91
+
92
+ - Break the system into smaller units that each have one clear purpose, communicate through well-defined interfaces, and can be understood and tested independently
93
+ - For each unit, you should be able to answer: what does it do, how do you use it, and what does it depend on?
94
+ - Can someone understand what a unit does without reading its internals? Can you change the internals without breaking consumers? If not, the boundaries need work.
95
+ - Smaller, well-bounded units are also easier for you to work with - you reason better about code you can hold in context at once, and your edits are more reliable when files are focused. When a file grows large, that's often a signal that it's doing too much.
96
+
97
+ **Working in existing codebases:**
98
+
99
+ - Explore the current structure before proposing changes. Follow existing patterns.
100
+ - Where existing code has problems that affect the work (e.g., a file that's grown too large, unclear boundaries, tangled responsibilities), include targeted improvements as part of the design - the way a good developer improves code they're working in.
101
+ - Don't propose unrelated refactoring. Stay focused on what serves the current goal.
102
+
103
+ ## After the Design
104
+
105
+ **Documentation (optional, multi-step features only):**
106
+
107
+ For features with multiple tasks or significant architectural decisions:
108
+ - Write the validated design (spec) to the `specs/` directory given in `<compose_docs_dir>`, as `YYYY-MM-DD-<topic>-design.md`
109
+ - (User preferences for spec location override this default)
110
+ - Use elements-of-style:writing-clearly-and-concisely skill if available
111
+ - Commit the design document to git
112
+
113
+ For single bug fixes or small changes, skip the written spec — the design presented in conversation is sufficient.
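The naming convention above can be sketched in Python. This is only an illustration: `spec_path`, the slug rule, and the example docs directory are assumptions and are not part of the package.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def spec_path(docs_dir: str, topic: str, day: date) -> PurePosixPath:
    """Build <docs_dir>/specs/YYYY-MM-DD-<topic>-design.md (hypothetical helper)."""
    # Assumed slug rule: lowercase, runs of non-alphanumerics become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return PurePosixPath(docs_dir) / "specs" / f"{day.isoformat()}-{slug}-design.md"
```

For example, `spec_path("docs/compose", "Coverage Gate!", date(2025, 1, 31))` yields `docs/compose/specs/2025-01-31-coverage-gate-design.md`.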
114
+
115
+ **Spec Self-Review (if doc written):**
116
+ After writing the spec document, look at it with fresh eyes:
117
+
118
+ 1. **Placeholder scan:** Any "TBD", "TODO", incomplete sections, or vague requirements? Fix them.
119
+ 2. **Internal consistency:** Do any sections contradict each other? Does the architecture match the feature descriptions?
120
+ 3. **Scope check:** Is this focused enough for a single implementation plan, or does it need decomposition?
121
+ 4. **Ambiguity check:** Could any requirement be interpreted two different ways? If so, pick one and make it explicit.
122
+
123
+ Fix any issues inline. No need to re-review — just fix and move on.
124
+
125
+ **User Review Gate (if doc written):**
126
+ After spec self-review passes, use `compose:ask`:
127
+ - header: `Spec Review`
128
+ - question: `Spec written and committed to <path>. Ready to proceed?`
129
+ - options:
130
+ - label: `Approved`, description: `Proceed to compose:plan`
131
+ - label: `Changes needed`, description: `I have revisions`
132
+
133
+ If no user is available, treat as approved and invoke compose:plan.
134
+
135
+ If "Changes needed" or custom feedback, apply changes and re-run spec review. Only proceed on approval.
136
+
137
+ **Implementation:**
138
+
139
+ - Invoke compose:plan to create a detailed implementation plan
140
+ - Do NOT invoke any other skill. compose:plan is the next step.
141
+
142
+ ## Spec Section Anchors
143
+
144
+ When writing the spec, give every `##` section heading a stable anchor ID so downstream plan tasks and reviewers can reference exact spec locations. Put the ID at the start of the heading text:
145
+
146
+ ```markdown
147
+ ## [S1] Problem
148
+ ## [S2] Solution overview
149
+ ## [S3] Coverage gate behavior
150
+ ```
151
+
152
+ Rules:
153
+
154
+ - **ID format** is `S` followed by a number (`[S1]`, `[S2]`, `[S3]`, ...), unique within the spec — no two sections share an ID, and no section is left without one.
155
+ - **Number sections in document order** when first authoring the spec (top section is `[S1]`, the next is `[S2]`, and so on).
156
+ - **The ID is stable.** If a heading is later reworded, keep its existing ID and do NOT renumber the other sections. Downstream `covers:` references and review verdicts depend on these IDs not drifting — renumbering would silently break every reference that points at them.
157
+
158
+ These anchors are the index the plan and reviewers use to trace each task and each review verdict back to the exact spec section it serves.
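The anchor rules above could be checked mechanically. The following is a minimal sketch, assuming the spec is plain Markdown with no `##` lines inside code fences; `check_anchors` is a hypothetical helper, not something the package ships. Gaps in the numbering are deliberately allowed, since IDs are stable and never renumbered.

```python
import re

HEADING = re.compile(r"^## (.*)$")
ANCHOR = re.compile(r"^\[(S\d+)\] ")


def check_anchors(spec_text: str) -> list[str]:
    """Return problems found in the `##` headings of a spec (hypothetical checker)."""
    problems = []
    seen = set()
    for line in spec_text.splitlines():
        heading = HEADING.match(line)
        if not heading:
            continue  # not a `##` heading (deeper headings start with `###`)
        anchor = ANCHOR.match(heading.group(1))
        if not anchor:
            problems.append(f"missing ID: {line}")
        elif anchor.group(1) in seen:
            problems.append(f"duplicate ID {anchor.group(1)}: {line}")
        else:
            seen.add(anchor.group(1))
    return problems
```

An empty list means every `##` heading carries a unique `[Sn]` ID.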
159
+
160
+ ## Key Principles
161
+
162
+ - **One question at a time** - Don't overwhelm with multiple questions
163
+ - **Multiple choice preferred** - Easier to answer than open-ended when possible
164
+ - **YAGNI ruthlessly** - Remove unnecessary features from all designs
165
+ - **Explore alternatives** - Always propose 2-3 approaches before settling
166
+ - **Incremental validation** - Present design, get approval before moving on
167
+ - **Be flexible** - Go back and clarify when something doesn't make sense
168
+
169
+ ## Visual Companion
170
+
171
+ A browser-based companion for showing mockups, diagrams, and visual options during brainstorming. Available as a tool — not a mode. Accepting the companion means it's available for questions that benefit from visual treatment; it does NOT mean every question goes through the browser.
172
+
173
+ **Offering the companion:** When you anticipate visual content (mockups, layouts, diagrams):
174
+
175
+ 1. **Check memory** for a `visual-companion` preference in the `compose-preferences` memory file. If found, honor it.
176
+
177
+ 2. **If no saved preference,** offer consent using `compose:ask` (this MUST be its own message — do not combine with other content):
178
+ - header: `Visual Companion`
179
+ - question: `Some upcoming questions may benefit from browser-based mockups and diagrams. This feature is token-intensive and requires opening a local URL.`
180
+ - options:
181
+ - label: `Yes, always`, description: `Enable visuals for this and future sessions`
182
+ - label: `No, never`, description: `Skip visuals for this and future sessions`
183
+ - label: `Yes, this time`, description: `Enable visuals for this session only`
184
+ - label: `No, this time`, description: `Skip visuals for this session only`
185
+
186
+ If no user is available, skip the visual companion and use text-only.
187
+
188
+ 3. **If "Yes, always" or "No, never":** Save to the `compose-preferences` memory file.
189
+
190
+ If declined, proceed with text-only brainstorming.
191
+
192
+ **Per-question decision:** Even after the user accepts, decide FOR EACH QUESTION whether to use the browser or the terminal. The test: **would the user understand this better by seeing it than reading it?**
193
+
194
+ - **Use the browser** for content that IS visual — mockups, wireframes, layout comparisons, architecture diagrams, side-by-side visual designs
195
+ - **Use the terminal** for content that is text — requirements questions, conceptual choices, tradeoff lists, A/B/C/D text options, scope decisions
196
+
197
+ A question about a UI topic is not automatically a visual question. "What does personality mean in this context?" is a conceptual question — use the terminal. "Which wizard layout works better?" is a visual question — use the browser.
198
+
199
+ If they agree to the companion, read the detailed guide before proceeding:
200
+ `<compose:brainstorm>/visual-companion.md`
@@ -0,0 +1,297 @@
1
+ ---
2
+ name: compose:debug
3
+ hidden: true
4
+ description: Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
5
+ ---
6
+
7
+ # Systematic Debugging
8
+
9
+ ## Overview
10
+
11
+ Random fixes waste time and create new bugs. Quick patches mask underlying issues.
12
+
13
+ **Core principle:** ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
14
+
15
+ **Violating the letter of this process is violating the spirit of debugging.**
16
+
17
+ ## The Iron Law
18
+
19
+ ```
20
+ NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
21
+ ```
22
+
23
+ If you haven't completed Phase 1, you cannot propose fixes.
24
+
25
+ ## When to Use
26
+
27
+ Use for ANY technical issue:
28
+ - Test failures
29
+ - Bugs in production
30
+ - Unexpected behavior
31
+ - Performance problems
32
+ - Build failures
33
+ - Integration issues
34
+
35
+ **Use this ESPECIALLY when:**
36
+ - Under time pressure (emergencies make guessing tempting)
37
+ - "Just one quick fix" seems obvious
38
+ - You've already tried multiple fixes
39
+ - Previous fix didn't work
40
+ - You don't fully understand the issue
41
+
42
+ **Don't skip when:**
43
+ - Issue seems simple (simple bugs have root causes too)
44
+ - You're in a hurry (rushing guarantees rework)
45
+ - Manager wants it fixed NOW (systematic is faster than thrashing)
46
+
47
+ ## The Four Phases
48
+
49
+ You MUST complete each phase before proceeding to the next.
50
+
51
+ ### Phase 1: Root Cause Investigation
52
+
53
+ **BEFORE attempting ANY fix:**
54
+
55
+ 1. **Read Error Messages Carefully**
56
+ - Don't skip past errors or warnings
57
+ - They often contain the exact solution
58
+ - Read stack traces completely
59
+ - Note line numbers, file paths, error codes
60
+
61
+ 2. **Reproduce Consistently**
62
+ - Can you trigger it reliably?
63
+ - What are the exact steps?
64
+ - Does it happen every time?
65
+ - If not reproducible → gather more data, don't guess
66
+
67
+ 3. **Check Recent Changes**
68
+ - What changed that could cause this?
69
+ - Git diff, recent commits
70
+ - New dependencies, config changes
71
+ - Environmental differences
72
+
73
+ 4. **Gather Evidence in Multi-Component Systems**
74
+
75
+ **WHEN system has multiple components (CI → build → signing, API → service → database):**
76
+
77
+ **BEFORE proposing fixes, add diagnostic instrumentation:**
78
+ ```
79
+ For EACH component boundary:
80
+ - Log what data enters component
81
+ - Log what data exits component
82
+ - Verify environment/config propagation
83
+ - Check state at each layer
84
+
85
+ Run once to gather evidence showing WHERE it breaks
86
+ THEN analyze evidence to identify failing component
87
+ THEN investigate that specific component
88
+ ```
89
+
90
+ **Example (multi-layer system):**
91
+ ```bash
92
+ # Layer 1: Workflow
93
+ echo "=== Secrets available in workflow: ==="
94
+ if [ -n "${IDENTITY:-}" ]; then echo "IDENTITY: SET"; else echo "IDENTITY: UNSET"; fi  # report presence only, never print the secret
95
+
96
+ # Layer 2: Build script
97
+ echo "=== Env vars in build script: ==="
98
+ env | grep IDENTITY || echo "IDENTITY not in environment"
99
+
100
+ # Layer 3: Signing script
101
+ echo "=== Keychain state: ==="
102
+ security list-keychains
103
+ security find-identity -v
104
+
105
+ # Layer 4: Actual signing
106
+ codesign --sign "$IDENTITY" --verbose=4 "$APP"
107
+ ```
108
+
109
+ **This reveals:** Which layer fails (secrets → workflow ✓, workflow → build ✗)
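The same boundary instrumentation can be sketched for in-process components. This is an illustrative Python decorator under stated assumptions: `log_boundary` and the `build` layer are hypothetical names, and real secrets should be masked before logging rather than written out as they are here.

```python
import functools
import logging

log = logging.getLogger("boundary")


def log_boundary(component: str):
    """Log what enters and exits a component (illustrative decorator)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Data entering the component.
            log.info("%s <- args=%r kwargs=%r", component, args, kwargs)
            try:
                result = func(*args, **kwargs)
            except Exception:
                log.exception("%s raised", component)
                raise
            # Data leaving the component.
            log.info("%s -> %r", component, result)
            return result
        return wrapper
    return decorate


@log_boundary("build")
def build(env: dict) -> dict:
    # Hypothetical layer: passes the signing identity on to the next stage.
    return {"identity": env.get("IDENTITY")}
```

Run the instrumented path once, then read the log to see the first boundary where the expected value is missing.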
110
+
111
+ 5. **Trace Data Flow**
112
+
113
+ **WHEN error is deep in call stack:**
114
+
115
+ See `root-cause-tracing.md` in this directory for the complete backward tracing technique.
116
+
117
+ **Quick version:**
118
+ - Where does bad value originate?
119
+ - What called this with bad value?
120
+ - Keep tracing up until you find the source
121
+ - Fix at source, not at symptom
122
+
123
+ ### Phase 2: Pattern Analysis
124
+
125
+ **Find the pattern before fixing:**
126
+
127
+ 1. **Find Working Examples**
128
+ - Locate similar working code in same codebase
129
+ - What works that's similar to what's broken?
130
+
131
+ 2. **Compare Against References**
132
+ - If implementing pattern, read reference implementation COMPLETELY
133
+ - Don't skim - read every line
134
+ - Understand the pattern fully before applying
135
+
136
+ 3. **Identify Differences**
137
+ - What's different between working and broken?
138
+ - List every difference, however small
139
+ - Don't assume "that can't matter"
140
+
141
+ 4. **Understand Dependencies**
142
+ - What other components does this need?
143
+ - What settings, config, environment?
144
+ - What assumptions does it make?
145
+
146
+ ### Phase 3: Hypothesis and Testing
147
+
148
+ **Scientific method:**
149
+
150
+ 1. **Form Single Hypothesis**
151
+ - State clearly: "I think X is the root cause because Y"
152
+ - Write it down
153
+ - Be specific, not vague
154
+
155
+ 2. **Test Minimally**
156
+ - Make the SMALLEST possible change to test hypothesis
157
+ - One variable at a time
158
+ - Don't fix multiple things at once
159
+
160
+ 3. **Verify Before Continuing**
161
+ - Did it work? Yes → Phase 4
162
+ - Didn't work? Form NEW hypothesis
163
+ - DON'T add more fixes on top
164
+
165
+ 4. **When You Don't Know**
166
+ - Say "I don't understand X"
167
+ - Don't pretend to know
168
+ - Ask for help through `compose:ask` — present what you've tried and offer structured next-step options. If no user is available, take the most promising next step and continue.
169
+ - Research more
170
+
171
+ ### Phase 4: Implementation
172
+
173
+ **Fix the root cause, not the symptom:**
174
+
175
+ 1. **Create Failing Test Case**
176
+ - Simplest possible reproduction
177
+ - Automated test if possible
178
+ - One-off test script if no framework
179
+ - MUST have before fixing
180
+ - Use the `compose:tdd` skill for writing proper failing tests
181
+
182
+ 2. **Implement Single Fix**
183
+ - Address the root cause identified
184
+ - ONE change at a time
185
+ - No "while I'm here" improvements
186
+ - No bundled refactoring
187
+
188
+ 3. **Verify Fix**
189
+ - Test passes now?
190
+ - No other tests broken?
191
+ - Issue actually resolved?
192
+
193
+ 4. **If Fix Doesn't Work**
194
+ - STOP
195
+ - Count: How many fixes have you tried?
196
+ - If < 3: Return to Phase 1, re-analyze with new information
197
+ - **If ≥ 3: STOP and question the architecture (step 5 below)**
198
+ - DON'T attempt Fix #4 without architectural discussion
199
+
200
+ 5. **If 3+ Fixes Failed: Question Architecture**
201
+
202
+ **Pattern indicating architectural problem:**
203
+ - Each fix reveals new shared state/coupling/problem in different place
204
+ - Fixes require "massive refactoring" to implement
205
+ - Each fix creates new symptoms elsewhere
206
+
207
+ **STOP and question fundamentals:**
208
+ - Is this pattern fundamentally sound?
209
+ - Are we "sticking with it through sheer inertia"?
210
+ - Should we refactor architecture vs. continue fixing symptoms?
211
+
212
+ **Use `compose:ask` to present the architectural concern** with options like "Continue fixing" / "Propose refactor" / "Discuss first". If no user is available, choose "Propose refactor" and proceed.
213
+
214
+ This is NOT a failed hypothesis - this is a wrong architecture.
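The escalation rule in steps 4 and 5 reduces to a small decision function. A minimal sketch, with `next_action` and its return strings chosen here for illustration only:

```python
def next_action(failed_fixes: int) -> str:
    """Map the number of failed fix attempts to the next step described above."""
    if failed_fixes < 0:
        raise ValueError("failed_fixes must be non-negative")
    if failed_fixes >= 3:
        # Three or more failures: stop fixing and question the architecture.
        return "question the architecture"
    # Fewer than three: go back and re-investigate with the new information.
    return "return to Phase 1"
```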
215
+
216
+ ## Red Flags - STOP and Follow Process
217
+
218
+ If you catch yourself thinking:
219
+ - "Quick fix for now, investigate later"
220
+ - "Just try changing X and see if it works"
221
+ - "Add multiple changes, run tests"
222
+ - "Skip the test, I'll manually verify"
223
+ - "It's probably X, let me fix that"
224
+ - "I don't fully understand but this might work"
225
+ - "Pattern says X but I'll adapt it differently"
226
+ - "Here are the main problems: [lists fixes without investigation]"
227
+ - Proposing solutions before tracing data flow
228
+ - **"One more fix attempt" (when already tried 2+)**
229
+ - **Each fix reveals new problem in different place**
230
+
231
+ **ALL of these mean: STOP. Return to Phase 1.**
232
+
233
+ **If 3+ fixes failed:** Question the architecture (see Phase 4, step 5)
234
+
235
+ ## Your Human Partner's Signals You're Doing It Wrong
236
+
237
+ **Watch for these redirections:**
238
+ - "Is that not happening?" - You assumed without verifying
239
+ - "Will it show us...?" - You should have added evidence gathering
240
+ - "Stop guessing" - You're proposing fixes without understanding
241
+ - "Ultrathink this" - Question fundamentals, not just symptoms
242
+ - "We're stuck?" (frustrated) - Your approach isn't working
243
+
244
+ **When you see these:** STOP. Return to Phase 1.
245
+
246
+ ## Common Rationalizations
247
+
248
+ | Excuse | Reality |
249
+ |--------|---------|
250
+ | "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
251
+ | "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
252
+ | "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
253
+ | "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
254
+ | "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
255
+ | "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
256
+ | "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
257
+ | "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. |
258
+
259
+ ## Quick Reference
260
+
261
+ | Phase | Key Activities | Success Criteria |
262
+ |-------|---------------|------------------|
263
+ | **1. Root Cause** | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY |
264
+ | **2. Pattern** | Find working examples, compare | Identify differences |
265
+ | **3. Hypothesis** | Form theory, test minimally | Confirmed or new hypothesis |
266
+ | **4. Implementation** | Create test, fix, verify | Bug resolved, tests pass |
267
+
268
+ ## When Process Reveals "No Root Cause"
269
+
270
+ If systematic investigation reveals issue is truly environmental, timing-dependent, or external:
271
+
272
+ 1. You've completed the process
273
+ 2. Document what you investigated
274
+ 3. Implement appropriate handling (retry, timeout, error message)
275
+ 4. Add monitoring/logging for future investigation
276
+
277
+ **But:** 95% of "no root cause" cases are incomplete investigation.
278
+
279
+ ## Supporting Techniques
280
+
281
+ These techniques are part of systematic debugging and available in this directory:
282
+
283
+ - **`root-cause-tracing.md`** - Trace bugs backward through call stack to find original trigger
284
+ - **`defense-in-depth.md`** - Add validation at multiple layers after finding root cause
285
+ - **`condition-based-waiting.md`** - Replace arbitrary timeouts with condition polling
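Condition polling, as named in the last item, can look roughly like the following. This is a generic sketch and not the content of `condition-based-waiting.md`, which this diff does not show; `wait_for` and its default values are assumptions.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05) -> None:
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

A test then calls something like `wait_for(lambda: server.is_ready())` (with `server` being whatever object the test owns) instead of a fixed `time.sleep(2)`, so it proceeds as soon as the condition holds and fails loudly when it never does.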
286
+
287
+ **Related skills:**
288
+ - **compose:tdd** - For creating failing test case (Phase 4, Step 1)
289
+ - **compose:verify** - Verify fix worked before claiming success
290
+
291
+ ## Real-World Impact
292
+
293
+ From debugging sessions:
294
+ - Systematic approach: 15-30 minutes to fix
295
+ - Random fixes approach: 2-3 hours of thrashing
296
+ - First-time fix rate: 95% vs 40%
297
+ - New bugs introduced: Near zero vs common
@@ -0,0 +1,183 @@
1
+ ---
2
+ name: compose:plan
3
+ hidden: true
4
+ description: Use when you have a spec or requirements for a multi-step task, before touching code
5
+ ---
6
+
7
+ # Writing Plans
8
+
9
+ ## Overview
10
+
11
+ Write comprehensive implementation plans assuming the engineer has zero context for our codebase and questionable taste. Document everything they need to know: which files to touch for each task, code, testing, docs they might need to check, how to test it. Give them the whole plan as bite-sized tasks. DRY. YAGNI. TDD. Frequent commits.
12
+
13
+ Assume they are a skilled developer who knows almost nothing about our toolset or problem domain and is not well versed in good test design.
14
+
15
+ **Announce at start:** "I'm using the compose:plan skill to create the implementation plan."
16
+
17
+ **Context:** If working in an isolated worktree, it should have been created via the `compose:worktree` skill at execution time.
18
+
19
+ **Save plans to:** the `plans/` directory given in the `<compose_docs_dir>` block of your prompt, as `YYYY-MM-DD-<feature-name>.md`.
20
+ - (User preferences for plan location override this default)
21
+
22
+ ## Scope Check
23
+
24
+ If the spec covers multiple independent subsystems, it should have been broken into sub-project specs during brainstorming. If it wasn't, suggest breaking this into separate plans — one per subsystem. Each plan should produce working, testable software on its own.
25
+
26
+ ## File Structure
27
+
28
+ Before defining tasks, map out which files will be created or modified and what each one is responsible for. This is where decomposition decisions get locked in.
29
+
30
+ - Design units with clear boundaries and well-defined interfaces. Each file should have one clear responsibility.
31
+ - You reason best about code you can hold in context at once, and your edits are more reliable when files are focused. Prefer smaller, focused files over large ones that do too much.
32
+ - Files that change together should live together. Split by responsibility, not by technical layer.
33
+ - In existing codebases, follow established patterns. If the codebase uses large files, don't unilaterally restructure - but if a file you're modifying has grown unwieldy, including a split in the plan is reasonable.
34
+
35
+ This structure informs the task decomposition. Each task should produce self-contained changes that make sense independently.
36
+
37
+ ## Task Right-Sizing
38
+
39
+ A task is the smallest unit that carries its own test cycle and is worth a
40
+ fresh reviewer's gate. When drawing task boundaries:
41
+
42
+ - Fold setup, configuration, scaffolding, and documentation steps into the task whose deliverable needs them
43
+ - Split only where a reviewer could meaningfully reject one task while approving its neighbor
44
+ - Each task ends with an independently testable deliverable
45
+
46
+ ## Bite-Sized Task Granularity
47
+
48
+ **Each step is one action (2-5 minutes):**
49
+ - "Write the failing test" - step
50
+ - "Run it to make sure it fails" - step
51
+ - "Implement the minimal code to make the test pass" - step
52
+ - "Run the tests and make sure they pass" - step
53
+ - "Commit" - step
54
+
55
+ ## Plan Document Header
56
+
57
+ **Every plan MUST start with this header:**
58
+
59
+ ```markdown
60
+ # [Feature Name] Implementation Plan
61
+
62
+ > **For agentic workers:** REQUIRED SUB-SKILL: Use compose:subagent (recommended) or compose:execute to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
63
+
64
+ **Goal:** [One sentence describing what this builds]
65
+
66
+ **Architecture:** [2-3 sentences about approach]
67
+
68
+ **Tech Stack:** [Key technologies/libraries]
69
+
70
+ ## Global Constraints
71
+
72
+ [Project-wide requirements that bind EVERY task — version floors, dependency
73
+ limits, naming and copy rules, platform requirements, exact values. One line
74
+ each, copied verbatim from the spec. Implementers and reviewers downstream
75
+ implicitly inherit this section without being told individually.]
76
+
77
+ ---
78
+ ```
79
+
80
+ ## Task Structure
81
+
82
+ ````markdown
83
+ ### Task N: [Component Name]
84
+
85
+ **Covers:** [S3, S7]
86
+ <!-- spec section anchors this task implements; every task that produces
87
+ spec-required behavior must list at least one. Omit only for pure
88
+ scaffolding tasks (e.g. project setup) that map to no spec section. -->
89
+
90
+ **Files:**
91
+ - Create: `exact/path/to/file.py`
92
+ - Modify: `exact/path/to/existing.py:123-145`
93
+ - Test: `tests/exact/path/to/test.py`
94
+
95
+ **Interfaces:**
96
+ - Consumes: [what this task uses from earlier tasks — exact signatures, types]
97
+ - Produces: [what later tasks rely on — exact function names, parameter and
98
+ return types. An implementer sees only its own task; this block is how it
99
+ learns the names and types neighboring tasks use.]
100
+
101
+ - [ ] **Step 1: Write the failing test**
102
+
103
+ ```python
104
+ def test_specific_behavior():
105
+     result = function(input)
106
+     assert result == expected
107
+ ```
108
+
109
+ - [ ] **Step 2: Run test to verify it fails**
110
+
111
+ Run: `pytest tests/path/test.py::test_name -v`
112
+ Expected: FAIL with "function not defined"
113
+
114
+ - [ ] **Step 3: Write minimal implementation**
115
+
116
+ ```python
117
+ def function(input):
118
+     return expected
119
+ ```
120
+
121
+ - [ ] **Step 4: Run test to verify it passes**
122
+
123
+ Run: `pytest tests/path/test.py::test_name -v`
124
+ Expected: PASS
125
+
126
+ - [ ] **Step 5: Commit**
127
+
128
+ ```bash
129
+ git add tests/path/test.py src/path/file.py
130
+ git commit -m "feat: add specific feature"
131
+ ```
132
+ ````
133
+
134
+ ## No Placeholders
135
+
136
+ Every step must contain the actual content an engineer needs. These are **plan failures** — never write them:
137
+ - "TBD", "TODO", "implement later", "fill in details"
138
+ - "Add appropriate error handling" / "add validation" / "handle edge cases"
139
+ - "Write tests for the above" (without actual test code)
140
+ - "Similar to Task N" (repeat the code — the engineer may be reading tasks out of order)
141
+ - Steps that describe what to do without showing how (code blocks required for code steps)
142
+ - References to types, functions, or methods not defined in any task
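Part of this list can be checked mechanically. The sketch below is an illustration only: the pattern list and the `find_placeholders` name are assumptions rather than part of the skill, and a regex scan will not catch steps that are vague in less literal ways.

```python
import re

# Hypothetical patterns drawn from the list above; extend as needed.
PLACEHOLDER_PATTERNS = [
    r"\bTBD\b",
    r"\bTODO\b",
    r"implement later",
    r"fill in details",
    r"add appropriate error handling",
    r"add validation",
    r"handle edge cases",
    r"write tests for the above",
    r"similar to task \d+",
]


def find_placeholders(plan_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that match a placeholder pattern."""
    hits = []
    for number, line in enumerate(plan_text.splitlines(), start=1):
        if any(re.search(p, line, re.IGNORECASE) for p in PLACEHOLDER_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

An empty result does not prove the plan is free of placeholders; it only rules out the literal phrases.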
143
+
144
+ ## Remember
145
+ - Exact file paths always
146
+ - Complete code in every step — if a step changes code, show the code
147
+ - Exact commands with expected output
148
+ - DRY, YAGNI, TDD, frequent commits
149
+
150
+ ## Self-Review
151
+
152
+ After writing the complete plan, look at the spec with fresh eyes and check the plan against it. This is a checklist you run yourself — not a subagent dispatch.
153
+
154
+ **1. Spec coverage:** Skim each `[Sn]` section in the spec. Can you point to a task whose **Covers:** lists it? Every spec section must be covered by at least one task. Conversely, every `Covers:` ID must resolve to a real spec section. List any gap in either direction and add or fix the task.
155
+
156
+ **2. Placeholder scan:** Search your plan for red flags — any of the patterns from the "No Placeholders" section above. Fix them.
157
+
158
+ **3. Type consistency:** Do the types, method signatures, and property names you used in later tasks match what you defined in earlier tasks? A function called `clearLayers()` in Task 3 but `clearFullLayers()` in Task 7 is a bug.
159
+
160
+ If you find issues, fix them inline. No need to re-review — just fix and move on. If you find a spec requirement with no task, add the task.
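The spec coverage check in item 1 can be sketched as a two-way set comparison. This is a minimal illustration, assuming spec sections are tagged `[Sn]` and each task carries a `**Covers:** [...]` line as in the task template; the `coverage_gaps` name is hypothetical.

```python
import re


def coverage_gaps(spec_text: str, plan_text: str) -> tuple[set[str], set[str]]:
    """Return (uncovered spec IDs, Covers IDs with no matching spec section)."""
    # Assumption: spec sections are tagged as [S1], [S2], ...
    spec_ids = set(re.findall(r"\[(S\d+)\]", spec_text))
    covered = set()
    # Assumption: each task carries a line like "**Covers:** [S3, S7]".
    for ids in re.findall(r"\*\*Covers:\*\*\s*\[([^\]]*)\]", plan_text):
        covered.update(re.findall(r"S\d+", ids))
    return spec_ids - covered, covered - spec_ids
```

Both returned sets should be empty before the plan is handed off: the first lists spec sections with no task, the second lists `Covers:` IDs that point at nothing.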
161
+
162
+ ## Execution Handoff
163
+
164
+ After saving the plan, determine execution approach:
165
+
166
+ 1. **Check memory** for a saved `execution-style` preference in the `compose-preferences` memory file. If found (`subagent` or `inline`), use it and skip to the handler below.
167
+
168
+ 2. **If no saved preference,** ask through `compose:ask`:
169
+ - header: `Execution`
170
+ - question: `Plan saved. How would you like to execute it?`
171
+ - options:
172
+ - label: `Subagent, always`, description: `Fresh subagent per task — remember for future sessions`
173
+ - label: `Subagent, this time`, description: `Fresh subagent per task — just this once`
174
+ - label: `Inline, always`, description: `Execute in this session — remember for future sessions`
175
+ - label: `Inline, this time`, description: `Execute in this session — just this once`
176
+
177
+ If no user is available, default to Inline for ≤ 3 tasks or tightly coupled tasks, Subagent for > 3 independent tasks.
178
+
179
+ 3. **If "always" variant:** Save to the `compose-preferences` memory file as `execution-style: subagent` or `execution-style: inline`.
180
+
181
+ **If Subagent:** Use compose:subagent — fresh subagent per task + two-stage review.
182
+
183
+ **If Inline:** Use compose:execute — batch execution with checkpoints
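
The no-user default from step 2 reduces to a small decision rule. The sketch below is one possible reading of it; the function name and the boolean `tightly_coupled` flag are assumptions, since the skill does not say how coupling is judged.

```python
def default_execution_style(task_count: int, tightly_coupled: bool) -> str:
    """Pick an execution style when no user is available to ask."""
    # Inline for 3 or fewer tasks, or for tightly coupled tasks.
    if task_count <= 3 or tightly_coupled:
        return "inline"
    # Subagent for more than 3 independent tasks.
    return "subagent"
```

The returned strings match the `execution-style` values saved in the `compose-preferences` memory file.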
@@ -0,0 +1,360 @@
1
+ ---
2
+ name: compose:tdd
3
+ hidden: true
4
+ description: Use when implementing any feature or bugfix, before writing implementation code
5
+ ---
6
+
7
+ # Test-Driven Development (TDD)
8
+
9
+ ## Overview
10
+
11
+ Write the test first. Watch it fail. Write minimal code to pass.
12
+
13
+ **Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
14
+
15
+ **Violating the letter of the rules is violating the spirit of the rules.**
16
+
17
+ ## When to Use
18
+
19
+ **Always:**
20
+ - New features
21
+ - Bug fixes
22
+ - Refactoring
23
+ - Behavior changes
24
+
25
+ **Exceptions (raise them through `compose:ask`; if no user is available, use your best judgment and proceed):**
26
+ - Throwaway prototypes
27
+ - Generated code
28
+ - Configuration files
29
+
30
+ Thinking "skip TDD just this once"? Stop. That's rationalization.
31
+
32
+ ## The Iron Law
33
+
34
+ ```
35
+ NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
36
+ ```
37
+
38
+ Write code before the test? Delete it. Start over.
39
+
40
+ **No exceptions:**
41
+ - Don't keep it as "reference"
42
+ - Don't "adapt" it while writing tests
43
+ - Don't look at it
44
+ - Delete means delete
45
+
46
+ Implement fresh from tests. Period.
47
+
48
+ ## Red-Green-Refactor
49
+
50
+ **The cycle:** RED → verify fails → GREEN → verify passes → REFACTOR → verify still green → next test → RED …
51
+
52
+ 1. **RED** — Write one failing test
53
+ - Verify it fails correctly (wrong failure → fix the test, back to RED)
54
+ 2. **GREEN** — Write minimal code to pass
55
+ - Verify all tests pass (still failing → keep coding, stay in GREEN)
56
+ 3. **REFACTOR** — Clean up while staying green
57
+ - Verify tests still pass after each cleanup (broke something → undo, stay in REFACTOR)
58
+ 4. **Next** — back to RED with the next behavior
59
+
60
+ ### RED - Write Failing Test
61
+
62
+ Write one minimal test showing what should happen.
63
+
64
+ <Good>
65
+ ```typescript
66
+ test('retries failed operations 3 times', async () => {
67
+   let attempts = 0;
67
+   const operation = async () => {
68
+     attempts++;
69
+     if (attempts < 3) throw new Error('fail');
70
+     return 'success';
71
+   };
72
+
73
+   const result = await retryOperation(operation);
74
+
75
+   expect(result).toBe('success');
76
+   expect(attempts).toBe(3);
78
+ });
79
+ ```
80
+ Clear name, tests real behavior, one thing
81
+ </Good>
82
+
83
+ <Bad>
84
+ ```typescript
85
+ test('retry works', async () => {
86
+   const mock = jest.fn()
86
+     .mockRejectedValueOnce(new Error())
87
+     .mockRejectedValueOnce(new Error())
88
+     .mockResolvedValueOnce('success');
89
+   await retryOperation(mock);
90
+   expect(mock).toHaveBeenCalledTimes(3);
92
+ });
93
+ ```
94
+ Vague name, tests mock not code
95
+ </Bad>
96
+
97
+ **Requirements:**
98
+ - One behavior
99
+ - Clear name
100
+ - Real code (no mocks unless unavoidable)
101
+
102
+ ### Verify RED - Watch It Fail
103
+
104
+ **MANDATORY. Never skip.**
105
+
106
+ ```bash
107
+ npm test path/to/test.test.ts
108
+ ```
109
+
110
+ Confirm:
111
+ - Test fails (not errors)
112
+ - Failure message is expected
113
+ - Fails because feature missing (not typos)
114
+
115
+ **Test passes?** You're testing existing behavior. Fix test.
116
+
117
+ **Test errors?** Fix error, re-run until it fails correctly.
118
+
119
+ ### GREEN - Minimal Code
120
+
121
+ Write simplest code to pass the test.
122
+
123
+ <Good>
124
+ ```typescript
125
+ async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
126
+   for (let i = 0; i < 3; i++) {
126
+     try {
127
+       return await fn();
128
+     } catch (e) {
129
+       if (i === 2) throw e;
130
+     }
131
+   }
132
+   throw new Error('unreachable');
134
+ }
135
+ ```
136
+ Just enough to pass
137
+ </Good>
138
+
139
+ <Bad>
140
+ ```typescript
141
+ async function retryOperation<T>(
142
+   fn: () => Promise<T>,
142
+   options?: {
143
+     maxRetries?: number;
144
+     backoff?: 'linear' | 'exponential';
145
+     onRetry?: (attempt: number) => void;
146
+   }
147
+ ): Promise<T> {
148
+   // YAGNI
150
+ }
151
+ ```
152
+ Over-engineered
153
+ </Bad>
154
+
155
+ Don't add features, refactor other code, or "improve" beyond the test.
156
+
157
+ ### Verify GREEN - Watch It Pass
158
+
159
+ **MANDATORY.**
160
+
161
+ ```bash
162
+ npm test path/to/test.test.ts
163
+ ```
164
+
165
+ Confirm:
166
+ - Test passes
167
+ - Other tests still pass
168
+ - Output pristine (no errors, warnings)
169
+
170
+ **Test fails?** Fix code, not test.
171
+
172
+ **Other tests fail?** Fix now.
173
+
174
+ ### REFACTOR - Clean Up
175
+
176
+ After green only:
177
+ - Remove duplication
178
+ - Improve names
179
+ - Extract helpers
180
+
181
+ Keep tests green. Don't add behavior.
182
+
183
+ ### Repeat
184
+
185
+ Write the next failing test for the next behavior.
186
+
187
+ ## Good Tests
188
+
189
+ | Quality | Good | Bad |
190
+ |---------|------|-----|
191
+ | **Minimal** | One thing. "and" in name? Split it. | `test('validates email and domain and whitespace')` |
192
+ | **Clear** | Name describes behavior | `test('test1')` |
193
+ | **Shows intent** | Demonstrates desired API | Obscures what code should do |
194
+
195
+ ## Why Order Matters
196
+
197
+ **"I'll write tests after to verify it works"**
198
+
199
+ Tests written after code pass immediately. Passing immediately proves nothing:
200
+ - Might test wrong thing
201
+ - Might test implementation, not behavior
202
+ - Might miss edge cases you forgot
203
+ - You never saw it catch the bug
204
+
205
+ Test-first forces you to see the test fail, proving it actually tests something.
206
+
207
+ **"I already manually tested all the edge cases"**
208
+
209
+ Manual testing is ad-hoc. You think you tested everything but:
210
+ - No record of what you tested
211
+ - Can't re-run when code changes
212
+ - Easy to forget cases under pressure
213
+ - "It worked when I tried it" ≠ comprehensive
214
+
215
+ Automated tests are systematic. They run the same way every time.
216
+
217
+ **"Deleting X hours of work is wasteful"**
218
+
219
+ Sunk cost fallacy. The time is already gone. Your choice now:
220
+ - Delete and rewrite with TDD (X more hours, high confidence)
221
+ - Keep it and add tests after (30 min, low confidence, likely bugs)
222
+
223
+ The "waste" is keeping code you can't trust. Working code without real tests is technical debt.
224
+
225
+ **"TDD is dogmatic, being pragmatic means adapting"**
226
+
227
+ TDD IS pragmatic:
228
+ - Finds bugs before commit (faster than debugging after)
229
+ - Prevents regressions (tests catch breaks immediately)
230
+ - Documents behavior (tests show how to use code)
231
+ - Enables refactoring (change freely, tests catch breaks)
232
+
233
+ "Pragmatic" shortcuts = debugging in production = slower.
234
+
235
+ **"Tests after achieve the same goals - it's spirit not ritual"**
236
+
237
+ No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"
238
+
239
+ Tests-after are biased by your implementation. You test what you built, not what's required. You verify remembered edge cases, not discovered ones.
240
+
241
+ Tests-first force edge case discovery before implementing. Tests-after verify you remembered everything (you didn't).
242
+
243
+ 30 minutes of tests after ≠ TDD. You get coverage but lose the proof that the tests work.
244
+
245
+ ## Common Rationalizations
246
+
247
+ | Excuse | Reality |
248
+ |--------|---------|
249
+ | "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
250
+ | "I'll test after" | Tests passing immediately prove nothing. |
251
+ | "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
252
+ | "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
253
+ | "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
254
+ | "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
255
+ | "Need to explore first" | Fine. Throw away exploration, start with TDD. |
256
+ | "Test hard = design unclear" | Listen to test. Hard to test = hard to use. |
257
+ | "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
258
+ | "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
259
+ | "Existing code has no tests" | You're improving it. Add tests for existing code. |
260
+
261
+ ## Red Flags - STOP and Start Over
262
+
263
+ - Code before test
264
+ - Test after implementation
265
+ - Test passes immediately
266
+ - Can't explain why test failed
267
+ - Tests added "later"
268
+ - Rationalizing "just this once"
269
+ - "I already manually tested it"
270
+ - "Tests after achieve the same purpose"
271
+ - "It's about spirit not ritual"
272
+ - "Keep as reference" or "adapt existing code"
273
+ - "Already spent X hours, deleting is wasteful"
274
+ - "TDD is dogmatic, I'm being pragmatic"
275
+ - "This is different because..."
276
+
277
+ **All of these mean: Delete code. Start over with TDD.**
278
+
279
+ ## Example: Bug Fix
280
+
281
+ **Bug:** Empty email accepted
282
+
283
+ **RED**
284
+ ```typescript
285
+ test('rejects empty email', async () => {
286
+   const result = await submitForm({ email: '' });
286
+   expect(result.error).toBe('Email required');
288
+ });
289
+ ```
290
+
291
+ **Verify RED**
292
+ ```bash
293
+ $ npm test
294
+ FAIL: expected 'Email required', got undefined
295
+ ```
296
+
297
+ **GREEN**
298
+ ```typescript
299
+ function submitForm(data: FormData) {
300
+   if (!data.email?.trim()) {
300
+     return { error: 'Email required' };
301
+   }
302
+   // ...
304
+ }
305
+ ```
306
+
307
+ **Verify GREEN**
308
+ ```bash
309
+ $ npm test
310
+ PASS
311
+ ```
312
+
313
+ **REFACTOR**
314
+ Extract validation for multiple fields if needed.
315
+
316
+ ## Verification Checklist
317
+
318
+ Before marking work complete:
319
+
320
+ - [ ] Every new function/method has a test
321
+ - [ ] Watched each test fail before implementing
322
+ - [ ] Each test failed for expected reason (feature missing, not typo)
323
+ - [ ] Wrote minimal code to pass each test
324
+ - [ ] All tests pass
325
+ - [ ] Output pristine (no errors, warnings)
326
+ - [ ] Tests use real code (mocks only if unavoidable)
327
+ - [ ] Edge cases and errors covered
328
+
329
+ Can't check all boxes? You skipped TDD. Start over.
330
+
331
+ ## When Stuck
332
+
333
+ | Problem | Solution |
334
+ |---------|----------|
335
+ | Don't know how to test | Write wished-for API. Write assertion first. Ask through `compose:ask`; if no user is available, use the simplest testable approach. |
336
+ | Test too complicated | Design too complicated. Simplify interface. |
337
+ | Must mock everything | Code too coupled. Use dependency injection. |
338
+ | Test setup huge | Extract helpers. Still complex? Simplify design. |
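
As an illustration of the dependency injection row, the sketch below passes the clock in as a parameter so a test can supply a fixed value instead of mocking a global. The `make_greeting` function and its `now_hour` parameter are hypothetical names chosen for this example.

```python
from typing import Callable


def make_greeting(name: str, now_hour: Callable[[], int]) -> str:
    """Build a greeting; the clock is passed in rather than read globally."""
    hour = now_hour()
    if hour < 12:
        return f"Good morning, {name}"
    return f"Good afternoon, {name}"
```

A test can then call `make_greeting("Ada", lambda: 9)` with real code on both sides and no mocking library.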
339
+
340
+ ## Debugging Integration
341
+
342
+ Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.
343
+
344
+ Never fix bugs without a test.
345
+
346
+ ## Testing Anti-Patterns
347
+
348
+ When adding mocks or test utilities, read @testing-anti-patterns.md to avoid common pitfalls:
349
+ - Testing mock behavior instead of real behavior
350
+ - Adding test-only methods to production classes
351
+ - Mocking without understanding dependencies
352
+
353
+ ## Final Rule
354
+
355
+ ```
356
+ Production code → test exists and failed first
357
+ Otherwise → not TDD
358
+ ```
359
+
360
+ No exceptions without your human partner's permission.
@@ -0,0 +1,140 @@
1
+ ---
2
+ name: compose:verify
3
+ hidden: true
4
+ description: Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always
5
+ ---
6
+
7
+ # Verification Before Completion
8
+
9
+ ## Overview
10
+
11
+ Claiming work is complete without verification is dishonesty, not efficiency.
12
+
13
+ **Core principle:** Evidence before claims, always.
14
+
15
+ **Violating the letter of this rule is violating the spirit of this rule.**
16
+
17
+ ## The Iron Law
18
+
19
+ ```
20
+ NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
21
+ ```
22
+
23
+ If you haven't run the verification command in this message, you cannot claim it passes.
24
+
25
+ ## The Gate Function
26
+
27
+ ```
28
+ BEFORE claiming any status or expressing satisfaction:
29
+
30
+ 1. IDENTIFY: What command proves this claim?
31
+ 2. RUN: Execute the FULL command (fresh, complete)
32
+ 3. READ: Full output, check exit code, count failures
33
+ 4. VERIFY: Does output confirm the claim?
34
+    - If NO: State actual status with evidence
34
+    - If YES: State claim WITH evidence
36
+ 5. ONLY THEN: Make the claim
37
+
38
+ Skip any step = lying, not verifying
39
+ ```
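
The gate can be sketched in code as "run first, then phrase the claim from the result". This is a minimal illustration, assuming the verification is a single command whose exit code decides pass or fail; `verify` and `report` are hypothetical helper names, not part of the skill.

```python
import subprocess
import sys


def verify(command: list[str]) -> tuple[bool, str]:
    """Run the full command fresh and return (passed, evidence)."""
    result = subprocess.run(command, capture_output=True, text=True)
    output = (result.stdout + result.stderr).strip()
    evidence = f"exit code {result.returncode}\n{output}"
    # Only the exit code decides the claim here; the output is attached
    # so a reader can still check failure counts.
    return result.returncode == 0, evidence


def report(claim: str, command: list[str]) -> str:
    """State the claim only when the evidence supports it."""
    passed, evidence = verify(command)
    if passed:
        return f"{claim}\nEvidence: {evidence}"
    return f"NOT verified: {claim}\nEvidence: {evidence}"
```

For example, `report("Script runs", [sys.executable, "-c", "print('ok')"])` returns the claim followed by the exit code and captured output, while a non-zero exit yields the "NOT verified" form instead.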
40
+
41
+ ## Common Failures
42
+
43
+ | Claim | Requires | Not Sufficient |
44
+ |-------|----------|----------------|
45
+ | Tests pass | Test command output: 0 failures | Previous run, "should pass" |
46
+ | Linter clean | Linter output: 0 errors | Partial check, extrapolation |
47
+ | Build succeeds | Build command: exit 0 | Linter passing, logs look good |
48
+ | Bug fixed | Test original symptom: passes | Code changed, assumed fixed |
49
+ | Regression test works | Red-green cycle verified | Test passes once |
50
+ | Agent completed | VCS diff shows changes | Agent reports "success" |
51
+ | Requirements met | Line-by-line checklist | Tests passing |
52
+
53
+ ## Red Flags - STOP
54
+
55
+ - Using "should", "probably", "seems to"
56
+ - Expressing satisfaction before verification ("Great!", "Perfect!", "Done!", etc.)
57
+ - About to commit/push/PR without verification
58
+ - Trusting agent success reports
59
+ - Relying on partial verification
60
+ - Thinking "just this once"
61
+ - Tired and wanting work over
62
+ - **ANY wording implying success without having run verification**
63
+
64
+ ## Rationalization Prevention
65
+
66
+ | Excuse | Reality |
67
+ |--------|---------|
68
+ | "Should work now" | RUN the verification |
69
+ | "I'm confident" | Confidence ≠ evidence |
70
+ | "Just this once" | No exceptions |
71
+ | "Linter passed" | Linter ≠ compiler |
72
+ | "Agent said success" | Verify independently |
73
+ | "I'm tired" | Exhaustion ≠ excuse |
74
+ | "Partial check is enough" | Partial proves nothing |
75
+ | "Different words so rule doesn't apply" | Spirit over letter |
76
+
77
+ ## Key Patterns
78
+
79
+ **Tests:**
80
+ ```
81
+ ✅ [Run test command] [See: 34/34 pass] "All tests pass"
82
+ ❌ "Should pass now" / "Looks correct"
83
+ ```
84
+
85
+ **Regression tests (TDD Red-Green):**
86
+ ```
87
+ ✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass)
88
+ ❌ "I've written a regression test" (without red-green verification)
89
+ ```
90
+
91
+ **Build:**
92
+ ```
93
+ ✅ [Run build] [See: exit 0] "Build passes"
94
+ ❌ "Linter passed" (linter doesn't check compilation)
95
+ ```
96
+
97
+ **Requirements:**
98
+ ```
99
+ ✅ Re-read plan → Create checklist → Verify each → Report gaps or completion
100
+ ❌ "Tests pass, phase complete"
101
+ ```
102
+
103
+ **Agent delegation:**
104
+ ```
105
+ ✅ Agent reports success → Check VCS diff → Verify changes → Report actual state
106
+ ❌ Trust agent report
107
+ ```
108
+
109
+ ## Why This Matters
110
+
111
+ From 24 failure memories:
112
+ - Your human partner said "I don't believe you" - trust broken
113
+ - Undefined functions shipped - would crash
114
+ - Missing requirements shipped - incomplete features
115
+ - Time wasted on false completion → redirect → rework
116
+ - Violates: "Honesty is a core value. If you lie, you'll be replaced."
117
+
118
+ ## When To Apply
119
+
120
+ **ALWAYS before:**
121
+ - ANY variation of success/completion claims
122
+ - ANY expression of satisfaction
123
+ - ANY positive statement about work state
124
+ - Committing, PR creation, task completion
125
+ - Moving to next task
126
+ - Delegating to agents
127
+
128
+ **Rule applies to:**
129
+ - Exact phrases
130
+ - Paraphrases and synonyms
131
+ - Implications of success
132
+ - ANY communication suggesting completion/correctness
133
+
134
+ ## The Bottom Line
135
+
136
+ **No shortcuts for verification.**
137
+
138
+ Run the command. Read the output. THEN claim the result.
139
+
140
+ This is non-negotiable.