npm - @super-pocock-ai/compose-workflow - Versions diffs - 2.0.0 → 2.0.6 - Mend

@super-pocock-ai/compose-workflow 2.0.0 → 2.0.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6) hide show

package/package.json +10 -3
package/skills/compose/brainstorm/SKILL.md +200 -0
package/skills/compose/debug/SKILL.md +297 -0
package/skills/compose/plan/SKILL.md +183 -0
package/skills/compose/tdd/SKILL.md +360 -0
package/skills/compose/verify/SKILL.md +140 -0

package/package.json CHANGED Viewed

@@ -1,8 +1,15 @@
 {
   "name": "@super-pocock-ai/compose-workflow",
-  "version": "2.0.0",
+  "version": "2.0.6",
   "description": "Compose 工作流技能，用于编排开发流程",
-  "files": ["skills"],
-  "keywords": ["opencode", "skills", "compose", "workflow"],
+  "files": [
+    "skills"
+  ],
+  "keywords": [
+    "opencode",
+    "skills",
+    "compose",
+    "workflow"
+  ],
   "license": "MIT"
 }

package/skills/compose/brainstorm/SKILL.md ADDED Viewed

@@ -0,0 +1,200 @@
+---
+name: compose:brainstorm
+hidden: true
+description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation."
+---
+# Brainstorming Ideas Into Designs
+Help turn ideas into fully formed designs and specs through natural collaborative dialogue.
+Start by understanding the current project context, then ask questions one at a time to refine the idea. Once you understand what you're building, present the design and get user approval.
+<HARD-GATE>
+Do NOT invoke any implementation skill, write any code, scaffold any project, or take any implementation action until you have presented a design and the user has approved it. This applies to EVERY project regardless of perceived simplicity.
+**Autonomous override:** When no user is available to approve, this HARD-GATE does not apply: skip design presentation and approval, proceed directly to implementation, treating every approval gate as auto-approved.
+</HARD-GATE>
+## Anti-Pattern: "This Is Too Simple To Need A Design"
+Every project goes through this process. A todo list, a single-function utility, a config change — all of them. "Simple" projects are where unexamined assumptions cause the most wasted work. The design can be short (a few sentences for truly simple projects), but you MUST present it and get approval.
+**Autonomous override:** When no user is available, skip this anti-pattern guidance — proceed directly to exploring context and implementing; do not present a design for approval.
+## Checklist
+You MUST create a task for each of these items and complete them in order:
+**Autonomous mode (no user available):** Skip steps 2–8. Only do: explore project context (step 1) → implement directly (invoke compose:plan or compose:execute). Do NOT present a design, ask questions, write a spec, or wait for approval.
+1. **Explore project context** — check files, docs, recent commits
+2. **Offer visual companion** (if topic will involve visual questions) — this is its own message, not combined with a clarifying question. See the Visual Companion section below.
+3. **Ask clarifying questions** — one at a time, understand purpose/constraints/success criteria
+4. **Propose 2-3 approaches** — with trade-offs and your recommendation
+5. **Present design** — in sections scaled to their complexity, get user approval after each section
+6. **Write design doc** (optional, multi-step features only) — save to the `specs/` directory given in the `<compose_docs_dir>` block of your prompt, as `YYYY-MM-DD-<topic>-design.md`, and commit. For single-step fixes or small changes, keep the design in conversation context only.
+7. **Spec self-review** (if doc written) — quick inline check for placeholders, contradictions, ambiguity, scope (see below)
+8. **User reviews written spec** (if doc written) — ask user to review the spec file before proceeding
+9. **Transition to implementation** — invoke compose:plan to create implementation plan
+## Process Flow
+- **Explore project context**
+- **Visual questions ahead?** — Yes → offer Visual Companion (own message, no other content)
+- **Ask clarifying questions**
+- **Propose 2-3 approaches**
+- **Present design sections**
+- **User approves design?** — No → revise, back to present design sections
+- **Write design doc**
+- **Spec self-review** (fix inline)
+- **User reviews spec?** — Changes requested → back to write design doc / Approved → **invoke compose:plan**
+**The terminal state is invoking compose:plan.** Do NOT invoke frontend-design, mcp-builder, or any other implementation skill. The ONLY skill you invoke after brainstorming is compose:plan.
+## The Process
+**Understanding the idea:**
+- Check out the current project state first (files, docs, recent commits)
+- Before asking detailed questions, assess scope: if the request describes multiple independent subsystems (e.g., "build a platform with chat, file storage, billing, and analytics"), flag this immediately. Don't spend questions refining details of a project that needs to be decomposed first.
+- If the project is too large for a single spec, help the user decompose into sub-projects: what are the independent pieces, how do they relate, what order should they be built? Then brainstorm the first sub-project through the normal design flow. Each sub-project gets its own spec → plan → implementation cycle.
+- For appropriately-scoped projects, ask questions one at a time to refine the idea
+- When the question has a known set of likely answers, use `compose:ask` with those answers as options
+- For open-ended questions, use `compose:ask` with 2-3 suggested answers as options — the user can always type their own answer
+- If no user is available, make reasonable assumptions from project context and proceed
+- Only one question per tool call — if a topic needs more exploration, break it into multiple questions
+- Focus on understanding: purpose, constraints, success criteria
+**Exploring approaches:**
+- Propose 2-3 different approaches with trade-offs
+- Present options conversationally with your recommendation and reasoning
+- Lead with your recommended option and explain why
+**Presenting the design:**
+- Once you believe you understand what you're building, present the design
+- Scale each section to its complexity: a few sentences if straightforward, up to 200-300 words if nuanced
+- After presenting each section, use `compose:ask`:
+  - header: `Design Review`
+  - question: `Does this <section-name> look right?`
+  - options:
+    - label: `Looks good`, description: `Approve and continue`
+    - label: `Needs changes`, description: `I have feedback`
+  If no user is available, treat as approved and continue.
+- Cover: architecture, components, data flow, error handling, testing
+- Be ready to go back and clarify if something doesn't make sense
+**Design for isolation and clarity:**
+- Break the system into smaller units that each have one clear purpose, communicate through well-defined interfaces, and can be understood and tested independently
+- For each unit, you should be able to answer: what does it do, how do you use it, and what does it depend on?
+- Can someone understand what a unit does without reading its internals? Can you change the internals without breaking consumers? If not, the boundaries need work.
+- Smaller, well-bounded units are also easier for you to work with - you reason better about code you can hold in context at once, and your edits are more reliable when files are focused. When a file grows large, that's often a signal that it's doing too much.
+**Working in existing codebases:**
+- Explore the current structure before proposing changes. Follow existing patterns.
+- Where existing code has problems that affect the work (e.g., a file that's grown too large, unclear boundaries, tangled responsibilities), include targeted improvements as part of the design - the way a good developer improves code they're working in.
+- Don't propose unrelated refactoring. Stay focused on what serves the current goal.
+## After the Design
+**Documentation (optional, multi-step features only):**
+For features with multiple tasks or significant architectural decisions:
+- Write the validated design (spec) to the `specs/` directory given in `<compose_docs_dir>`, as `YYYY-MM-DD-<topic>-design.md`
+  - (User preferences for spec location override this default)
+- Use elements-of-style:writing-clearly-and-concisely skill if available
+- Commit the design document to git
+For single bug fixes or small changes, skip the written spec — the design presented in conversation is sufficient.
+**Spec Self-Review (if doc written):**
+After writing the spec document, look at it with fresh eyes:
+1. **Placeholder scan:** Any "TBD", "TODO", incomplete sections, or vague requirements? Fix them.
+2. **Internal consistency:** Do any sections contradict each other? Does the architecture match the feature descriptions?
+3. **Scope check:** Is this focused enough for a single implementation plan, or does it need decomposition?
+4. **Ambiguity check:** Could any requirement be interpreted two different ways? If so, pick one and make it explicit.
+Fix any issues inline. No need to re-review — just fix and move on.
+**User Review Gate (if doc written):**
+After spec self-review passes, use `compose:ask`:
+- header: `Spec Review`
+- question: `Spec written and committed to <path>. Ready to proceed?`
+- options:
+  - label: `Approved`, description: `Proceed to compose:plan`
+  - label: `Changes needed`, description: `I have revisions`
+If no user is available, treat as approved and invoke compose:plan.
+If "Changes needed" or custom feedback, apply changes and re-run spec review. Only proceed on approval.
+**Implementation:**
+- Invoke compose:plan to create a detailed implementation plan
+- Do NOT invoke any other skill. compose:plan is the next step.
+## Spec Section Anchors
+When writing the spec, give every `##` section heading a stable anchor ID so downstream plan tasks and reviewers can reference exact spec locations. Put the ID at the start of the heading text:
+```markdown
+## [S1] Problem
+## [S2] Solution overview
+## [S3] Coverage gate behavior
+```
+Rules:
+- **ID format** is `S` followed by a number (`[S1]`, `[S2]`, `[S3]`, ...), unique within the spec — no two sections share an ID, and no section is left without one.
+- **Number sections in document order** when first authoring the spec (top section is `[S1]`, the next is `[S2]`, and so on).
+- **The ID is stable.** If a heading is later reworded, keep its existing ID and do NOT renumber the other sections. Downstream `covers:` references and review verdicts depend on these IDs not drifting — renumbering would silently break every reference that points at them.
+These anchors are the index the plan and reviewers use to trace each task and each review verdict back to the exact spec section it serves.
+## Key Principles
+- **One question at a time** - Don't overwhelm with multiple questions
+- **Multiple choice preferred** - Easier to answer than open-ended when possible
+- **YAGNI ruthlessly** - Remove unnecessary features from all designs
+- **Explore alternatives** - Always propose 2-3 approaches before settling
+- **Incremental validation** - Present design, get approval before moving on
+- **Be flexible** - Go back and clarify when something doesn't make sense
+## Visual Companion
+A browser-based companion for showing mockups, diagrams, and visual options during brainstorming. Available as a tool — not a mode. Accepting the companion means it's available for questions that benefit from visual treatment; it does NOT mean every question goes through the browser.
+**Offering the companion:** When you anticipate visual content (mockups, layouts, diagrams):
+1. **Check memory** for a `visual-companion` preference in the `compose-preferences` memory file. If found, honor it.
+2. **If no saved preference,** offer consent using `compose:ask` (this MUST be its own message — do not combine with other content):
+   - header: `Visual Companion`
+   - question: `Some upcoming questions may benefit from browser-based mockups and diagrams. This feature is token-intensive and requires opening a local URL.`
+   - options:
+     - label: `Yes, always`, description: `Enable visuals for this and future sessions`
+     - label: `No, never`, description: `Skip visuals for this and future sessions`
+     - label: `Yes, this time`, description: `Enable visuals for this session only`
+     - label: `No, this time`, description: `Skip visuals for this session only`
+   If no user is available, skip the visual companion and use text-only.
+3. **If "Yes, always" or "No, never":** Save to the `compose-preferences` memory file.
+If declined, proceed with text-only brainstorming.
+**Per-question decision:** Even after the user accepts, decide FOR EACH QUESTION whether to use the browser or the terminal. The test: **would the user understand this better by seeing it than reading it?**
+- **Use the browser** for content that IS visual — mockups, wireframes, layout comparisons, architecture diagrams, side-by-side visual designs
+- **Use the terminal** for content that is text — requirements questions, conceptual choices, tradeoff lists, A/B/C/D text options, scope decisions
+A question about a UI topic is not automatically a visual question. "What does personality mean in this context?" is a conceptual question — use the terminal. "Which wizard layout works better?" is a visual question — use the browser.
+If they agree to the companion, read the detailed guide before proceeding:
+`<compose:brainstorm>/visual-companion.md`

package/skills/compose/debug/SKILL.md ADDED Viewed

@@ -0,0 +1,297 @@
+---
+name: compose:debug
+hidden: true
+description: Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
+---
+# Systematic Debugging
+## Overview
+Random fixes waste time and create new bugs. Quick patches mask underlying issues.
+**Core principle:** ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
+**Violating the letter of this process is violating the spirit of debugging.**
+## The Iron Law
+```
+NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
+```
+If you haven't completed Phase 1, you cannot propose fixes.
+## When to Use
+Use for ANY technical issue:
+- Test failures
+- Bugs in production
+- Unexpected behavior
+- Performance problems
+- Build failures
+- Integration issues
+**Use this ESPECIALLY when:**
+- Under time pressure (emergencies make guessing tempting)
+- "Just one quick fix" seems obvious
+- You've already tried multiple fixes
+- Previous fix didn't work
+- You don't fully understand the issue
+**Don't skip when:**
+- Issue seems simple (simple bugs have root causes too)
+- You're in a hurry (rushing guarantees rework)
+- Manager wants it fixed NOW (systematic is faster than thrashing)
+## The Four Phases
+You MUST complete each phase before proceeding to the next.
+### Phase 1: Root Cause Investigation
+**BEFORE attempting ANY fix:**
+1. **Read Error Messages Carefully**
+   - Don't skip past errors or warnings
+   - They often contain the exact solution
+   - Read stack traces completely
+   - Note line numbers, file paths, error codes
+2. **Reproduce Consistently**
+   - Can you trigger it reliably?
+   - What are the exact steps?
+   - Does it happen every time?
+   - If not reproducible → gather more data, don't guess
+3. **Check Recent Changes**
+   - What changed that could cause this?
+   - Git diff, recent commits
+   - New dependencies, config changes
+   - Environmental differences
+4. **Gather Evidence in Multi-Component Systems**
+   **WHEN system has multiple components (CI → build → signing, API → service → database):**
+   **BEFORE proposing fixes, add diagnostic instrumentation:**
+   ```
+   For EACH component boundary:
+     - Log what data enters component
+     - Log what data exits component
+     - Verify environment/config propagation
+     - Check state at each layer
+   Run once to gather evidence showing WHERE it breaks
+   THEN analyze evidence to identify failing component
+   THEN investigate that specific component
+   ```
+   **Example (multi-layer system):**
+   ```bash
+   # Layer 1: Workflow
+   echo "=== Secrets available in workflow: ==="
+   echo "IDENTITY: ${IDENTITY:+SET}${IDENTITY:-UNSET}"
+   # Layer 2: Build script
+   echo "=== Env vars in build script: ==="
+   env | grep IDENTITY || echo "IDENTITY not in environment"
+   # Layer 3: Signing script
+   echo "=== Keychain state: ==="
+   security list-keychains
+   security find-identity -v
+   # Layer 4: Actual signing
+   codesign --sign "$IDENTITY" --verbose=4 "$APP"
+   ```
+   **This reveals:** Which layer fails (secrets → workflow ✓, workflow → build ✗)
+5. **Trace Data Flow**
+   **WHEN error is deep in call stack:**
+   See `root-cause-tracing.md` in this directory for the complete backward tracing technique.
+   **Quick version:**
+   - Where does bad value originate?
+   - What called this with bad value?
+   - Keep tracing up until you find the source
+   - Fix at source, not at symptom
+### Phase 2: Pattern Analysis
+**Find the pattern before fixing:**
+1. **Find Working Examples**
+   - Locate similar working code in same codebase
+   - What works that's similar to what's broken?
+2. **Compare Against References**
+   - If implementing pattern, read reference implementation COMPLETELY
+   - Don't skim - read every line
+   - Understand the pattern fully before applying
+3. **Identify Differences**
+   - What's different between working and broken?
+   - List every difference, however small
+   - Don't assume "that can't matter"
+4. **Understand Dependencies**
+   - What other components does this need?
+   - What settings, config, environment?
+   - What assumptions does it make?
+### Phase 3: Hypothesis and Testing
+**Scientific method:**
+1. **Form Single Hypothesis**
+   - State clearly: "I think X is the root cause because Y"
+   - Write it down
+   - Be specific, not vague
+2. **Test Minimally**
+   - Make the SMALLEST possible change to test hypothesis
+   - One variable at a time
+   - Don't fix multiple things at once
+3. **Verify Before Continuing**
+   - Did it work? Yes → Phase 4
+   - Didn't work? Form NEW hypothesis
+   - DON'T add more fixes on top
+4. **When You Don't Know**
+   - Say "I don't understand X"
+   - Don't pretend to know
+   - Ask for help through `compose:ask` — present what you've tried and offer structured next-step options. If no user is available, take the most promising next step and continue.
+   - Research more
+### Phase 4: Implementation
+**Fix the root cause, not the symptom:**
+1. **Create Failing Test Case**
+   - Simplest possible reproduction
+   - Automated test if possible
+   - One-off test script if no framework
+   - MUST have before fixing
+   - Use the `compose:tdd` skill for writing proper failing tests
+2. **Implement Single Fix**
+   - Address the root cause identified
+   - ONE change at a time
+   - No "while I'm here" improvements
+   - No bundled refactoring
+3. **Verify Fix**
+   - Test passes now?
+   - No other tests broken?
+   - Issue actually resolved?
+4. **If Fix Doesn't Work**
+   - STOP
+   - Count: How many fixes have you tried?
+   - If < 3: Return to Phase 1, re-analyze with new information
+   - **If ≥ 3: STOP and question the architecture (step 5 below)**
+   - DON'T attempt Fix #4 without architectural discussion
+5. **If 3+ Fixes Failed: Question Architecture**
+   **Pattern indicating architectural problem:**
+   - Each fix reveals new shared state/coupling/problem in different place
+   - Fixes require "massive refactoring" to implement
+   - Each fix creates new symptoms elsewhere
+   **STOP and question fundamentals:**
+   - Is this pattern fundamentally sound?
+   - Are we "sticking with it through sheer inertia"?
+   - Should we refactor architecture vs. continue fixing symptoms?
+   **Use `compose:ask` to present the architectural concern** with options like "Continue fixing" / "Propose refactor" / "Discuss first". If no user is available, choose "Propose refactor" and proceed.
+   This is NOT a failed hypothesis - this is a wrong architecture.
+## Red Flags - STOP and Follow Process
+If you catch yourself thinking:
+- "Quick fix for now, investigate later"
+- "Just try changing X and see if it works"
+- "Add multiple changes, run tests"
+- "Skip the test, I'll manually verify"
+- "It's probably X, let me fix that"
+- "I don't fully understand but this might work"
+- "Pattern says X but I'll adapt it differently"
+- "Here are the main problems: [lists fixes without investigation]"
+- Proposing solutions before tracing data flow
+- **"One more fix attempt" (when already tried 2+)**
+- **Each fix reveals new problem in different place**
+**ALL of these mean: STOP. Return to Phase 1.**
+**If 3+ fixes failed:** Question the architecture (see Phase 4.5)
+## your human partner's Signals You're Doing It Wrong
+**Watch for these redirections:**
+- "Is that not happening?" - You assumed without verifying
+- "Will it show us...?" - You should have added evidence gathering
+- "Stop guessing" - You're proposing fixes without understanding
+- "Ultrathink this" - Question fundamentals, not just symptoms
+- "We're stuck?" (frustrated) - Your approach isn't working
+**When you see these:** STOP. Return to Phase 1.
+## Common Rationalizations
+| Excuse | Reality |
+|--------|---------|
+| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
+| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
+| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
+| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
+| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
+| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
+| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
+| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. |
+## Quick Reference
+| Phase | Key Activities | Success Criteria |
+|-------|---------------|------------------|
+| **1. Root Cause** | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY |
+| **2. Pattern** | Find working examples, compare | Identify differences |
+| **3. Hypothesis** | Form theory, test minimally | Confirmed or new hypothesis |
+| **4. Implementation** | Create test, fix, verify | Bug resolved, tests pass |
+## When Process Reveals "No Root Cause"
+If systematic investigation reveals issue is truly environmental, timing-dependent, or external:
+1. You've completed the process
+2. Document what you investigated
+3. Implement appropriate handling (retry, timeout, error message)
+4. Add monitoring/logging for future investigation
+**But:** 95% of "no root cause" cases are incomplete investigation.
+## Supporting Techniques
+These techniques are part of systematic debugging and available in this directory:
+- **`root-cause-tracing.md`** - Trace bugs backward through call stack to find original trigger
+- **`defense-in-depth.md`** - Add validation at multiple layers after finding root cause
+- **`condition-based-waiting.md`** - Replace arbitrary timeouts with condition polling
+**Related skills:**
+- **compose:tdd** - For creating failing test case (Phase 4, Step 1)
+- **compose:verify** - Verify fix worked before claiming success
+## Real-World Impact
+From debugging sessions:
+- Systematic approach: 15-30 minutes to fix
+- Random fixes approach: 2-3 hours of thrashing
+- First-time fix rate: 95% vs 40%
+- New bugs introduced: Near zero vs common

package/skills/compose/plan/SKILL.md ADDED Viewed

@@ -0,0 +1,183 @@
+---
+name: compose:plan
+hidden: true
+description: Use when you have a spec or requirements for a multi-step task, before touching code
+---
+# Writing Plans
+## Overview
+Write comprehensive implementation plans assuming the engineer has zero context for our codebase and questionable taste. Document everything they need to know: which files to touch for each task, code, testing, docs they might need to check, how to test it. Give them the whole plan as bite-sized tasks. DRY. YAGNI. TDD. Frequent commits.
+Assume they are a skilled developer, but know almost nothing about our toolset or problem domain. Assume they don't know good test design very well.
+**Announce at start:** "I'm using the compose:plan skill to create the implementation plan."
+**Context:** If working in an isolated worktree, it should have been created via the `compose:worktree` skill at execution time.
+**Save plans to:** the `plans/` directory given in the `<compose_docs_dir>` block of your prompt, as `YYYY-MM-DD-<feature-name>.md`.
+- (User preferences for plan location override this default)
+## Scope Check
+If the spec covers multiple independent subsystems, it should have been broken into sub-project specs during brainstorming. If it wasn't, suggest breaking this into separate plans — one per subsystem. Each plan should produce working, testable software on its own.
+## File Structure
+Before defining tasks, map out which files will be created or modified and what each one is responsible for. This is where decomposition decisions get locked in.
+- Design units with clear boundaries and well-defined interfaces. Each file should have one clear responsibility.
+- You reason best about code you can hold in context at once, and your edits are more reliable when files are focused. Prefer smaller, focused files over large ones that do too much.
+- Files that change together should live together. Split by responsibility, not by technical layer.
+- In existing codebases, follow established patterns. If the codebase uses large files, don't unilaterally restructure - but if a file you're modifying has grown unwieldy, including a split in the plan is reasonable.
+This structure informs the task decomposition. Each task should produce self-contained changes that make sense independently.
+## Task Right-Sizing
+A task is the smallest unit that carries its own test cycle and is worth a
+fresh reviewer's gate. When drawing task boundaries:
+- Fold setup, configuration, scaffolding, and documentation steps into the task whose deliverable needs them
+- Split only where a reviewer could meaningfully reject one task while approving its neighbor
+- Each task ends with an independently testable deliverable
+## Bite-Sized Task Granularity
+**Each step is one action (2-5 minutes):**
+- "Write the failing test" - step
+- "Run it to make sure it fails" - step
+- "Implement the minimal code to make the test pass" - step
+- "Run the tests and make sure they pass" - step
+- "Commit" - step
+## Plan Document Header
+**Every plan MUST start with this header:**
+```markdown
+# [Feature Name] Implementation Plan
+> **For agentic workers:** REQUIRED SUB-SKILL: Use compose:subagent (recommended) or compose:execute to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+**Goal:** [One sentence describing what this builds]
+**Architecture:** [2-3 sentences about approach]
+**Tech Stack:** [Key technologies/libraries]
+## Global Constraints
+[Project-wide requirements that bind EVERY task — version floors, dependency
+limits, naming and copy rules, platform requirements, exact values. One line
+each, copied verbatim from the spec. Implementers and reviewers downstream
+implicitly inherit this section without being told individually.]
+---
+```
+## Task Structure
+````markdown
+### Task N: [Component Name]
+**Covers:** [S3, S7]
+<!-- spec section anchors this task implements; every task that produces
+     spec-required behavior must list at least one. Omit only for pure
+     scaffolding tasks (e.g. project setup) that map to no spec section. -->
+**Files:**
+- Create: `exact/path/to/file.py`
+- Modify: `exact/path/to/existing.py:123-145`
+- Test: `tests/exact/path/to/test.py`
+**Interfaces:**
+- Consumes: [what this task uses from earlier tasks — exact signatures, types]
+- Produces: [what later tasks rely on — exact function names, parameter and
+  return types. An implementer sees only its own task; this block is how it
+  learns the names and types neighboring tasks use.]
+- [ ] **Step 1: Write the failing test**
+```python
+def test_specific_behavior():
+    result = function(input)
+    assert result == expected
+```
+- [ ] **Step 2: Run test to verify it fails**
+Run: `pytest tests/path/test.py::test_name -v`
+Expected: FAIL with "function not defined"
+- [ ] **Step 3: Write minimal implementation**
+```python
+def function(input):
+    return expected
+```
+- [ ] **Step 4: Run test to verify it passes**
+Run: `pytest tests/path/test.py::test_name -v`
+Expected: PASS
+- [ ] **Step 5: Commit**
+```bash
+git add tests/path/test.py src/path/file.py
+git commit -m "feat: add specific feature"
+```
+````
+## No Placeholders
+Every step must contain the actual content an engineer needs. These are **plan failures** — never write them:
+- "TBD", "TODO", "implement later", "fill in details"
+- "Add appropriate error handling" / "add validation" / "handle edge cases"
+- "Write tests for the above" (without actual test code)
+- "Similar to Task N" (repeat the code — the engineer may be reading tasks out of order)
+- Steps that describe what to do without showing how (code blocks required for code steps)
+- References to types, functions, or methods not defined in any task
+## Remember
+- Exact file paths always
+- Complete code in every step — if a step changes code, show the code
+- Exact commands with expected output
+- DRY, YAGNI, TDD, frequent commits
+## Self-Review
+After writing the complete plan, look at the spec with fresh eyes and check the plan against it. This is a checklist you run yourself — not a subagent dispatch.
+**1. Spec coverage:** Skim each `[Sn]` section in the spec. Can you point to a task whose **Covers:** lists it? Every spec section must be covered by at least one task. Conversely, every `Covers:` ID must resolve to a real spec section. List any gap in either direction and add or fix the task.
+**2. Placeholder scan:** Search your plan for red flags — any of the patterns from the "No Placeholders" section above. Fix them.
+**3. Type consistency:** Do the types, method signatures, and property names you used in later tasks match what you defined in earlier tasks? A function called `clearLayers()` in Task 3 but `clearFullLayers()` in Task 7 is a bug.
+If you find issues, fix them inline. No need to re-review — just fix and move on. If you find a spec requirement with no task, add the task.
+## Execution Handoff
+After saving the plan, determine execution approach:
+1. **Check memory** for a saved `execution-style` preference in the `compose-preferences` memory file. If found (`subagent` or `inline`), use it and skip to the handler below.
+2. **If no saved preference,** ask through `compose:ask`:
+   - header: `Execution`
+   - question: `Plan saved. How would you like to execute it?`
+   - options:
+     - label: `Subagent, always`, description: `Fresh subagent per task — remember for future sessions`
+     - label: `Subagent, this time`, description: `Fresh subagent per task — just this once`
+     - label: `Inline, always`, description: `Execute in this session — remember for future sessions`
+     - label: `Inline, this time`, description: `Execute in this session — just this once`
+   If no user is available, default to Inline for ≤ 3 tasks or tightly coupled tasks, Subagent for > 3 independent tasks.
+3. **If "always" variant:** Save to the `compose-preferences` memory file as `execution-style: subagent` or `execution-style: inline`.
+**If Subagent:** Use compose:subagent — fresh subagent per task + two-stage review.
+**If Inline:** Use compose:execute — batch execution with checkpoints

package/skills/compose/tdd/SKILL.md ADDED Viewed

@@ -0,0 +1,360 @@
+---
+name: compose:tdd
+hidden: true
+description: Use when implementing any feature or bugfix, before writing implementation code
+---
+# Test-Driven Development (TDD)
+## Overview
+Write the test first. Watch it fail. Write minimal code to pass.
+**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
+**Violating the letter of the rules is violating the spirit of the rules.**
+## When to Use
+**Always:**
+- New features
+- Bug fixes
+- Refactoring
+- Behavior changes
+**Exceptions (raise them through `compose:ask`; if no user is available, use your best judgment and proceed):**
+- Throwaway prototypes
+- Generated code
+- Configuration files
+Thinking "skip TDD just this once"? Stop. That's rationalization.
+## The Iron Law
+```
+NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
+```
+Write code before the test? Delete it. Start over.
+**No exceptions:**
+- Don't keep it as "reference"
+- Don't "adapt" it while writing tests
+- Don't look at it
+- Delete means delete
+Implement fresh from tests. Period.
+## Red-Green-Refactor
+**The cycle:** RED → verify fails → GREEN → verify passes → REFACTOR → verify still green → next test → RED …
+1. **RED** — Write one failing test
+   - Verify it fails correctly (wrong failure → fix the test, back to RED)
+2. **GREEN** — Write minimal code to pass
+   - Verify all tests pass (still failing → keep coding, stay in GREEN)
+3. **REFACTOR** — Clean up while staying green
+   - Verify tests still pass after each cleanup (broke something → undo, stay in REFACTOR)
+4. **Next** — back to RED with the next behavior
+### RED - Write Failing Test
+Write one minimal test showing what should happen.
+<Good>
+```typescript
+test('retries failed operations 3 times', async () => {
+  let attempts = 0;
+  const operation = () => {
+    attempts++;
+    if (attempts < 3) throw new Error('fail');
+    return 'success';
+  };
+  const result = await retryOperation(operation);
+  expect(result).toBe('success');
+  expect(attempts).toBe(3);
+});
+```
+Clear name, tests real behavior, one thing
+</Good>
+<Bad>
+```typescript
+test('retry works', async () => {
+  const mock = jest.fn()
+    .mockRejectedValueOnce(new Error())
+    .mockRejectedValueOnce(new Error())
+    .mockResolvedValueOnce('success');
+  await retryOperation(mock);
+  expect(mock).toHaveBeenCalledTimes(3);
+});
+```
+Vague name, tests mock not code
+</Bad>
+**Requirements:**
+- One behavior
+- Clear name
+- Real code (no mocks unless unavoidable)
+### Verify RED - Watch It Fail
+**MANDATORY. Never skip.**
+```bash
+npm test path/to/test.test.ts
+```
+Confirm:
+- Test fails (not errors)
+- Failure message is expected
+- Fails because feature missing (not typos)
+**Test passes?** You're testing existing behavior. Fix test.
+**Test errors?** Fix error, re-run until it fails correctly.
+### GREEN - Minimal Code
+Write simplest code to pass the test.
+<Good>
+```typescript
+async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
+  for (let i = 0; i < 3; i++) {
+    try {
+      return await fn();
+    } catch (e) {
+      if (i === 2) throw e;
+    }
+  }
+  throw new Error('unreachable');
+}
+```
+Just enough to pass
+</Good>
+<Bad>
+```typescript
+async function retryOperation<T>(
+  fn: () => Promise<T>,
+  options?: {
+    maxRetries?: number;
+    backoff?: 'linear' | 'exponential';
+    onRetry?: (attempt: number) => void;
+  }
+): Promise<T> {
+  // YAGNI
+}
+```
+Over-engineered
+</Bad>
+Don't add features, refactor other code, or "improve" beyond the test.
+### Verify GREEN - Watch It Pass
+**MANDATORY.**
+```bash
+npm test path/to/test.test.ts
+```
+Confirm:
+- Test passes
+- Other tests still pass
+- Output pristine (no errors, warnings)
+**Test fails?** Fix code, not test.
+**Other tests fail?** Fix now.
+### REFACTOR - Clean Up
+After green only:
+- Remove duplication
+- Improve names
+- Extract helpers
+Keep tests green. Don't add behavior.
+### Repeat
+Next failing test for next feature.
+## Good Tests
+| Quality | Good | Bad |
+|---------|------|-----|
+| **Minimal** | One thing. "and" in name? Split it. | `test('validates email and domain and whitespace')` |
+| **Clear** | Name describes behavior | `test('test1')` |
+| **Shows intent** | Demonstrates desired API | Obscures what code should do |
+## Why Order Matters
+**"I'll write tests after to verify it works"**
+Tests written after code pass immediately. Passing immediately proves nothing:
+- Might test wrong thing
+- Might test implementation, not behavior
+- Might miss edge cases you forgot
+- You never saw it catch the bug
+Test-first forces you to see the test fail, proving it actually tests something.
+**"I already manually tested all the edge cases"**
+Manual testing is ad-hoc. You think you tested everything but:
+- No record of what you tested
+- Can't re-run when code changes
+- Easy to forget cases under pressure
+- "It worked when I tried it" ≠ comprehensive
+Automated tests are systematic. They run the same way every time.
+**"Deleting X hours of work is wasteful"**
+Sunk cost fallacy. The time is already gone. Your choice now:
+- Delete and rewrite with TDD (X more hours, high confidence)
+- Keep it and add tests after (30 min, low confidence, likely bugs)
+The "waste" is keeping code you can't trust. Working code without real tests is technical debt.
+**"TDD is dogmatic, being pragmatic means adapting"**
+TDD IS pragmatic:
+- Finds bugs before commit (faster than debugging after)
+- Prevents regressions (tests catch breaks immediately)
+- Documents behavior (tests show how to use code)
+- Enables refactoring (change freely, tests catch breaks)
+"Pragmatic" shortcuts = debugging in production = slower.
+**"Tests after achieve the same goals - it's spirit not ritual"**
+No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"
+Tests-after are biased by your implementation. You test what you built, not what's required. You verify remembered edge cases, not discovered ones.
+Tests-first force edge case discovery before implementing. Tests-after verify you remembered everything (you didn't).
+30 minutes of tests after ≠ TDD. You get coverage, lose proof tests work.
+## Common Rationalizations
+| Excuse | Reality |
+|--------|---------|
+| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
+| "I'll test after" | Tests passing immediately prove nothing. |
+| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
+| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
+| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
+| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
+| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
+| "Test hard = design unclear" | Listen to test. Hard to test = hard to use. |
+| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
+| "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
+| "Existing code has no tests" | You're improving it. Add tests for existing code. |
+## Red Flags - STOP and Start Over
+- Code before test
+- Test after implementation
+- Test passes immediately
+- Can't explain why test failed
+- Tests added "later"
+- Rationalizing "just this once"
+- "I already manually tested it"
+- "Tests after achieve the same purpose"
+- "It's about spirit not ritual"
+- "Keep as reference" or "adapt existing code"
+- "Already spent X hours, deleting is wasteful"
+- "TDD is dogmatic, I'm being pragmatic"
+- "This is different because..."
+**All of these mean: Delete code. Start over with TDD.**
+## Example: Bug Fix
+**Bug:** Empty email accepted
+**RED**
+```typescript
+test('rejects empty email', async () => {
+  const result = await submitForm({ email: '' });
+  expect(result.error).toBe('Email required');
+});
+```
+**Verify RED**
+```bash
+$ npm test
+FAIL: expected 'Email required', got undefined
+```
+**GREEN**
+```typescript
+function submitForm(data: FormData) {
+  if (!data.email?.trim()) {
+    return { error: 'Email required' };
+  }
+  // ...
+}
+```
+**Verify GREEN**
+```bash
+$ npm test
+PASS
+```
+**REFACTOR**
+Extract validation for multiple fields if needed.
+## Verification Checklist
+Before marking work complete:
+- [ ] Every new function/method has a test
+- [ ] Watched each test fail before implementing
+- [ ] Each test failed for expected reason (feature missing, not typo)
+- [ ] Wrote minimal code to pass each test
+- [ ] All tests pass
+- [ ] Output pristine (no errors, warnings)
+- [ ] Tests use real code (mocks only if unavoidable)
+- [ ] Edge cases and errors covered
+Can't check all boxes? You skipped TDD. Start over.
+## When Stuck
+| Problem | Solution |
+|---------|----------|
+| Don't know how to test | Write wished-for API. Write assertion first. Ask through `compose:ask`; if no user is available, use the simplest testable approach. |
+| Test too complicated | Design too complicated. Simplify interface. |
+| Must mock everything | Code too coupled. Use dependency injection. |
+| Test setup huge | Extract helpers. Still complex? Simplify design. |
+## Debugging Integration
+Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.
+Never fix bugs without a test.
+## Testing Anti-Patterns
+When adding mocks or test utilities, read @testing-anti-patterns.md to avoid common pitfalls:
+- Testing mock behavior instead of real behavior
+- Adding test-only methods to production classes
+- Mocking without understanding dependencies
+## Final Rule
+```
+Production code → test exists and failed first
+Otherwise → not TDD
+```
+No exceptions without your human partner's permission.

package/skills/compose/verify/SKILL.md ADDED Viewed

@@ -0,0 +1,140 @@
+---
+name: compose:verify
+hidden: true
+description: Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always
+---
+# Verification Before Completion
+## Overview
+Claiming work is complete without verification is dishonesty, not efficiency.
+**Core principle:** Evidence before claims, always.
+**Violating the letter of this rule is violating the spirit of this rule.**
+## The Iron Law
+```
+NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
+```
+If you haven't run the verification command in this message, you cannot claim it passes.
+## The Gate Function
+```
+BEFORE claiming any status or expressing satisfaction:
+1. IDENTIFY: What command proves this claim?
+2. RUN: Execute the FULL command (fresh, complete)
+3. READ: Full output, check exit code, count failures
+4. VERIFY: Does output confirm the claim?
+   - If NO: State actual status with evidence
+   - If YES: State claim WITH evidence
+5. ONLY THEN: Make the claim
+Skip any step = lying, not verifying
+```
+## Common Failures
+| Claim | Requires | Not Sufficient |
+|-------|----------|----------------|
+| Tests pass | Test command output: 0 failures | Previous run, "should pass" |
+| Linter clean | Linter output: 0 errors | Partial check, extrapolation |
+| Build succeeds | Build command: exit 0 | Linter passing, logs look good |
+| Bug fixed | Test original symptom: passes | Code changed, assumed fixed |
+| Regression test works | Red-green cycle verified | Test passes once |
+| Agent completed | VCS diff shows changes | Agent reports "success" |
+| Requirements met | Line-by-line checklist | Tests passing |
+## Red Flags - STOP
+- Using "should", "probably", "seems to"
+- Expressing satisfaction before verification ("Great!", "Perfect!", "Done!", etc.)
+- About to commit/push/PR without verification
+- Trusting agent success reports
+- Relying on partial verification
+- Thinking "just this once"
+- Tired and wanting work over
+- **ANY wording implying success without having run verification**
+## Rationalization Prevention
+| Excuse | Reality |
+|--------|---------|
+| "Should work now" | RUN the verification |
+| "I'm confident" | Confidence ≠ evidence |
+| "Just this once" | No exceptions |
+| "Linter passed" | Linter ≠ compiler |
+| "Agent said success" | Verify independently |
+| "I'm tired" | Exhaustion ≠ excuse |
+| "Partial check is enough" | Partial proves nothing |
+| "Different words so rule doesn't apply" | Spirit over letter |
+## Key Patterns
+**Tests:**
+```
+✅ [Run test command] [See: 34/34 pass] "All tests pass"
+❌ "Should pass now" / "Looks correct"
+```
+**Regression tests (TDD Red-Green):**
+```
+✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass)
+❌ "I've written a regression test" (without red-green verification)
+```
+**Build:**
+```
+✅ [Run build] [See: exit 0] "Build passes"
+❌ "Linter passed" (linter doesn't check compilation)
+```
+**Requirements:**
+```
+✅ Re-read plan → Create checklist → Verify each → Report gaps or completion
+❌ "Tests pass, phase complete"
+```
+**Agent delegation:**
+```
+✅ Agent reports success → Check VCS diff → Verify changes → Report actual state
+❌ Trust agent report
+```
+## Why This Matters
+From 24 failure memories:
+- your human partner said "I don't believe you" - trust broken
+- Undefined functions shipped - would crash
+- Missing requirements shipped - incomplete features
+- Time wasted on false completion → redirect → rework
+- Violates: "Honesty is a core value. If you lie, you'll be replaced."
+## When To Apply
+**ALWAYS before:**
+- ANY variation of success/completion claims
+- ANY expression of satisfaction
+- ANY positive statement about work state
+- Committing, PR creation, task completion
+- Moving to next task
+- Delegating to agents
+**Rule applies to:**
+- Exact phrases
+- Paraphrases and synonyms
+- Implications of success
+- ANY communication suggesting completion/correctness
+## The Bottom Line
+**No shortcuts for verification.**
+Run the command. Read the output. THEN claim the result.
+This is non-negotiable.