bigpowers 2.1.3 → 2.3.0

Files changed (164)
  1. package/.pi/package.json +16 -0
  2. package/.pi/prompts/assess-impact.md +76 -0
  3. package/.pi/prompts/audit-code.md +156 -0
  4. package/.pi/prompts/build-epic.md +44 -0
  5. package/.pi/prompts/change-request.md +105 -0
  6. package/.pi/prompts/commit-message.md +135 -0
  7. package/.pi/prompts/compose-workflow.md +40 -0
  8. package/.pi/prompts/craft-skill.md +150 -0
  9. package/.pi/prompts/deepen-architecture.md +235 -0
  10. package/.pi/prompts/define-language.md +79 -0
  11. package/.pi/prompts/define-success.md +62 -0
  12. package/.pi/prompts/delegate-task.md +76 -0
  13. package/.pi/prompts/design-interface.md +96 -0
  14. package/.pi/prompts/develop-tdd.md +375 -0
  15. package/.pi/prompts/diagnose-root.md +23 -0
  16. package/.pi/prompts/dispatch-agents.md +83 -0
  17. package/.pi/prompts/edit-document.md +22 -0
  18. package/.pi/prompts/elaborate-spec.md +81 -0
  19. package/.pi/prompts/enforce-first.md +77 -0
  20. package/.pi/prompts/evolve-skill.md +38 -0
  21. package/.pi/prompts/execute-plan.md +54 -0
  22. package/.pi/prompts/fix-bug.md +36 -0
  23. package/.pi/prompts/grill-me.md +95 -0
  24. package/.pi/prompts/grill-with-docs.md +37 -0
  25. package/.pi/prompts/guard-git.md +212 -0
  26. package/.pi/prompts/hook-commits.md +93 -0
  27. package/.pi/prompts/inspect-quality.md +105 -0
  28. package/.pi/prompts/investigate-bug.md +117 -0
  29. package/.pi/prompts/kickoff-branch.md +99 -0
  30. package/.pi/prompts/map-codebase.md +70 -0
  31. package/.pi/prompts/migrate-spec.md +482 -0
  32. package/.pi/prompts/model-domain.md +227 -0
  33. package/.pi/prompts/orchestrate-project.md +161 -0
  34. package/.pi/prompts/organize-workspace.md +159 -0
  35. package/.pi/prompts/plan-refactor.md +77 -0
  36. package/.pi/prompts/plan-release.md +145 -0
  37. package/.pi/prompts/plan-work.md +161 -0
  38. package/.pi/prompts/release-branch.md +158 -0
  39. package/.pi/prompts/request-review.md +70 -0
  40. package/.pi/prompts/research-first.md +62 -0
  41. package/.pi/prompts/reset-baseline.md +20 -0
  42. package/.pi/prompts/respond-review.md +70 -0
  43. package/.pi/prompts/run-evals.md +56 -0
  44. package/.pi/prompts/run-planning.md +26 -0
  45. package/.pi/prompts/scope-work.md +23 -0
  46. package/.pi/prompts/search-skills.md +21 -0
  47. package/.pi/prompts/seed-conventions.md +132 -0
  48. package/.pi/prompts/session-state.md +146 -0
  49. package/.pi/prompts/setup-environment.md +23 -0
  50. package/.pi/prompts/simulate-agents.md +25 -0
  51. package/.pi/prompts/slice-tasks.md +23 -0
  52. package/.pi/prompts/spike-prototype.md +94 -0
  53. package/.pi/prompts/stocktake-skills.md +40 -0
  54. package/.pi/prompts/survey-context.md +129 -0
  55. package/.pi/prompts/terse-mode.md +37 -0
  56. package/.pi/prompts/trace-requirement.md +68 -0
  57. package/.pi/prompts/using-bigpowers.md +105 -0
  58. package/.pi/prompts/validate-fix.md +98 -0
  59. package/.pi/prompts/verify-work.md +125 -0
  60. package/.pi/prompts/visual-dashboard.md +51 -0
  61. package/.pi/prompts/wire-observability.md +92 -0
  62. package/.pi/prompts/write-document.md +244 -0
  63. package/.pi/skills/assess-impact/SKILL.md +77 -0
  64. package/.pi/skills/audit-code/SKILL.md +157 -0
  65. package/.pi/skills/build-epic/SKILL.md +45 -0
  66. package/.pi/skills/change-request/SKILL.md +106 -0
  67. package/.pi/skills/commit-message/SKILL.md +136 -0
  68. package/.pi/skills/compose-workflow/SKILL.md +41 -0
  69. package/.pi/skills/craft-skill/SKILL.md +151 -0
  70. package/.pi/skills/deepen-architecture/SKILL.md +236 -0
  71. package/.pi/skills/define-language/SKILL.md +80 -0
  72. package/.pi/skills/define-success/SKILL.md +63 -0
  73. package/.pi/skills/delegate-task/SKILL.md +77 -0
  74. package/.pi/skills/design-interface/SKILL.md +97 -0
  75. package/.pi/skills/develop-tdd/SKILL.md +376 -0
  76. package/.pi/skills/diagnose-root/SKILL.md +24 -0
  77. package/.pi/skills/dispatch-agents/SKILL.md +84 -0
  78. package/.pi/skills/edit-document/SKILL.md +23 -0
  79. package/.pi/skills/elaborate-spec/SKILL.md +82 -0
  80. package/.pi/skills/enforce-first/SKILL.md +78 -0
  81. package/.pi/skills/evolve-skill/SKILL.md +39 -0
  82. package/.pi/skills/execute-plan/SKILL.md +55 -0
  83. package/.pi/skills/fix-bug/SKILL.md +37 -0
  84. package/.pi/skills/grill-me/SKILL.md +96 -0
  85. package/.pi/skills/grill-with-docs/SKILL.md +38 -0
  86. package/.pi/skills/guard-git/SKILL.md +213 -0
  87. package/.pi/skills/hook-commits/SKILL.md +94 -0
  88. package/.pi/skills/inspect-quality/SKILL.md +106 -0
  89. package/.pi/skills/investigate-bug/SKILL.md +118 -0
  90. package/.pi/skills/kickoff-branch/SKILL.md +100 -0
  91. package/.pi/skills/map-codebase/SKILL.md +71 -0
  92. package/.pi/skills/migrate-spec/SKILL.md +483 -0
  93. package/.pi/skills/model-domain/SKILL.md +228 -0
  94. package/.pi/skills/orchestrate-project/SKILL.md +162 -0
  95. package/.pi/skills/organize-workspace/SKILL.md +160 -0
  96. package/.pi/skills/plan-refactor/SKILL.md +78 -0
  97. package/.pi/skills/plan-release/SKILL.md +146 -0
  98. package/.pi/skills/plan-work/SKILL.md +162 -0
  99. package/.pi/skills/release-branch/SKILL.md +159 -0
  100. package/.pi/skills/request-review/SKILL.md +71 -0
  101. package/.pi/skills/research-first/SKILL.md +63 -0
  102. package/.pi/skills/reset-baseline/SKILL.md +21 -0
  103. package/.pi/skills/respond-review/SKILL.md +71 -0
  104. package/.pi/skills/run-evals/SKILL.md +57 -0
  105. package/.pi/skills/run-planning/SKILL.md +27 -0
  106. package/.pi/skills/scope-work/SKILL.md +24 -0
  107. package/.pi/skills/search-skills/SKILL.md +22 -0
  108. package/.pi/skills/seed-conventions/SKILL.md +133 -0
  109. package/.pi/skills/session-state/SKILL.md +147 -0
  110. package/.pi/skills/setup-environment/SKILL.md +24 -0
  111. package/.pi/skills/simulate-agents/SKILL.md +26 -0
  112. package/.pi/skills/slice-tasks/SKILL.md +24 -0
  113. package/.pi/skills/spike-prototype/SKILL.md +95 -0
  114. package/.pi/skills/stocktake-skills/SKILL.md +41 -0
  115. package/.pi/skills/survey-context/SKILL.md +130 -0
  116. package/.pi/skills/terse-mode/SKILL.md +38 -0
  117. package/.pi/skills/trace-requirement/SKILL.md +69 -0
  118. package/.pi/skills/using-bigpowers/SKILL.md +106 -0
  119. package/.pi/skills/validate-fix/SKILL.md +99 -0
  120. package/.pi/skills/verify-work/SKILL.md +126 -0
  121. package/.pi/skills/visual-dashboard/SKILL.md +52 -0
  122. package/.pi/skills/wire-observability/SKILL.md +93 -0
  123. package/.pi/skills/write-document/SKILL.md +245 -0
  124. package/CHANGELOG.md +14 -0
  125. package/CLAUDE.md +1 -1
  126. package/CONVENTIONS.md +16 -10
  127. package/README.md +30 -4
  128. package/build-epic/SKILL.md +1 -1
  129. package/deepen-architecture/SKILL.md +2 -0
  130. package/define-language/SKILL.md +2 -0
  131. package/develop-tdd/REFERENCE.md +61 -0
  132. package/develop-tdd/SKILL.md +19 -119
  133. package/diagnose-root/SKILL.md +2 -0
  134. package/edit-document/SKILL.md +2 -0
  135. package/fix-bug/SKILL.md +3 -1
  136. package/grill-me/SKILL.md +3 -1
  137. package/grill-with-docs/SKILL.md +3 -1
  138. package/investigate-bug/SKILL.md +5 -11
  139. package/map-codebase/SKILL.md +3 -1
  140. package/migrate-spec/REFERENCE-GSD.md +4 -4
  141. package/migrate-spec/REFERENCE.md +33 -6
  142. package/migrate-spec/SKILL.md +1 -14
  143. package/model-domain/SKILL.md +2 -0
  144. package/orchestrate-project/REFERENCE.md +1 -1
  145. package/package.json +3 -2
  146. package/plan-release/SKILL.md +1 -1
  147. package/plan-work/REFERENCE.md +104 -0
  148. package/plan-work/SKILL.md +17 -151
  149. package/release-branch/REFERENCE.md +55 -0
  150. package/release-branch/SKILL.md +19 -117
  151. package/request-review/SKILL.md +1 -1
  152. package/run-planning/SKILL.md +3 -2
  153. package/scope-work/SKILL.md +3 -1
  154. package/scripts/audit-compliance.sh +15 -3
  155. package/scripts/check-skill-size.sh +79 -0
  156. package/scripts/generate-reference-tables.sh +64 -0
  157. package/scripts/project-survey.sh +2 -2
  158. package/scripts/sync-skills.sh +51 -3
  159. package/scripts/validate-doctrine.sh +143 -0
  160. package/seed-conventions/REFERENCE.md +63 -0
  161. package/seed-conventions/SKILL.md +23 -177
  162. package/slice-tasks/SKILL.md +3 -1
  163. package/survey-context/SKILL.md +3 -1
  164. package/write-document/SKILL.md +4 -2
@@ -0,0 +1,24 @@
1
+ ---
2
+ name: diagnose-root
3
+ description: "Run 4-phase root cause analysis — reproduce, isolate, hypothesize, verify. Use when a bug is confirmed but root cause is unclear, after investigate-bug, or when user mentions root cause analysis.model: sonnet"
4
+ ---
5
+
6
+
7
+ # Diagnose Root
8
+
9
+ **Boundary**: Canonical, reusable 4-phase RCA engine. Invoked by `investigate-bug` (as step 2 of the end-to-end flow) and by `fix-bug` (when no bug file exists). Does not write the bug file — that is `investigate-bug`'s responsibility.
10
+
11
+ Four phases — do not skip. Update the active `specs/bugs/BUG-*.md` file at each phase.
12
+
13
+ ## Phases
14
+
15
+ 1. **Reproduce** — minimal steps; record environment; capture logs.
16
+ 2. **Isolate** — narrow to module/function; binary-search commits or config.
17
+ 3. **Hypothesize** — list ranked hypotheses with falsification test each.
18
+ 4. **Verify** — run falsification; confirm single root cause; link to fix plan.
19
+
20
+ > **HARD GATE** — Do not propose a fix until phase 4 confirms one root cause with evidence.
21
+
22
+ ## Verify
23
+
24
+ → verify: `BUG_FILE=$(ls -t specs/bugs/BUG-*.md 2>/dev/null | head -1); test -n "$BUG_FILE" && grep -cE "Reproduce|Isolate|Hypothesize|Verify" "$BUG_FILE" | awk '{if($1>=4) print "OK"; else print "INCOMPLETE"}' || echo "MISSING"`
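Review note on the verify line above: `grep -c` counts matching lines, so a bug file that mentions a single phase on four lines would still print `OK`. A minimal Python sketch of a stricter check follows. The function name `phases_recorded` and the sample text are illustrative assumptions, not part of the package.

```python
# Phase names taken from the diagnose-root skill text.
PHASES = ("Reproduce", "Isolate", "Hypothesize", "Verify")


def phases_recorded(bug_text: str) -> str:
    """Return "OK" only when every phase name occurs at least once."""
    missing = [phase for phase in PHASES if phase not in bug_text]
    if not missing:
        return "OK"
    return "INCOMPLETE: missing " + ", ".join(missing)


# Hypothetical bug file body: four matching lines, but only two distinct phases.
sample = "## Reproduce\nsteps\n## Verify\nVerify again\nVerify once more\n"
print(phases_recorded(sample))
```

The line-count check in the original one-liner would accept this sample (four matching lines); the sketch reports the two phases that are absent.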
@@ -0,0 +1,84 @@
1
+ ---
2
+ name: dispatch-agents
3
+ description: "Dispatch multiple subagents in parallel on independent tasks. No waiting between them — all run concurrently. Use when tasks are truly decoupled and speed matters. Distinct from delegate-task (concurrent here, no inter-task review gate)."
4
+ ---
5
+
6
+
7
+ # Dispatch Agents
8
+ > **HARD GATE** — Agent work must be parallelizable and have explicit synchronization points. Do NOT dispatch work that has hidden dependencies between agents.
9
+
10
+
11
+ Run multiple subagents in parallel on independent tasks. Use when tasks are genuinely decoupled — no agent needs the output of another to start.
12
+
13
+ **Distinct from `delegate-task`:** This skill maximizes throughput via concurrency. There is no sequential review gate between tasks. Use `delegate-task` instead when a single task needs careful two-stage oversight before proceeding.
14
+
15
+ ## When to use
16
+
17
+ - Tasks that can run simultaneously without shared state
18
+ - Large plans that can be broken into parallel workstreams
19
+ - Exploration: gather information from multiple parts of the codebase at once
20
+
21
+ ## When NOT to use
22
+
23
+ - Task B depends on Task A's output
24
+ - You need to review Task A before Task B can start safely
25
+ - The tasks share a file and concurrent edits would conflict
26
+
27
+ ## Process
28
+
29
+ ### 1. Confirm independence
30
+
31
+ Before dispatching, verify each task pair is truly independent:
32
+ - No shared files being written
33
+ - No shared state (DB migrations, config files)
34
+ - No ordering dependency between outcomes
35
+
36
+ If any two tasks conflict, sequence them with `delegate-task` or `execute-plan` instead.
37
+
38
+ ### 2. Write task briefs
39
+
40
+ Before writing briefs, read `specs/state.yaml` if it exists — each agent gets only the decisions relevant to its task, nothing else.
41
+
42
+ For each task, use this minimal template (each agent starts cold — brief size directly controls token cost and hallucination risk):
43
+
44
+ ```
45
+ Goal: [one sentence — what success looks like]
46
+ In scope: [explicit file or module list]
47
+ Out of bounds: [what NOT to touch]
48
+ Verify: [runnable command]
49
+ Prior decisions: [relevant entries from specs/state.yaml — omit section if none apply]
50
+ ```
51
+
52
+ Do not include the full conversation, full file contents, or decisions unrelated to this agent's task.
53
+
54
+ ### 3. Iterative retrieval (max 3 cycles)
55
+
56
+ After each wave completes:
57
+ 1. **Dispatch** — run parallel agents with briefs.
58
+ 2. **Evaluate** — read outputs; list gaps vs goal.
59
+ 3. **Refine** — tighten briefs or spawn follow-up agents (max **3 cycles** total).
60
+
61
+ Stop when no gaps remain. If cycle 3 is reached with gaps still open, stop and escalate to the user.
62
+
63
+ ### 4. Dispatch in parallel
64
+
65
+ Spawn all agents in a single message using multiple Agent tool calls. Each agent gets its own complete brief.
66
+
67
+ ```
68
+ Agent 1: brief for task A
69
+ Agent 2: brief for task B
70
+ Agent 3: brief for task C
71
+ ```
72
+
73
+ ### 5. Collect and review results
74
+
75
+ When all agents return:
76
+ - Review each result independently
77
+ - Run all verify commands
78
+ - Check diffs for scope violations or CONVENTIONS.md breaches
79
+
80
+ ### 6. Integrate
81
+
82
+ Merge accepted results. If any agent's result conflicts with another, resolve manually and note the conflict.
83
+
84
+ Report a summary: which tasks succeeded, which need revision, and overall verify status.
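Review note on step 1 of dispatch-agents (Confirm independence): the "no shared files being written" rule can be checked mechanically before dispatch. The sketch below assumes each task brief lists the paths it may write; the `conflicting_pairs` name and the task data are hypothetical, not part of the package.

```python
from itertools import combinations


def conflicting_pairs(writes: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return (task_a, task_b, shared_paths) for every pair of tasks that write the same path."""
    conflicts = []
    for task_a, task_b in combinations(sorted(writes), 2):
        shared = writes[task_a] & writes[task_b]
        if shared:
            conflicts.append((task_a, task_b, shared))
    return conflicts


# Hypothetical briefs: task name -> paths listed under "In scope".
briefs = {
    "A": {"src/api.py"},
    "B": {"src/ui.py"},
    "C": {"src/api.py", "docs/api.md"},
}
for task_a, task_b, shared in conflicting_pairs(briefs):
    print(f"{task_a} and {task_b} both write {sorted(shared)}: sequence them instead")
```

An empty result only covers shared files. Shared state such as DB migrations and ordering dependencies still needs a manual check.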
@@ -0,0 +1,23 @@
1
+ ---
2
+ name: edit-document
3
+ description: "Edit and improve documents by restructuring sections, improving clarity, and tightening prose. Use when user wants to edit, revise, restructure, or improve any document — including specs/ files, articles, READMEs, or technical writing."
4
+ ---
5
+
6
+
7
+ # Edit Document
8
+
9
+ **Distinct from `write-document`:** Use this skill when the document already exists and needs restructuring, clarity, or prose improvements. Use `write-document` to create a document from scratch.
10
+
11
+ > **HARD GATE** — Document edits must preserve intent and accuracy. Do NOT remove or contradict existing content without understanding why it was written. Check git history for context.
12
+
13
+ ## Process
14
+
15
+ 1. First, divide the document into sections based on its headings. Think about the main points made in each section.
16
+
17
+ Consider that information is a directed acyclic graph, and that pieces of information can depend on other pieces of information. Make sure that the order of the sections and their contents respects these dependencies.
18
+
19
+ Confirm the sections with the user.
20
+
21
+ 2. For each section:
22
+
23
+ 2a. Rewrite the section to improve clarity, coherence, and flow. Use a maximum of 240 characters per paragraph.
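Review note on step 1 of edit-document: the instruction to treat information as a directed acyclic graph amounts to checking that no section appears before something it depends on. A small sketch of that check, assuming the editor has written the dependencies down by hand; the section names and the `order_violations` helper are hypothetical.

```python
def order_violations(order: list[str], depends_on: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (section, dependency) pairs where the dependency appears later than the section."""
    position = {name: index for index, name in enumerate(order)}
    violations = []
    for section in order:
        for dependency in depends_on.get(section, []):
            # Dependencies that are not sections of this document are ignored.
            if dependency in position and position[dependency] > position[section]:
                violations.append((section, dependency))
    return violations


# Hypothetical document outline and hand-written dependencies.
sections = ["Usage", "Installation", "Concepts"]
needs = {"Usage": ["Installation", "Concepts"]}
print(order_violations(sections, needs))
```

Each reported pair is a candidate reordering to confirm with the user, as the skill requires.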
@@ -0,0 +1,82 @@
1
+ ---
2
+ name: elaborate-spec
3
+ description: "Refine a rough idea into a clear, detailed specification through dialogue. Does not produce code. Use when user has a vague idea, wants to think through a feature before planning, or needs to turn \"I want X\" into a concrete spec."
4
+ ---
5
+
6
+
7
+ # Elaborate Spec
8
+
9
+ Turn a rough idea into a clear specification through focused dialogue. No code is written during this skill — the output is shared understanding and a refined problem statement.
10
+
11
+ > **HARD GATE** — Do NOT proceed with planning or implementation until the problem space is clearly understood. Success criteria, actors, and scope must be explicit before drafting a plan.
12
+
13
+ ## Process
14
+
15
+ ### 1. Listen first
16
+
17
+ Let the user describe their idea in their own words. Do not interrupt or redirect. Take notes on:
18
+ - The core problem they're trying to solve
19
+ - Who is affected (actors)
20
+ - What success looks like to them
21
+ - Any constraints they've already identified
22
+
23
+ ### 2. Ask clarifying questions
24
+
25
+ Ask one question at a time. Work through these areas:
26
+
27
+ **Problem clarity**
28
+ - What is the current behavior (or lack of behavior) that prompted this?
29
+ - Who experiences this problem? How often?
30
+ - What's the cost of not solving it?
31
+
32
+ **Solution boundaries**
33
+ - What is explicitly IN scope?
34
+ - What is explicitly OUT of scope?
35
+ - Are there existing solutions (internal or external) this replaces or integrates with?
36
+
37
+ **Success criteria**
38
+ - How will you know this is done?
39
+ - What does the happy path look like end-to-end?
40
+ - What are the key failure modes to handle?
41
+
42
+ **Constraints**
43
+ - Any performance requirements?
44
+ - Any compatibility constraints (existing APIs, data formats)?
45
+ - Any non-negotiable implementation decisions already made?
46
+
47
+ ### 2.5. Multiple Interpretations (HARD GATE)
48
+
49
+ > **HARD GATE** — If the request admits ≥2 valid interpretations, do NOT guess. You must list them and ask the user to choose before proceeding. Proceeding with unresolved ambiguity is a failure of integrity.
50
+
51
+ Present the options clearly:
52
+ > "I see two ways to read this:
53
+ > 1. [Interpretation A] — my recommendation because [reason]
54
+ > 2. [Interpretation B]
55
+ > Which is closer to what you mean?"
56
+
57
+ ### 3. Surface hidden assumptions
58
+
59
+ Once the user has answered the main questions, probe for assumptions:
60
+ - "You mentioned X — does that mean Y is also true?"
61
+ - "What happens when Z fails?"
62
+ - "Is this for internal users, external users, or both?"
63
+
64
+ ### 4. Synthesize and confirm
65
+
66
+ Summarize your understanding in 3–5 bullet points aligned with [countable-story-format.md](file:///Users/danielvm/Developer/bigpowers/countable-story-format.md):
67
+ - The problem (feeds into §1 Business narrative)
68
+ - The solution and main flow (feeds into §5)
69
+ - The key constraints and alternative flows (feeds into §6)
70
+ - The success criteria (feeds into §17 Gherkin)
71
+ - What's out of scope (feeds into §18)
72
+
73
+ Ask: "Is this an accurate summary? Anything missing or wrong?"
74
+
75
+ ### 5. Suggest next skill
76
+
77
+ Once the spec is clear, recommend the next step:
78
+ - If domain model needs work → `model-domain`
79
+ - If ready to plan → `plan-release` (creates epic capsules with `epic.yaml` + story `.md` + `-tasks.yaml`) then `plan-work` per story
80
+ - If a spike is needed first → `spike-prototype`
81
+ - If architecture decisions are needed → `deepen-architecture` or `grill-me`
82
+ - If the plan depends on a specific library or API → `grill-me` in docs mode
@@ -0,0 +1,78 @@
1
+ ---
2
+ name: enforce-first
3
+ description: "Apply the F.I.R.S.T test quality rubric (Fast, Independent, Repeatable, Self-Validating, Timely) to a test suite or individual tests. Use when develop-tdd is writing tests, when test quality needs to be checked, or when user mentions F.I.R.S.T or \"test quality\"."
4
+ ---
5
+
6
+
7
+ # Enforce FIRST
8
+ > **HARD GATE** — Before shipping, ALL enforcement checks must pass: lint, typecheck, tests, coverage gates. Do NOT disable or skip checks to get to green.
9
+
10
+
11
+ Apply the F.I.R.S.T rubric (Uncle Bob, Clean Code Chapter 9) to evaluate and improve tests.
12
+
13
+ This skill is typically invoked internally by `develop-tdd` during the test-writing phase. It can also be run standalone on an existing test suite.
14
+
15
+ ## The F.I.R.S.T Rubric
16
+
17
+ ### F — Fast
18
+
19
+ Tests must run quickly. Slow tests don't get run. They don't get trusted.
20
+
21
+ - [ ] No real network calls (use fakes/stubs for external I/O)
22
+ - [ ] No real database (use in-memory or transaction-rollback strategies)
23
+ - [ ] No `sleep` or arbitrary timeouts in test code
24
+ - [ ] The full suite runs in under 30 seconds (target; adjust to project size)
25
+
26
+ **Fix:** Replace slow I/O with named fake classes. Never inline anonymous stubs.
27
+
28
+ ### I — Independent
29
+
30
+ Tests must not depend on each other. Running in any order must produce the same result.
31
+
32
+ - [ ] No shared mutable state between tests
33
+ - [ ] Each test sets up its own data and tears it down
34
+ - [ ] No test assumes another test ran first
35
+ - [ ] Tests can be run individually (e.g. `npm test -- mytest.test.ts`) and pass
36
+
37
+ **Fix:** Move setup into `beforeEach`. Use factory functions to build test data.
38
+
39
+ ### R — Repeatable
40
+
41
+ Tests must pass consistently in any environment.
42
+
43
+ - [ ] No dependency on machine-specific paths, ports, or environment variables (unless explicitly injected)
44
+ - [ ] No dependency on current time without mocking the clock
45
+ - [ ] No flakiness — a test that sometimes fails is worse than no test
46
+ - [ ] Tests pass on CI the same way they pass locally
47
+
48
+ **Fix:** Inject time, randomness, and environment as parameters. Pin seeds for anything random.
49
+
50
+ ### S — Self-Validating
51
+
52
+ Tests must report pass or fail automatically. No human inspection required.
53
+
54
+ - [ ] Tests use assertions (`expect`, `assert`, etc.) — not just `console.log`
55
+ - [ ] Failure messages are descriptive enough to diagnose without reading the test body
56
+ - [ ] No tests that "pass" by default when the feature is broken
57
+
58
+ **Fix:** Add assertion messages. Use matchers that describe the expected behavior.
59
+
60
+ ### T — Timely
61
+
62
+ Tests must be written at the right time — before or immediately with the code they test.
63
+
64
+ - [ ] Tests are written in the same commit as the code (or the commit before, if TDD)
65
+ - [ ] No "I'll add tests later" patterns
66
+ - [ ] Bug fixes include a regression test that would have caught the bug
67
+
68
+ **Fix:** Run `develop-tdd` — it enforces the timely principle by design.
69
+
70
+ ## Applying the rubric
71
+
72
+ For each failing criterion:
73
+ 1. Identify which tests violate it
74
+ 2. Describe the fix
75
+ 3. Apply the fix
76
+ 4. Re-run the suite to confirm it still passes
77
+
78
+ Report: "F.I.R.S.T audit complete. X criteria passed, Y fixed."
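Review note on the Repeatable fix in enforce-first ("Inject time, randomness, and environment as parameters. Pin seeds for anything random."): a minimal sketch of what that looks like in practice. The skill's own examples use a JavaScript test runner; this Python version is an analogue, and `make_token` is a hypothetical function under test.

```python
import random
from datetime import datetime, timezone


def make_token(now: datetime, rng: random.Random) -> str:
    """Build a token from an injected clock value and an injected random source."""
    return f"{now:%Y%m%d}-{rng.randrange(10_000):04d}"


def test_make_token_is_repeatable() -> None:
    # Fixed clock and pinned seed: no dependency on the wall clock or global RNG state.
    fixed_now = datetime(2024, 1, 2, tzinfo=timezone.utc)
    first = make_token(fixed_now, random.Random(42))
    second = make_token(fixed_now, random.Random(42))
    # Assertion message supports the Self-Validating criterion as well.
    assert first == second, f"same inputs produced {first!r} and {second!r}"
    assert first.startswith("20240102-"), f"unexpected date prefix in {first!r}"


test_make_token_is_repeatable()
```

Because the clock and the generator are parameters, the test also stays Fast and Independent: it touches no I/O and shares no state with other tests.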
@@ -0,0 +1,39 @@
1
+ ---
2
+ name: evolve-skill
3
+ description: "Benchmark-gated skill evolution — consume bigpowers-benchmark report, propose plan-work change, edit skill via craft-skill, re-run benchmark, record ADR. Use when a skill underperforms on benchmark or stocktake finds systemic gap.model: opus"
4
+ ---
5
+
6
+
7
+ # Evolve Skill
8
+
9
+ > **HARD GATE** — No skill change ships without benchmark score ≥ pre-change baseline. Learning is measured and versioned — never implicit.
10
+
11
+ ## Loop
12
+
13
+ 1. Run `bigpowers-benchmark` (external repo); save report path in state.yaml.
14
+ 2. Identify target skill + measurable gap from report.
15
+ 3. `plan-work` — minimal change proposal with verify commands.
16
+ 4. Edit via `craft-skill` / direct SKILL.md edit; run `sync-skills.sh`.
17
+ 5. Re-run benchmark; compare scores.
18
+ 6. Record decision in `specs/adr/` + `session-state`; revert if regression.
19
+
20
+ ## Verify
21
+
22
+ → verify: benchmark report shows post-change score ≥ baseline (document paths in state.yaml)
23
+
24
+ See [REFERENCE.md](REFERENCE.md) for ADR template.
25
+
26
+ ---
27
+
28
+ # Evolve Skill — ADR snippet
29
+
30
+ ```markdown
31
+ ## ADR-XXXX: Evolve <skill-name>
32
+
33
+ **Status:** Accepted
34
+ **Benchmark:** before X% / after Y%
35
+ **Change:** one-sentence summary
36
+ **Evidence:** path/to/benchmark-report.md
37
+ ```
38
+
39
+ Benchmark repo: `/Users/danielvm/Developer/bigpowers-benchmark/`
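Review note on the evolve-skill hard gate: the rule "post-change score ≥ baseline" and the ADR snippet can be combined into one small helper. This is a sketch under assumptions: scores are treated as percentages, the `Rejected (reverted)` status wording is invented here for the regression case, and `evolve_decision` is a hypothetical name.

```python
def evolve_decision(skill: str, baseline: float, after: float, report: str) -> tuple[bool, str]:
    """Apply the benchmark gate and render an ADR snippet in the template's shape."""
    keep = after >= baseline  # gate: never ship below the pre-change baseline
    status = "Accepted" if keep else "Rejected (reverted)"  # second wording is an assumption
    adr = "\n".join(
        [
            f"## ADR-XXXX: Evolve {skill}",
            "",
            f"**Status:** {status}",
            f"**Benchmark:** before {baseline:g}% / after {after:g}%",
            "**Change:** one-sentence summary",
            f"**Evidence:** {report}",
        ]
    )
    return keep, adr


keep, adr = evolve_decision("plan-work", 72.0, 75.5, "path/to/benchmark-report.md")
print(keep)
print(adr)
```

The ADR number stays as `ADR-XXXX` because the source template does not say how numbers are assigned.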
@@ -0,0 +1,55 @@
1
+ ---
2
+ name: execute-plan
3
+ description: "Batch-execute tasks from the active epic capsule sequentially, with a human checkpoint after each step. Use when user has an approved plan and wants step-by-step oversight."
4
+ ---
5
+
6
+
7
+ # Execute Plan
8
+
9
+ Execute tasks from the **active epic** (`specs/epics/eNN-slug/epic.yaml` story `tasks[]`) one at a time, showing evidence after each step before proceeding.
10
+
11
+ > **HARD GATE** — Do NOT proceed if on `main` or `master`. Run `kickoff-branch` first.
12
+ >
13
+ > **HARD GATE** — Active epic must exist with runnable `verify` on each task. If missing, run `plan-release` then `plan-work` or `build-epic`.
14
+
15
+ ## Process
16
+
17
+ ### 1. Read the plan
18
+
19
+ Read `specs/state.yaml` (`active_epic`, `active_story`) and the matching `specs/epics/*/epic.yaml`. Parse `depends-on` in task descriptions for execution waves.
20
+
21
+ > **CONTEXT ISOLATION** — Spawn each skill with a **fresh context window**. Pass decisions only through `specs/state.yaml` `handoff` — never rely on prior chat history.
22
+
23
+ Confirm with the user: step count, skip/reorder, stop-after step.
24
+
25
+ ### 2. Execute step by step
26
+
27
+ For each task in the active story:
28
+
29
+ **a. Announce** — task `desc` and `verify` command.
30
+
31
+ **b. Execute** — code or `delegate-task` / `dispatch-agents` for waves.
32
+
33
+ **c. Run verify** — must be green before advancing.
34
+
35
+ **d. Log** — non-obvious decisions in `specs/state.yaml` under `decisions[]` or `handoff` block.
36
+
37
+ **e. Checkpoint** — ask to proceed unless autonomous mode requested.
38
+
39
+ **f. Story UAT** — after last task, run manual verification script from story notes or `verify-work`.
40
+
41
+ On verify failure: fix and re-run; never advance on red.
42
+
43
+ Update `specs/execution-status.yaml` when a story/epic completes (`bash scripts/sync-status-from-epics.sh` or direct edit).
44
+
45
+ ### 3. Blockers
46
+
47
+ Report blocker; ask skip/adapt/stop; update epic capsule if plan changes.
48
+
49
+ ### 4. Final report
50
+
51
+ Suggest: `verify-work` → `run-evals` → `audit-code` → `simulate-agents` → `commit-message` → `release-branch`
52
+
53
+ ## Rules
54
+
55
+ - **Loop until behavioral correctness is verified**: if a verify command passes but the observed behavior is still wrong, return to step 1 and run the execution cycle again.
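Review note on step 2 of execute-plan: the "never advance on red" rule can be sketched as a loop over the story's tasks. The `desc` and `verify` field names come from the skill text; the list-of-dicts shape and the `run_tasks` helper are assumptions, and the human checkpoint between tasks is left out.

```python
import subprocess


def run_tasks(tasks: list[dict[str, str]]) -> list[str]:
    """Run each task's verify command in order and stop at the first failure."""
    completed = []
    for task in tasks:
        result = subprocess.run(task["verify"], shell=True, capture_output=True)
        if result.returncode != 0:
            break  # red: fix and re-run before moving on
        completed.append(task["desc"])
    return completed


# Hypothetical tasks; `exit 0` and `exit 1` stand in for real verify commands.
story_tasks = [
    {"desc": "add endpoint", "verify": "exit 0"},
    {"desc": "wire handler", "verify": "exit 1"},
    {"desc": "update docs", "verify": "exit 0"},
]
print(run_tasks(story_tasks))
```

A green verify is necessary but not sufficient: the Rules section still requires a further execution cycle when the observed behavior is wrong.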
@@ -0,0 +1,37 @@
1
+ ---
2
+ name: fix-bug
3
+ description: "Bug fix orchestrator — active_flow fix_bug; reads specs/bugs/BUG-*.md; chains investigate-bug, develop-tdd, validate-fix. Use when user reports a defect."
4
+ ---
5
+
6
+
7
+ # Fix Bug
8
+
9
+ **Boundary**: Orchestrator flow — chains `investigate-bug` (entry point + RCA via `diagnose-root`) → `develop-tdd` → `validate-fix`. Does not implement RCA or write bug files directly.
10
+
11
+ Orchestrates **fix_bug** flow without mixing epic build state.
12
+
13
+ > **HARD GATE** — Set `specs/state.yaml` `active_flow: fix_bug`.
14
+
15
+ ## Process
16
+
17
+ 1. If no `specs/bugs/BUG-*.md`, run `investigate-bug` first — it handles history check, RCA (via `diagnose-root`), fix approach, and writes the bug file.
18
+ 2. `develop-tdd` against the bug file's verify steps.
19
+ 3. `validate-fix` — re-run failing test, full suite, lint.
20
+ 4. `bash scripts/sync-bugs-registry.sh` — refresh `specs/bugs/registry.yaml`.
21
+ 5. Clear `active_flow` or return to `build_epic` when done.
22
+
23
+ ## Bug file SoT
24
+
25
+ One markdown file per bug with frontmatter:
26
+
27
+ ```yaml
28
+ bug_id: BUG-001
29
+ status: open
30
+ severity: high
31
+ scope: api
32
+ title: Short title
33
+ ```
34
+
35
+ ## Verify
36
+
37
+ → verify: `test -d specs/bugs && bash scripts/sync-bugs-registry.sh`
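Review note on the fix-bug verify line: it confirms that `specs/bugs` exists and that the registry script runs, but it does not look at the frontmatter fields listed under "Bug file SoT". A minimal sketch of such a check, assuming the frontmatter is a flat `key: value` block delimited by `---` lines; the helper names are hypothetical and no YAML library is used.

```python
# Field names taken from the frontmatter example in the skill.
REQUIRED = ("bug_id", "status", "severity", "scope", "title")


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse a flat key: value block between the first pair of --- lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def missing_fields(text: str) -> list[str]:
    """List required frontmatter keys that are absent from the bug file."""
    found = read_frontmatter(text)
    return [key for key in REQUIRED if key not in found]


bug_file = "---\nbug_id: BUG-001\nstatus: open\nseverity: high\n---\n# Short title\n"
print(missing_fields(bug_file))
```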
@@ -0,0 +1,96 @@
1
+ ---
2
+ name: grill-me
3
+ description: "Interactive assumption-surfacing Q&A that stress-tests a plan through relentless questioning until every decision is resolved. Use when user wants to challenge a plan, validate decisions from conversation/context, or mentions \"grill me\". For doc-grounded variant, use grill-with-docs."
4
+ ---
5
+
6
+
7
+ # Grill Me
8
+
9
+ > **Use this vs grill-with-docs:** `grill-me` surfaces assumptions from the conversation and context alone — no documentation fetching. Use `grill-with-docs` (the doc-grounded variant) when the plan relies on a specific library or external API and every challenge must cite a real doc URL.
10
+
11
+ Two modes. Default is **Design**. Switch to **Docs** by saying "grill me with docs" or when the plan relies on a specific library or external API.
12
+
13
+ > **HARD GATE** — Do NOT accept a design until every hard decision has been stress-tested. "Seems right" is not a decision. Grilling must identify and resolve tensions before build begins.
14
+
15
+ ## Design mode (default)
16
+
17
+ Interview relentlessly about every aspect of this plan until reaching shared understanding. Walk each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer. Ask one question at a time.
18
+
19
+ If a question can be answered by exploring the codebase, explore it instead.
20
+
21
+ ## Docs mode
22
+
23
+ Ground every challenge in real documentation — no assumption about a library's behavior goes unchecked. See [REFERENCE.md](REFERENCE.md) for the full process.
24
+
25
+ Short form:
26
+ 1. List every external library, third-party API, and framework behavior relied upon.
27
+ 2. Fetch the actual docs for each (`WebFetch` the official API reference).
28
+ 3. Challenge each plan assumption against the real docs: correct method signature? right version? deprecated?
29
+ 4. Report confirmed ✓, corrected ✗ (with the real behavior), and uncertain → `spike-prototype`.
30
+ 5. Update the plan for each confirmed discrepancy.
31
+
32
+ ---
33
+
34
+ # Docs Mode — Full Process
35
+
36
+ Triggered by "grill me with docs" or when a plan depends on a specific library or external API.
37
+
38
+ **Why this matters:** AI agents hallucinate API methods, argument orders, and behaviors. Every assumption about an external dependency must be validated against the actual docs before code is written.
39
+
40
+ ## Step 1 — Identify the dependencies
41
+
42
+ From the plan or conversation, list:
43
+ - Every external library being used
44
+ - Every third-party API being called
45
+ - Every framework behavior being relied upon
46
+
47
+ Ask: "Which of these are you most confident about? Which are you less sure of?"
48
+
49
+ ## Step 2 — Fetch the relevant docs
50
+
51
+ For each dependency, fetch the actual documentation:
52
+
53
+ ```
54
+ WebFetch the official docs for [library/API]
55
+ ```
56
+
57
+ Prioritize:
58
+ - The API reference for the specific method being used
59
+ - The changelog for the version in use (breaking changes)
60
+ - Migration guides if upgrading from a previous version
61
+ - Known gotchas / FAQ sections
62
+
63
+ ## Step 3 — Challenge each assumption
64
+
65
+ For every assumption in the plan, find the corresponding doc section and ask:
66
+
67
+ - "Does the real API actually work this way? Show me the doc."
68
+ - "Is this method available in the version you're using?"
69
+ - "Does this argument order match the actual signature?"
70
+ - "Are there rate limits, quotas, or timeout behaviors that affect this design?"
71
+ - "Is this marked as deprecated in the current version?"
72
+
73
+ Ask one question at a time. For each challenge, cite the specific URL and section.
74
+
75
+ ## Step 4 — Surface hallucinations
76
+
77
+ When an assumption doesn't match the docs:
78
+
79
+ > "Your plan uses `library.doThing(a, b)` but the [docs](URL) show the signature is `doThing(config: {a, b})` with a config object. This will fail at runtime."
80
+
81
+ Document each discrepancy clearly.
82
+
83
+ ## Step 5 — Update the plan
84
+
85
+ For each confirmed discrepancy, recommend a concrete fix:
86
+ - Correct method signature
87
+ - Correct argument order
88
+ - Alternative approach that matches what the library actually supports
89
+ - Whether a spike (`spike-prototype`) is needed to validate a remaining uncertainty
90
+
91
+ ## Step 6 — Sign off
92
+
93
+ When all major assumptions have been validated against docs, report:
94
+ - Which assumptions were confirmed ✓
95
+ - Which were corrected ✗ + what the correct approach is
96
+ - Which remain uncertain → recommend `spike-prototype`
@@ -0,0 +1,38 @@
1
+ ---
2
+ name: grill-with-docs
3
+ description: "Doc-grounded variant of grill-me — stress-tests plan assumptions by fetching and citing real library or API documentation. Every challenge must cite a real URL. Use when the plan depends on a specific library or external API.model: opus"
4
+ ---
5
+
6
+
7
+ # Grill With Docs
8
+
9
+ > **Use this vs grill-me:** `grill-with-docs` is the doc-grounded variant of `grill-me`. Use it when the plan relies on external libraries or APIs and every challenge must be grounded in and cite a real documentation URL. Use `grill-me` for context-only assumption surfacing without fetching docs.
10
+
11
+ > **HARD GATE** — Every challenge must cite a real documentation URL. No hallucinated APIs.
12
+
13
+ ## Process
14
+
15
+ 1. Read the plan or design under test (`specs/release-plan.yaml` plus epic shards, `INTERFACE-OPTIONS.md`, etc.).
16
+ 2. List assumptions that depend on external libraries or APIs.
17
+ 3. For each assumption: fetch or quote official docs; challenge with "docs say X, plan says Y."
18
+ 4. Resolve or update the plan inline; unresolved items block `plan-work`.
19
+
20
+ ## Docs mode rules
21
+
22
+ - Cite URL + quoted snippet (method name, parameter, version).
23
+ - If the docs contradict the plan, the plan loses until it is updated.
24
+ - Prefer official docs over blog posts.
25
+
26
+ ## Verify
27
+
28
+ → verify: dialogue log contains at least one `https://` doc URL per challenged assumption
29
+
30
+ See [REFERENCE.md](REFERENCE.md) for question templates.
31
+
32
+ ---
33
+
34
+ # Grill With Docs — Question templates
35
+
36
+ - "Docs at [URL] show signature `foo(bar?: Baz)`. Your plan calls `foo(bar, baz)` — which is correct?"
37
+ - "The changelog at [URL] deprecates X in v3. Your plan still uses X — migrate or pin version?"
38
+ - "Error handling in [URL] throws `NetworkError`. Your plan catches `Error` only — is that sufficient?"
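Review note on the grill-with-docs verify line ("at least one `https://` doc URL per challenged assumption"): the check is described in prose only. A minimal sketch, assuming the dialogue log has already been split into one string per challenge; `uncited_challenges` and the sample URL are hypothetical.

```python
import re

# Any https URL counts; whether it points at official docs still needs a human look.
URL_PATTERN = re.compile(r"https://\S+")


def uncited_challenges(challenges: list[str]) -> list[int]:
    """Return the indexes of challenges that cite no https:// URL."""
    return [index for index, text in enumerate(challenges) if not URL_PATTERN.search(text)]


log = [
    "Docs at https://example.com/api#foo show signature foo(bar). Plan calls foo(bar, baz).",
    "The plan catches Error only. Is that sufficient?",
]
print(uncited_challenges(log))
```

An empty list means every challenge carries a citation; any index returned marks a challenge that should block `plan-work` until a source is added.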