bigpowers 2.2.0 → 2.4.0
- package/.pi/package.json +16 -0
- package/.pi/prompts/assess-impact.md +76 -0
- package/.pi/prompts/audit-code.md +156 -0
- package/.pi/prompts/build-epic.md +44 -0
- package/.pi/prompts/change-request.md +105 -0
- package/.pi/prompts/commit-message.md +135 -0
- package/.pi/prompts/compose-workflow.md +40 -0
- package/.pi/prompts/craft-skill.md +150 -0
- package/.pi/prompts/deepen-architecture.md +235 -0
- package/.pi/prompts/define-language.md +79 -0
- package/.pi/prompts/define-success.md +62 -0
- package/.pi/prompts/delegate-task.md +76 -0
- package/.pi/prompts/design-interface.md +96 -0
- package/.pi/prompts/develop-tdd.md +375 -0
- package/.pi/prompts/diagnose-root.md +23 -0
- package/.pi/prompts/dispatch-agents.md +83 -0
- package/.pi/prompts/edit-document.md +22 -0
- package/.pi/prompts/elaborate-spec.md +81 -0
- package/.pi/prompts/enforce-first.md +77 -0
- package/.pi/prompts/evolve-skill.md +38 -0
- package/.pi/prompts/execute-plan.md +54 -0
- package/.pi/prompts/fix-bug.md +36 -0
- package/.pi/prompts/grill-me.md +95 -0
- package/.pi/prompts/grill-with-docs.md +37 -0
- package/.pi/prompts/guard-git.md +212 -0
- package/.pi/prompts/hook-commits.md +93 -0
- package/.pi/prompts/inspect-quality.md +105 -0
- package/.pi/prompts/investigate-bug.md +117 -0
- package/.pi/prompts/kickoff-branch.md +99 -0
- package/.pi/prompts/map-codebase.md +70 -0
- package/.pi/prompts/migrate-spec.md +482 -0
- package/.pi/prompts/model-domain.md +227 -0
- package/.pi/prompts/orchestrate-project.md +161 -0
- package/.pi/prompts/organize-workspace.md +159 -0
- package/.pi/prompts/plan-refactor.md +77 -0
- package/.pi/prompts/plan-release.md +145 -0
- package/.pi/prompts/plan-work.md +161 -0
- package/.pi/prompts/release-branch.md +158 -0
- package/.pi/prompts/request-review.md +70 -0
- package/.pi/prompts/research-first.md +62 -0
- package/.pi/prompts/reset-baseline.md +20 -0
- package/.pi/prompts/respond-review.md +70 -0
- package/.pi/prompts/run-evals.md +56 -0
- package/.pi/prompts/run-planning.md +26 -0
- package/.pi/prompts/scope-work.md +23 -0
- package/.pi/prompts/search-skills.md +21 -0
- package/.pi/prompts/seed-conventions.md +132 -0
- package/.pi/prompts/session-state.md +146 -0
- package/.pi/prompts/setup-environment.md +23 -0
- package/.pi/prompts/simulate-agents.md +25 -0
- package/.pi/prompts/slice-tasks.md +23 -0
- package/.pi/prompts/spike-prototype.md +94 -0
- package/.pi/prompts/stocktake-skills.md +40 -0
- package/.pi/prompts/survey-context.md +129 -0
- package/.pi/prompts/terse-mode.md +37 -0
- package/.pi/prompts/trace-requirement.md +68 -0
- package/.pi/prompts/using-bigpowers.md +105 -0
- package/.pi/prompts/validate-fix.md +98 -0
- package/.pi/prompts/verify-work.md +125 -0
- package/.pi/prompts/visual-dashboard.md +51 -0
- package/.pi/prompts/wire-observability.md +92 -0
- package/.pi/prompts/write-document.md +244 -0
- package/.pi/skills/assess-impact/SKILL.md +77 -0
- package/.pi/skills/audit-code/SKILL.md +157 -0
- package/.pi/skills/build-epic/SKILL.md +45 -0
- package/.pi/skills/change-request/SKILL.md +106 -0
- package/.pi/skills/commit-message/SKILL.md +136 -0
- package/.pi/skills/compose-workflow/SKILL.md +41 -0
- package/.pi/skills/craft-skill/SKILL.md +151 -0
- package/.pi/skills/deepen-architecture/SKILL.md +236 -0
- package/.pi/skills/define-language/SKILL.md +80 -0
- package/.pi/skills/define-success/SKILL.md +63 -0
- package/.pi/skills/delegate-task/SKILL.md +77 -0
- package/.pi/skills/design-interface/SKILL.md +97 -0
- package/.pi/skills/develop-tdd/SKILL.md +376 -0
- package/.pi/skills/diagnose-root/SKILL.md +24 -0
- package/.pi/skills/dispatch-agents/SKILL.md +84 -0
- package/.pi/skills/edit-document/SKILL.md +23 -0
- package/.pi/skills/elaborate-spec/SKILL.md +82 -0
- package/.pi/skills/enforce-first/SKILL.md +78 -0
- package/.pi/skills/evolve-skill/SKILL.md +39 -0
- package/.pi/skills/execute-plan/SKILL.md +55 -0
- package/.pi/skills/fix-bug/SKILL.md +37 -0
- package/.pi/skills/grill-me/SKILL.md +96 -0
- package/.pi/skills/grill-with-docs/SKILL.md +38 -0
- package/.pi/skills/guard-git/SKILL.md +213 -0
- package/.pi/skills/hook-commits/SKILL.md +94 -0
- package/.pi/skills/inspect-quality/SKILL.md +106 -0
- package/.pi/skills/investigate-bug/SKILL.md +118 -0
- package/.pi/skills/kickoff-branch/SKILL.md +100 -0
- package/.pi/skills/map-codebase/SKILL.md +71 -0
- package/.pi/skills/migrate-spec/SKILL.md +483 -0
- package/.pi/skills/model-domain/SKILL.md +228 -0
- package/.pi/skills/orchestrate-project/SKILL.md +162 -0
- package/.pi/skills/organize-workspace/SKILL.md +160 -0
- package/.pi/skills/plan-refactor/SKILL.md +78 -0
- package/.pi/skills/plan-release/SKILL.md +146 -0
- package/.pi/skills/plan-work/SKILL.md +162 -0
- package/.pi/skills/release-branch/SKILL.md +159 -0
- package/.pi/skills/request-review/SKILL.md +71 -0
- package/.pi/skills/research-first/SKILL.md +63 -0
- package/.pi/skills/reset-baseline/SKILL.md +21 -0
- package/.pi/skills/respond-review/SKILL.md +71 -0
- package/.pi/skills/run-evals/SKILL.md +57 -0
- package/.pi/skills/run-planning/SKILL.md +27 -0
- package/.pi/skills/scope-work/SKILL.md +24 -0
- package/.pi/skills/search-skills/SKILL.md +22 -0
- package/.pi/skills/seed-conventions/SKILL.md +133 -0
- package/.pi/skills/session-state/SKILL.md +147 -0
- package/.pi/skills/setup-environment/SKILL.md +24 -0
- package/.pi/skills/simulate-agents/SKILL.md +26 -0
- package/.pi/skills/slice-tasks/SKILL.md +24 -0
- package/.pi/skills/spike-prototype/SKILL.md +95 -0
- package/.pi/skills/stocktake-skills/SKILL.md +41 -0
- package/.pi/skills/survey-context/SKILL.md +130 -0
- package/.pi/skills/terse-mode/SKILL.md +38 -0
- package/.pi/skills/trace-requirement/SKILL.md +69 -0
- package/.pi/skills/using-bigpowers/SKILL.md +106 -0
- package/.pi/skills/validate-fix/SKILL.md +99 -0
- package/.pi/skills/verify-work/SKILL.md +126 -0
- package/.pi/skills/visual-dashboard/SKILL.md +52 -0
- package/.pi/skills/wire-observability/SKILL.md +93 -0
- package/.pi/skills/write-document/SKILL.md +245 -0
- package/CHANGELOG.md +14 -0
- package/README.md +27 -1
- package/deepen-architecture/SKILL.md +2 -0
- package/define-language/SKILL.md +2 -0
- package/diagnose-root/SKILL.md +2 -0
- package/edit-document/SKILL.md +2 -0
- package/fix-bug/SKILL.md +3 -1
- package/grill-me/SKILL.md +3 -1
- package/grill-with-docs/SKILL.md +3 -1
- package/investigate-bug/SKILL.md +5 -11
- package/map-codebase/SKILL.md +3 -1
- package/model-domain/SKILL.md +2 -0
- package/package.json +11 -2
- package/plan-release/SKILL.md +1 -1
- package/plan-work/SKILL.md +3 -1
- package/release-branch/SKILL.md +4 -2
- package/run-planning/SKILL.md +3 -2
- package/scope-work/SKILL.md +3 -1
- package/scripts/sync-skills.sh +48 -3
- package/scripts/validate-doctrine.sh +24 -9
- package/seed-conventions/SKILL.md +2 -0
- package/slice-tasks/SKILL.md +3 -1
- package/survey-context/SKILL.md +3 -1
- package/write-document/SKILL.md +2 -0
|
@@ -0,0 +1,24 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: diagnose-root
|
|
3
|
+
description: "Run 4-phase root cause analysis: reproduce, isolate, hypothesize, verify. Use when a bug is confirmed but root cause is unclear, after investigate-bug, or when user mentions root cause analysis."
model: sonnet
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+
# Diagnose Root
|
|
8
|
+
|
|
9
|
+
**Boundary**: Canonical, reusable 4-phase RCA engine. Invoked by `investigate-bug` (as step 2 of the end-to-end flow) and by `fix-bug` (when no bug file exists). Does not write the bug file — that is `investigate-bug`'s responsibility.
|
|
10
|
+
|
|
11
|
+
Four phases — do not skip. Update the active `specs/bugs/BUG-*.md` file at each phase.
|
|
12
|
+
|
|
13
|
+
## Phases
|
|
14
|
+
|
|
15
|
+
1. **Reproduce** — minimal steps; record environment; capture logs.
|
|
16
|
+
2. **Isolate** — narrow to module/function; binary-search commits or config.
|
|
17
|
+
3. **Hypothesize** — list ranked hypotheses with falsification test each.
|
|
18
|
+
4. **Verify** — run falsification; confirm single root cause; link to fix plan.
|
|
19
|
+
|
|
20
|
+
> **HARD GATE** — Do not propose a fix until phase 4 confirms one root cause with evidence.
|
|
21
|
+
|
|
22
|
+
## Verify
|
|
23
|
+
|
|
24
|
+
→ verify: `BUG_FILE=$(ls -t specs/bugs/BUG-*.md 2>/dev/null | head -1); test -n "$BUG_FILE" && grep -cE "Reproduce|Isolate|Hypothesize|Verify" "$BUG_FILE" | awk '{if($1>=4) print "OK"; else print "INCOMPLETE"}' || echo "MISSING"`
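The one-liner above counts lines that match any of the four phase names, so a bug file that mentions one phase on four separate lines would also print `OK`. A stricter check can test for each phase separately. The following is a minimal Python sketch, assuming each phase name appears literally in the bug file; `missing_phases` and `phase_status` are hypothetical names, not part of bigpowers.

```python
PHASES = ("Reproduce", "Isolate", "Hypothesize", "Verify")


def missing_phases(bug_text: str) -> list[str]:
    """Return the phase names that never appear in the bug file text."""
    return [phase for phase in PHASES if phase not in bug_text]


def phase_status(bug_text: str) -> str:
    """Mirror the OK / INCOMPLETE output of the shell check."""
    return "OK" if not missing_phases(bug_text) else "INCOMPLETE"
```

Reading the newest `specs/bugs/BUG-*.md` file and passing its text to `phase_status` would give the same three-way result once a "MISSING" branch is added for the no-file case.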
|
|
@@ -0,0 +1,84 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: dispatch-agents
|
|
3
|
+
description: "Dispatch multiple subagents in parallel on independent tasks. No waiting between them — all run concurrently. Use when tasks are truly decoupled and speed matters. Distinct from delegate-task (concurrent here, no inter-task review gate)."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+
# Dispatch Agents
|
|
8
|
+
> **HARD GATE** — Agent work must be parallelizable and have explicit synchronization points. Do NOT dispatch work that has hidden dependencies between agents.
|
|
9
|
+
|
|
10
|
+
|
|
11
|
+
Run multiple subagents in parallel on independent tasks. Use when tasks are genuinely decoupled — no agent needs the output of another to start.
|
|
12
|
+
|
|
13
|
+
**Distinct from `delegate-task`:** This skill maximizes throughput via concurrency. There is no sequential review gate between tasks. Use `delegate-task` instead when a single task needs careful two-stage oversight before proceeding.
|
|
14
|
+
|
|
15
|
+
## When to use
|
|
16
|
+
|
|
17
|
+
- Tasks that can run simultaneously without shared state
|
|
18
|
+
- Large plans that can be broken into parallel workstreams
|
|
19
|
+
- Exploration: gather information from multiple parts of the codebase at once
|
|
20
|
+
|
|
21
|
+
## When NOT to use
|
|
22
|
+
|
|
23
|
+
- Task B depends on Task A's output
|
|
24
|
+
- You need to review Task A before Task B can start safely
|
|
25
|
+
- The tasks share a file and concurrent edits would conflict
|
|
26
|
+
|
|
27
|
+
## Process
|
|
28
|
+
|
|
29
|
+
### 1. Confirm independence
|
|
30
|
+
|
|
31
|
+
Before dispatching, verify each task pair is truly independent:
|
|
32
|
+
- No shared files being written
|
|
33
|
+
- No shared state (DB migrations, config files)
|
|
34
|
+
- No ordering dependency between outcomes
|
|
35
|
+
|
|
36
|
+
If any two tasks conflict, sequence them with `delegate-task` or `execute-plan` instead.
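The shared-file part of this check can be mechanized. The sketch below is one possible approach in Python, assuming each task declares the set of paths it intends to write; `find_conflicts` and the task names are hypothetical. It does not detect shared state such as DB migrations or ordering dependencies, which still need a manual review.

```python
from itertools import combinations


def find_conflicts(write_sets: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return (task_a, task_b, shared_paths) for every task pair that writes the same path."""
    conflicts = []
    # Compare every unordered pair of tasks exactly once.
    for (a, paths_a), (b, paths_b) in combinations(sorted(write_sets.items()), 2):
        shared = paths_a & paths_b
        if shared:
            conflicts.append((a, b, shared))
    return conflicts
```

An empty result means no two tasks write the same file; any non-empty result is a reason to sequence the listed pair instead of dispatching it in parallel.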
|
|
37
|
+
|
|
38
|
+
### 2. Write task briefs
|
|
39
|
+
|
|
40
|
+
Before writing briefs, read `specs/state.yaml` if it exists — each agent gets only the decisions relevant to its task, nothing else.
|
|
41
|
+
|
|
42
|
+
For each task, use this minimal template (each agent starts cold — brief size directly controls token cost and hallucination risk):
|
|
43
|
+
|
|
44
|
+
```
|
|
45
|
+
Goal: [one sentence — what success looks like]
|
|
46
|
+
In scope: [explicit file or module list]
|
|
47
|
+
Out of bounds: [what NOT to touch]
|
|
48
|
+
Verify: [runnable command]
|
|
49
|
+
Prior decisions: [relevant entries from specs/state.yaml — omit section if none apply]
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
Do not include the full conversation, full file contents, or decisions unrelated to this agent's task.
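As an illustration of the template above, a brief can be modeled as a small data structure that renders only the five fields and drops the "Prior decisions" line when nothing applies. This is a sketch under assumptions: `TaskBrief` and its field names are hypothetical and chosen to mirror the template labels.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    goal: str
    in_scope: list[str]
    out_of_bounds: list[str]
    verify: str
    prior_decisions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the brief in the template's field order."""
        lines = [
            f"Goal: {self.goal}",
            f"In scope: {', '.join(self.in_scope)}",
            f"Out of bounds: {', '.join(self.out_of_bounds)}",
            f"Verify: {self.verify}",
        ]
        if self.prior_decisions:  # omit the section when no decision applies
            lines.append(f"Prior decisions: {'; '.join(self.prior_decisions)}")
        return "\n".join(lines)
```

Keeping the brief as structured data makes it harder to paste in conversation history or whole files by accident, since there is no field for them.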
|
|
53
|
+
|
|
54
|
+
### 3. Iterative retrieval (max 3 cycles)
|
|
55
|
+
|
|
56
|
+
Run the work in waves. Each cycle has three steps:
|
|
57
|
+
1. **Dispatch** — run parallel agents with briefs.
|
|
58
|
+
2. **Evaluate** — read outputs; list gaps vs goal.
|
|
59
|
+
3. **Refine** — tighten briefs or spawn follow-up agents (max **3 cycles** total).
|
|
60
|
+
|
|
61
|
+
Stop when no gaps remain or after cycle 3. If gaps remain after cycle 3, escalate to the user.
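The cycle cap can be expressed as a bounded loop. The following Python sketch assumes the three steps are supplied as callables; `run_waves`, `dispatch`, `find_gaps`, and `refine` are hypothetical names used only for illustration.

```python
from typing import Callable

MAX_CYCLES = 3


def run_waves(
    briefs: list[str],
    dispatch: Callable[[list[str]], list[str]],
    find_gaps: Callable[[list[str]], list[str]],
    refine: Callable[[list[str], list[str]], list[str]],
) -> tuple[str, int]:
    """Repeat dispatch, evaluate, refine until no gaps remain or the cap is hit."""
    for cycle in range(1, MAX_CYCLES + 1):
        outputs = dispatch(briefs)
        gaps = find_gaps(outputs)
        if not gaps:
            return ("done", cycle)
        if cycle < MAX_CYCLES:
            # Tighten the briefs only when another cycle is still allowed.
            briefs = refine(briefs, gaps)
    return ("escalate", MAX_CYCLES)
```

The return value distinguishes a clean finish from hitting the cap, so the caller knows when to hand the remaining gaps to the user.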
|
|
62
|
+
|
|
63
|
+
### 4. Dispatch in parallel
|
|
64
|
+
|
|
65
|
+
Spawn all agents in a single message using multiple Agent tool calls. Each agent gets its own complete brief.
|
|
66
|
+
|
|
67
|
+
```
|
|
68
|
+
Agent 1: brief for task A
|
|
69
|
+
Agent 2: brief for task B
|
|
70
|
+
Agent 3: brief for task C
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
### 5. Collect and review results
|
|
74
|
+
|
|
75
|
+
When all agents return:
|
|
76
|
+
- Review each result independently
|
|
77
|
+
- Run all verify commands
|
|
78
|
+
- Check diffs for scope violations or CONVENTIONS.md breaches
|
|
79
|
+
|
|
80
|
+
### 6. Integrate
|
|
81
|
+
|
|
82
|
+
Merge accepted results. If any agent's result conflicts with another, resolve manually and note the conflict.
|
|
83
|
+
|
|
84
|
+
Report a summary: which tasks succeeded, which need revision, and overall verify status.
|
|
@@ -0,0 +1,23 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: edit-document
|
|
3
|
+
description: "Edit and improve documents by restructuring sections, improving clarity, and tightening prose. Use when user wants to edit, revise, restructure, or improve any document — including specs/ files, articles, READMEs, or technical writing."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+
# Edit Document
|
|
8
|
+
|
|
9
|
+
**Distinct from `write-document`:** Use this skill when the document already exists and needs restructuring, clarity, or prose improvements. Use `write-document` to create a document from scratch.
|
|
10
|
+
|
|
11
|
+
> **HARD GATE** — Document edits must preserve intent and accuracy. Do NOT remove or contradict existing content without understanding why it was written. Check git history for context.
|
|
12
|
+
|
|
13
|
+
## Process
|
|
14
|
+
|
|
15
|
+
1. First, divide the document into sections based on its headings. Think about the main points made in each section.
|
|
16
|
+
|
|
17
|
+
Consider that information is a directed acyclic graph, and that pieces of information can depend on other pieces of information. Make sure that the order of the sections and their contents respects these dependencies.
|
|
18
|
+
|
|
19
|
+
Confirm the sections with the user.
|
|
20
|
+
|
|
21
|
+
2. For each section:
|
|
22
|
+
|
|
23
|
+
2a. Rewrite the section to improve clarity, coherence, and flow. Keep each paragraph to a maximum of 240 characters.
|
|
@@ -0,0 +1,82 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: elaborate-spec
|
|
3
|
+
description: "Refine a rough idea into a clear, detailed specification through dialogue. Does not produce code. Use when user has a vague idea, wants to think through a feature before planning, or needs to turn 'I want X' into a concrete spec."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+
# Elaborate Spec
|
|
8
|
+
|
|
9
|
+
Turn a rough idea into a clear specification through focused dialogue. No code is written during this skill — the output is shared understanding and a refined problem statement.
|
|
10
|
+
|
|
11
|
+
> **HARD GATE** — Do NOT proceed with planning or implementation until the problem space is clearly understood. Success criteria, actors, and scope must be explicit before drafting a plan.
|
|
12
|
+
|
|
13
|
+
## Process
|
|
14
|
+
|
|
15
|
+
### 1. Listen first
|
|
16
|
+
|
|
17
|
+
Let the user describe their idea in their own words. Do not interrupt or redirect. Take notes on:
|
|
18
|
+
- The core problem they're trying to solve
|
|
19
|
+
- Who is affected (actors)
|
|
20
|
+
- What success looks like to them
|
|
21
|
+
- Any constraints they've already identified
|
|
22
|
+
|
|
23
|
+
### 2. Ask clarifying questions
|
|
24
|
+
|
|
25
|
+
Ask one question at a time. Work through these areas:
|
|
26
|
+
|
|
27
|
+
**Problem clarity**
|
|
28
|
+
- What is the current behavior (or lack of behavior) that prompted this?
|
|
29
|
+
- Who experiences this problem? How often?
|
|
30
|
+
- What's the cost of not solving it?
|
|
31
|
+
|
|
32
|
+
**Solution boundaries**
|
|
33
|
+
- What is explicitly IN scope?
|
|
34
|
+
- What is explicitly OUT of scope?
|
|
35
|
+
- Are there existing solutions (internal or external) this replaces or integrates with?
|
|
36
|
+
|
|
37
|
+
**Success criteria**
|
|
38
|
+
- How will you know this is done?
|
|
39
|
+
- What does the happy path look like end-to-end?
|
|
40
|
+
- What are the key failure modes to handle?
|
|
41
|
+
|
|
42
|
+
**Constraints**
|
|
43
|
+
- Any performance requirements?
|
|
44
|
+
- Any compatibility constraints (existing APIs, data formats)?
|
|
45
|
+
- Any non-negotiable implementation decisions already made?
|
|
46
|
+
|
|
47
|
+
### 2.5. Multiple Interpretations (HARD GATE)
|
|
48
|
+
|
|
49
|
+
> **HARD GATE** — If the request admits ≥2 valid interpretations, do NOT guess. You must list them and ask the user to choose before proceeding. Proceeding with unresolved ambiguity is a failure of integrity.
|
|
50
|
+
|
|
51
|
+
Present the options clearly:
|
|
52
|
+
> "I see two ways to read this:
|
|
53
|
+
> 1. [Interpretation A] — my recommendation because [reason]
|
|
54
|
+
> 2. [Interpretation B]
|
|
55
|
+
> Which is closer to what you mean?"
|
|
56
|
+
|
|
57
|
+
### 3. Surface hidden assumptions
|
|
58
|
+
|
|
59
|
+
Once the user has answered the main questions, probe for assumptions:
|
|
60
|
+
- "You mentioned X — does that mean Y is also true?"
|
|
61
|
+
- "What happens when Z fails?"
|
|
62
|
+
- "Is this for internal users, external users, or both?"
|
|
63
|
+
|
|
64
|
+
### 4. Synthesize and confirm
|
|
65
|
+
|
|
66
|
+
Summarize your understanding in 3–5 bullet points aligned with [countable-story-format.md](file:///Users/danielvm/Developer/bigpowers/countable-story-format.md):
|
|
67
|
+
- The problem (feeds into §1 Business narrative)
|
|
68
|
+
- The solution and main flow (feeds into §5)
|
|
69
|
+
- The key constraints and alternative flows (feeds into §6)
|
|
70
|
+
- The success criteria (feeds into §17 Gherkin)
|
|
71
|
+
- What's out of scope (feeds into §18)
|
|
72
|
+
|
|
73
|
+
Ask: "Is this an accurate summary? Anything missing or wrong?"
|
|
74
|
+
|
|
75
|
+
### 5. Suggest next skill
|
|
76
|
+
|
|
77
|
+
Once the spec is clear, recommend the next step:
|
|
78
|
+
- If domain model needs work → `model-domain`
|
|
79
|
+
- If ready to plan → `plan-release` (creates epic capsules with `epic.yaml` + story `.md` + `-tasks.yaml`) then `plan-work` per story
|
|
80
|
+
- If a spike is needed first → `spike-prototype`
|
|
81
|
+
- If architecture decisions are needed → `deepen-architecture` or `grill-me`
|
|
82
|
+
- If the plan depends on a specific library or API → `grill-me` in docs mode
|
|
@@ -0,0 +1,78 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: enforce-first
|
|
3
|
+
description: "Apply the F.I.R.S.T test quality rubric (Fast, Independent, Repeatable, Self-Validating, Timely) to a test suite or individual tests. Use when develop-tdd is writing tests, when test quality needs to be checked, or when user mentions F.I.R.S.T or 'test quality'."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+
# Enforce FIRST
|
|
8
|
+
> **HARD GATE** — Before shipping, ALL enforcement checks must pass: lint, typecheck, tests, coverage gates. Do NOT disable or skip checks to get to green.
|
|
9
|
+
|
|
10
|
+
|
|
11
|
+
Apply the F.I.R.S.T rubric (Uncle Bob, Clean Code Chapter 9) to evaluate and improve tests.
|
|
12
|
+
|
|
13
|
+
This skill is typically invoked internally by `develop-tdd` during the test-writing phase. It can also be run standalone on an existing test suite.
|
|
14
|
+
|
|
15
|
+
## The F.I.R.S.T Rubric
|
|
16
|
+
|
|
17
|
+
### F — Fast
|
|
18
|
+
|
|
19
|
+
Tests must run quickly. Slow tests don't get run. They don't get trusted.
|
|
20
|
+
|
|
21
|
+
- [ ] No real network calls (use fakes/stubs for external I/O)
|
|
22
|
+
- [ ] No real database (use in-memory or transaction-rollback strategies)
|
|
23
|
+
- [ ] No `sleep` or arbitrary timeouts in test code
|
|
24
|
+
- [ ] The full suite runs in under 30 seconds (target; adjust to project size)
|
|
25
|
+
|
|
26
|
+
**Fix:** Replace slow I/O with named fake classes. Never inline anonymous stubs.
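A named fake is a small class that stands in for the slow dependency and can be reused across tests. The sketch below shows the idea in Python; the weather client, `FakeWeatherClient`, and `needs_coat` are hypothetical examples, not taken from the package.

```python
class FakeWeatherClient:
    """Named in-memory stand-in for a real HTTP client (hypothetical example)."""

    def __init__(self, temperatures: dict[str, float]):
        self._temperatures = temperatures
        self.requested: list[str] = []  # records calls so tests can assert on them

    def current_temperature(self, city: str) -> float:
        self.requested.append(city)
        return self._temperatures[city]


def needs_coat(client, city: str) -> bool:
    """Code under test: depends only on the client interface, not on the network."""
    return client.current_temperature(city) < 10.0
```

Because the fake has a name and a recorded call list, a failing test points at a readable class rather than at an anonymous stub defined inline.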
|
|
27
|
+
|
|
28
|
+
### I — Independent
|
|
29
|
+
|
|
30
|
+
Tests must not depend on each other. Running in any order must produce the same result.
|
|
31
|
+
|
|
32
|
+
- [ ] No shared mutable state between tests
|
|
33
|
+
- [ ] Each test sets up its own data and tears it down
|
|
34
|
+
- [ ] No test assumes another test ran first
|
|
35
|
+
- [ ] Tests can be run individually (e.g. `npm test -- mytest.test.ts`) and pass
|
|
36
|
+
|
|
37
|
+
**Fix:** Move setup into `beforeEach`. Use factory functions to build test data.
|
|
38
|
+
|
|
39
|
+
### R — Repeatable
|
|
40
|
+
|
|
41
|
+
Tests must pass consistently in any environment.
|
|
42
|
+
|
|
43
|
+
- [ ] No dependency on machine-specific paths, ports, or environment variables (unless explicitly injected)
|
|
44
|
+
- [ ] No dependency on current time without mocking the clock
|
|
45
|
+
- [ ] No flakiness — a test that sometimes fails is worse than no test
|
|
46
|
+
- [ ] Tests pass on CI the same way they pass locally
|
|
47
|
+
|
|
48
|
+
**Fix:** Inject time, randomness, and environment as parameters. Pin seeds for anything random.
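Injection here means the caller passes the clock value and the random source instead of the function reading them globally. A minimal Python sketch follows, with hypothetical function names `is_expired` and `pick_sample`.

```python
import random
from datetime import datetime


def is_expired(expires_at: datetime, now: datetime) -> bool:
    """The caller passes `now`, so tests never read the wall clock."""
    return now >= expires_at


def pick_sample(items: list[str], k: int, rng: random.Random) -> list[str]:
    """The caller passes the random source, so tests can pin the seed."""
    return rng.sample(items, k)
```

In a test, `now` is a fixed `datetime` and `rng` is `random.Random(<fixed seed>)`, so two runs with the same inputs give the same result on any machine.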
|
|
49
|
+
|
|
50
|
+
### S — Self-Validating
|
|
51
|
+
|
|
52
|
+
Tests must report pass or fail automatically. No human inspection required.
|
|
53
|
+
|
|
54
|
+
- [ ] Tests use assertions (`expect`, `assert`, etc.) — not just `console.log`
|
|
55
|
+
- [ ] Failure messages are descriptive enough to diagnose without reading the test body
|
|
56
|
+
- [ ] No tests that "pass" by default when the feature is broken
|
|
57
|
+
|
|
58
|
+
**Fix:** Add assertion messages. Use matchers that describe the expected behavior.
|
|
59
|
+
|
|
60
|
+
### T — Timely
|
|
61
|
+
|
|
62
|
+
Tests must be written at the right time — before or immediately with the code they test.
|
|
63
|
+
|
|
64
|
+
- [ ] Tests are written in the same commit as the code (or the commit before, if TDD)
|
|
65
|
+
- [ ] No "I'll add tests later" patterns
|
|
66
|
+
- [ ] Bug fixes include a regression test that would have caught the bug
|
|
67
|
+
|
|
68
|
+
**Fix:** Run `develop-tdd` — it enforces the timely principle by design.
|
|
69
|
+
|
|
70
|
+
## Applying the rubric
|
|
71
|
+
|
|
72
|
+
For each failing criterion:
|
|
73
|
+
1. Identify which tests violate it
|
|
74
|
+
2. Describe the fix
|
|
75
|
+
3. Apply the fix
|
|
76
|
+
4. Re-run the suite to confirm it still passes
|
|
77
|
+
|
|
78
|
+
Report: "F.I.R.S.T audit complete. X criteria passed, Y fixed."
|
|
@@ -0,0 +1,39 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: evolve-skill
|
|
3
|
+
description: "Benchmark-gated skill evolution: consume bigpowers-benchmark report, propose plan-work change, edit skill via craft-skill, re-run benchmark, record ADR. Use when a skill underperforms on benchmark or stocktake finds a systemic gap."
model: opus
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+
# Evolve Skill
|
|
8
|
+
|
|
9
|
+
> **HARD GATE** — No skill change ships without benchmark score ≥ pre-change baseline. Learning is measured and versioned — never implicit.
|
|
10
|
+
|
|
11
|
+
## Loop
|
|
12
|
+
|
|
13
|
+
1. Run `bigpowers-benchmark` (external repo); save report path in state.yaml.
|
|
14
|
+
2. Identify target skill + measurable gap from report.
|
|
15
|
+
3. `plan-work` — minimal change proposal with verify commands.
|
|
16
|
+
4. Edit via `craft-skill` / direct SKILL.md edit; run `sync-skills.sh`.
|
|
17
|
+
5. Re-run benchmark; compare scores.
|
|
18
|
+
6. Record decision in `specs/adr/` + `session-state`; revert if regression.
|
|
19
|
+
|
|
20
|
+
## Verify
|
|
21
|
+
|
|
22
|
+
→ verify: benchmark report shows post-change score ≥ baseline (document paths in state.yaml)
|
|
23
|
+
|
|
24
|
+
See [REFERENCE.md](REFERENCE.md) for ADR template.
|
|
25
|
+
|
|
26
|
+
---
|
|
27
|
+
|
|
28
|
+
# Evolve Skill — ADR snippet
|
|
29
|
+
|
|
30
|
+
```markdown
|
|
31
|
+
## ADR-XXXX: Evolve <skill-name>
|
|
32
|
+
|
|
33
|
+
**Status:** Accepted
|
|
34
|
+
**Benchmark:** before X% / after Y%
|
|
35
|
+
**Change:** one-sentence summary
|
|
36
|
+
**Evidence:** path/to/benchmark-report.md
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
Benchmark repo: `/Users/danielvm/Developer/bigpowers-benchmark/`
|
|
@@ -0,0 +1,55 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: execute-plan
|
|
3
|
+
description: "Batch-execute tasks from the active epic capsule sequentially, with a human checkpoint after each step. Use when user has an approved plan and wants step-by-step oversight."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+
# Execute Plan
|
|
8
|
+
|
|
9
|
+
Execute tasks from the **active epic** (`specs/epics/eNN-slug/epic.yaml` story `tasks[]`) one at a time, showing evidence after each step before proceeding.
|
|
10
|
+
|
|
11
|
+
> **HARD GATE** — Do NOT proceed if on `main` or `master`. Run `kickoff-branch` first.
|
|
12
|
+
>
|
|
13
|
+
> **HARD GATE** — Active epic must exist with runnable `verify` on each task. If missing, run `plan-release` then `plan-work` or `build-epic`.
|
|
14
|
+
|
|
15
|
+
## Process
|
|
16
|
+
|
|
17
|
+
### 1. Read the plan
|
|
18
|
+
|
|
19
|
+
Read `specs/state.yaml` (`active_epic`, `active_story`) and the matching `specs/epics/*/epic.yaml`. Parse `depends-on` in task descriptions for execution waves.
|
|
20
|
+
|
|
21
|
+
> **CONTEXT ISOLATION** — Spawn each skill with a **fresh context window**. Pass decisions only through `specs/state.yaml` `handoff` — never rely on prior chat history.
|
|
22
|
+
|
|
23
|
+
Confirm with the user: step count, skip/reorder, stop-after step.
|
|
24
|
+
|
|
25
|
+
### 2. Execute step by step
|
|
26
|
+
|
|
27
|
+
For each task in the active story:
|
|
28
|
+
|
|
29
|
+
**a. Announce** — task `desc` and `verify` command.
|
|
30
|
+
|
|
31
|
+
**b. Execute** — code or `delegate-task` / `dispatch-agents` for waves.
|
|
32
|
+
|
|
33
|
+
**c. Run verify** — must be green before advancing.
|
|
34
|
+
|
|
35
|
+
**d. Log** — non-obvious decisions in `specs/state.yaml` under `decisions[]` or `handoff` block.
|
|
36
|
+
|
|
37
|
+
**e. Checkpoint** — ask to proceed unless autonomous mode requested.
|
|
38
|
+
|
|
39
|
+
**f. Story UAT** — after last task, run manual verification script from story notes or `verify-work`.
|
|
40
|
+
|
|
41
|
+
On verify failure: fix and re-run; never advance on red.
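The "never advance on red" rule amounts to a loop that stops at the first failing verify command. The Python sketch below assumes each task is a mapping with the `desc` and `verify` fields named above; `execute_tasks` and `shell_verify` are hypothetical names. It only stops on failure and leaves the fix-and-re-run step to the caller.

```python
import subprocess
from typing import Callable


def shell_verify(command: str) -> bool:
    """Default runner: a verify command is green when it exits with status 0."""
    return subprocess.run(command, shell=True).returncode == 0


def execute_tasks(tasks: list[dict], verify: Callable[[str], bool] = shell_verify) -> list[str]:
    """Return the descriptions of completed tasks, stopping at the first red verify."""
    completed = []
    for task in tasks:
        # The actual work for task["desc"] would happen here.
        if not verify(task["verify"]):
            break  # never advance on red
        completed.append(task["desc"])
    return completed
```

Passing the verify runner as a parameter keeps the loop testable without spawning real shell commands.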
|
|
42
|
+
|
|
43
|
+
Update `specs/execution-status.yaml` when a story/epic completes (`bash scripts/sync-status-from-epics.sh` or direct edit).
|
|
44
|
+
|
|
45
|
+
### 3. Blockers
|
|
46
|
+
|
|
47
|
+
Report blocker; ask skip/adapt/stop; update epic capsule if plan changes.
|
|
48
|
+
|
|
49
|
+
### 4. Final report
|
|
50
|
+
|
|
51
|
+
Suggest: `verify-work` → `run-evals` → `audit-code` → `simulate-agents` → `commit-message` → `release-branch`
|
|
52
|
+
|
|
53
|
+
## Rules
|
|
54
|
+
|
|
55
|
+
- **Loop until behavioral correctness is verified**: if a verify command passes but the observed behavior is still wrong, return to step 1 and run the execution cycle again.
|
|
@@ -0,0 +1,37 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: fix-bug
|
|
3
|
+
description: "Bug fix orchestrator — active_flow fix_bug; reads specs/bugs/BUG-*.md; chains investigate-bug, develop-tdd, validate-fix. Use when user reports a defect."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+
# Fix Bug
|
|
8
|
+
|
|
9
|
+
**Boundary**: Orchestrator flow — chains `investigate-bug` (entry point + RCA via `diagnose-root`) → `develop-tdd` → `validate-fix`. Does not implement RCA or write bug files directly.
|
|
10
|
+
|
|
11
|
+
Orchestrates **fix_bug** flow without mixing epic build state.
|
|
12
|
+
|
|
13
|
+
> **HARD GATE** — Set `specs/state.yaml` `active_flow: fix_bug`.
|
|
14
|
+
|
|
15
|
+
## Process
|
|
16
|
+
|
|
17
|
+
1. If no `specs/bugs/BUG-*.md`, run `investigate-bug` first — it handles history check, RCA (via `diagnose-root`), fix approach, and writes the bug file.
|
|
18
|
+
2. `develop-tdd` against the bug file's verify steps.
|
|
19
|
+
3. `validate-fix` — re-run failing test, full suite, lint.
|
|
20
|
+
4. `bash scripts/sync-bugs-registry.sh` — refresh `specs/bugs/registry.yaml`.
|
|
21
|
+
5. Clear `active_flow` or return to `build_epic` when done.
|
|
22
|
+
|
|
23
|
+
## Bug file SoT
|
|
24
|
+
|
|
25
|
+
One markdown file per bug with frontmatter:
|
|
26
|
+
|
|
27
|
+
```yaml
|
|
28
|
+
bug_id: BUG-001
|
|
29
|
+
status: open
|
|
30
|
+
severity: high
|
|
31
|
+
scope: api
|
|
32
|
+
title: Short title
|
|
33
|
+
```
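A bug file's frontmatter can be checked for the five fields shown above before the flow continues. The following Python sketch assumes flat `key: value` lines and assumes all five keys are required, which the package does not state; `parse_flat_frontmatter` and `missing_keys` are hypothetical names, and this is not a general YAML parser.

```python
REQUIRED_KEYS = ("bug_id", "status", "severity", "scope", "title")


def parse_flat_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines; nested YAML is out of scope for this sketch."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


def missing_keys(text: str) -> list[str]:
    """Return required keys that are absent or empty."""
    fields = parse_flat_frontmatter(text)
    return [key for key in REQUIRED_KEYS if not fields.get(key)]
```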
|
|
34
|
+
|
|
35
|
+
## Verify
|
|
36
|
+
|
|
37
|
+
→ verify: `test -d specs/bugs && bash scripts/sync-bugs-registry.sh`
|
|
@@ -0,0 +1,96 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: grill-me
|
|
3
|
+
description: "Interactive assumption-surfacing Q&A that stress-tests a plan through relentless questioning until every decision is resolved. Use when user wants to challenge a plan, validate decisions from conversation/context, or mentions 'grill me'. For doc-grounded variant, use grill-with-docs."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+
# Grill Me
|
|
8
|
+
|
|
9
|
+
> **Use this vs grill-with-docs:** `grill-me` surfaces assumptions from the conversation and context alone — no documentation fetching. Use `grill-with-docs` (the doc-grounded variant) when the plan relies on a specific library or external API and every challenge must cite a real doc URL.
|
|
10
|
+
|
|
11
|
+
Two modes. Default is **Design**. Switch to **Docs** by saying "grill me with docs" or when the plan relies on a specific library or external API.
|
|
12
|
+
|
|
13
|
+
> **HARD GATE** — Do NOT accept a design until every hard decision has been stress-tested. "Seems right" is not a decision. Grilling must identify and resolve tensions before build begins.
|
|
14
|
+
|
|
15
|
+
## Design mode (default)
|
|
16
|
+
|
|
17
|
+
Interview relentlessly about every aspect of this plan until reaching shared understanding. Walk each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer. Ask one question at a time.
|
|
18
|
+
|
|
19
|
+
If a question can be answered by exploring the codebase, explore it instead.
|
|
20
|
+
|
|
21
|
+
## Docs mode
|
|
22
|
+
|
|
23
|
+
Ground every challenge in real documentation — no assumption about a library's behavior goes unchecked. See [REFERENCE.md](REFERENCE.md) for the full process.
|
|
24
|
+
|
|
25
|
+
Short form:
|
|
26
|
+
1. List every external library, third-party API, and framework behavior relied upon.
|
|
27
|
+
2. Fetch the actual docs for each (`WebFetch` the official API reference).
|
|
28
|
+
3. Challenge each plan assumption against the real docs: correct method signature? right version? deprecated?
|
|
29
|
+
4. Report confirmed ✓, corrected ✗ (with the real behavior), and uncertain → `spike-prototype`.
|
|
30
|
+
5. Update the plan for each confirmed discrepancy.
|
|
31
|
+
|
|
32
|
+
---
|
|
33
|
+
|
|
34
|
+
# Docs Mode — Full Process
|
|
35
|
+
|
|
36
|
+
Triggered by "grill me with docs" or when a plan depends on a specific library or external API.
|
|
37
|
+
|
|
38
|
+
**Why this matters:** AI agents hallucinate API methods, argument orders, and behaviors. Every assumption about an external dependency must be validated against the actual docs before code is written.
|
|
39
|
+
|
|
40
|
+
## Step 1 — Identify the dependencies
|
|
41
|
+
|
|
42
|
+
From the plan or conversation, list:
|
|
43
|
+
- Every external library being used
|
|
44
|
+
- Every third-party API being called
|
|
45
|
+
- Every framework behavior being relied upon
|
|
46
|
+
|
|
47
|
+
Ask: "Which of these are you most confident about? Which are you less sure of?"
|
|
48
|
+
|
|
49
|
+
## Step 2 — Fetch the relevant docs
|
|
50
|
+
|
|
51
|
+
For each dependency, fetch the actual documentation:
|
|
52
|
+
|
|
53
|
+
```
|
|
54
|
+
WebFetch the official docs for [library/API]
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
Prioritize:
|
|
58
|
+
- The API reference for the specific method being used
|
|
59
|
+
- The changelog for the version in use (breaking changes)
|
|
60
|
+
- Migration guides if upgrading from a previous version
|
|
61
|
+
- Known gotchas / FAQ sections
|
|
62
|
+
|
|
63
|
+
## Step 3 — Challenge each assumption
|
|
64
|
+
|
|
65
|
+
For every assumption in the plan, find the corresponding doc section and ask:
|
|
66
|
+
|
|
67
|
+
- "Does the real API actually work this way? Show me the doc."
|
|
68
|
+
- "Is this method available in the version you're using?"
|
|
69
|
+
- "Does this argument order match the actual signature?"
|
|
70
|
+
- "Are there rate limits, quotas, or timeout behaviors that affect this design?"
|
|
71
|
+
- "Is this marked as deprecated in the current version?"
|
|
72
|
+
|
|
73
|
+
Ask one question at a time. For each challenge, cite the specific URL and section.
|
|
74
|
+
|
|
75
|
+
## Step 4 — Surface hallucinations
|
|
76
|
+
|
|
77
|
+
When an assumption doesn't match the docs:
|
|
78
|
+
|
|
79
|
+
> "Your plan uses `library.doThing(a, b)` but the [docs](URL) show the signature is `doThing(config: {a, b})` with a config object. This will fail at runtime."
|
|
80
|
+
|
|
81
|
+
Document each discrepancy clearly.
|
|
82
|
+
|
|
83
|
+
## Step 5 — Update the plan
|
|
84
|
+
|
|
85
|
+
For each confirmed discrepancy, recommend a concrete fix:
|
|
86
|
+
- Correct method signature
|
|
87
|
+
- Correct argument order
|
|
88
|
+
- Alternative approach that matches what the library actually supports
|
|
89
|
+
- Whether a spike (`spike-prototype`) is needed to validate a remaining uncertainty
|
|
90
|
+
|
|
91
|
+
## Step 6 — Sign off
|
|
92
|
+
|
|
93
|
+
When all major assumptions have been validated against docs, report:
|
|
94
|
+
- Which assumptions were confirmed ✓
|
|
95
|
+
- Which were corrected ✗ + what the correct approach is
|
|
96
|
+
- Which remain uncertain → recommend `spike-prototype`
|
|
@@ -0,0 +1,38 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: grill-with-docs
|
|
3
|
+
description: "Doc-grounded variant of grill-me that stress-tests plan assumptions by fetching and citing real library or API documentation. Every challenge must cite a real URL. Use when the plan depends on a specific library or external API."
model: opus
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+
# Grill With Docs
|
|
8
|
+
|
|
9
|
+
> **Use this vs grill-me:** `grill-with-docs` is the doc-grounded variant of `grill-me`. Use it when the plan relies on external libraries or APIs and every challenge must be grounded in and cite a real documentation URL. Use `grill-me` for context-only assumption surfacing without fetching docs.
|
|
10
|
+
|
|
11
|
+
> **HARD GATE** — Every challenge must cite a real documentation URL. No hallucinated APIs.
|
|
12
|
+
|
|
13
|
+
## Process
|
|
14
|
+
|
|
15
|
+
1. Read the plan or design under test (`specs/release-plan.yaml` plus epic shards, `INTERFACE-OPTIONS.md`, etc.).
|
|
16
|
+
2. List assumptions that depend on external libraries or APIs.
|
|
17
|
+
3. For each assumption: fetch or quote official docs; challenge with "docs say X, plan says Y."
|
|
18
|
+
4. Resolve or update the plan inline; unresolved items block `plan-work`.
|
|
19
|
+
|
|
20
|
+
## Docs mode rules
|
|
21
|
+
|
|
22
|
+
- Cite URL + quoted snippet (method name, parameter, version).
|
|
23
|
+
- If docs contradict the plan, plan loses until updated.
|
|
24
|
+
- Prefer official docs over blog posts.
|
|
25
|
+
|
|
26
|
+
## Verify
|
|
27
|
+
|
|
28
|
+
→ verify: dialogue log contains at least one `https://` doc URL per challenged assumption
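This check is stated in prose, but it could be automated if the dialogue log were available as a mapping from each challenged assumption to its challenge text. That log shape is an assumption, and `uncited` is a hypothetical name. The Python sketch below only confirms that an `https://` URL is present; it cannot confirm that the URL points at real documentation.

```python
import re

URL_PATTERN = re.compile(r"https://\S+")


def uncited(challenges: dict[str, str]) -> list[str]:
    """Return the assumptions whose challenge text contains no https:// URL."""
    return [name for name, text in challenges.items() if not URL_PATTERN.search(text)]
```

An empty list means every challenged assumption carries at least one URL; any name in the list still needs a citation.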
|
|
29
|
+
|
|
30
|
+
See [REFERENCE.md](REFERENCE.md) for question templates.
|
|
31
|
+
|
|
32
|
+
---
|
|
33
|
+
|
|
34
|
+
# Grill With Docs — Question templates
|
|
35
|
+
|
|
36
|
+
- "Docs at [URL] show signature `foo(bar?: Baz)`. Your plan calls `foo(bar, baz)` — which is correct?"
|
|
37
|
+
- "The changelog at [URL] deprecates X in v3. Your plan still uses X — migrate or pin version?"
|
|
38
|
+
- "Error handling in [URL] throws `NetworkError`. Your plan catches `Error` only — is that sufficient?"
|