bigpowers 2.26.0 → 2.28.0

package/CHANGELOG.md CHANGED
@@ -1,3 +1,23 @@
1
+ # [2.28.0](https://github.com/danielvm-git/bigpowers/compare/v2.27.0...v2.28.0) (2026-06-22)
2
+
3
+
4
+ ### Features
5
+
6
+ * **audit-plan:** new skill — evaluate project plan for bigpowers readiness ([509deba](https://github.com/danielvm-git/bigpowers/commit/509deba5a45b3f688d087b11091fc854449c7e46))
7
+
8
+ # [2.27.0](https://github.com/danielvm-git/bigpowers/compare/v2.26.0...v2.27.0) (2026-06-22)
9
+
10
+
11
+ ### Bug Fixes
12
+
13
+ * **quick-fix:** add HARD GATE callout — completes 35/35 skills ([9782b85](https://github.com/danielvm-git/bigpowers/commit/9782b85de96fabc77c14fe8f7be3fd30b98e9cfb))
14
+
15
+
16
+ ### Features
17
+
18
+ * **2.9.0:** implement e22/e23/e24 — Depth release ([c33d66c](https://github.com/danielvm-git/bigpowers/commit/c33d66c6a386a6c273ab1f4c2a71981309e1cda6))
19
+ * **plan-release:** scope 2.9.0 Depth — e22/e23/e24 ([5a64698](https://github.com/danielvm-git/bigpowers/commit/5a646981c520f4874c4f2807d53af40e17b093fb))
20
+
1
21
  # [2.26.0](https://github.com/danielvm-git/bigpowers/compare/v2.25.0...v2.26.0) (2026-06-22)
2
22
 
3
23
 
package/SKILL-INDEX.md CHANGED
@@ -3,8 +3,8 @@
3
3
  > **DO NOT EDIT** — This file is auto-generated by `scripts/generate-skill-index.sh`.
4
4
  > Edit `SKILL.md` source files or `skills-lock.json` instead. Run `bash scripts/sync-skills.sh` to regenerate.
5
5
 
6
- **Generated:** 2026-06-22T03:20:14Z
7
- **Skills:** 68
6
+ **Generated:** 2026-06-22T13:26:01Z
7
+ **Skills:** 70
8
8
 
9
9
  ---
10
10
 
@@ -12,14 +12,14 @@
12
12
 
13
13
  | Phase | Count | Skills |
14
14
  |---|---|---|
15
- | Discover | 6 | `elaborate-spec, map-codebase, research-first, search-skills, survey-context, using-bigpowers` |
15
+ | Discover | 7 | `audit-plan, elaborate-spec, map-codebase, research-first, search-skills, survey-context, using-bigpowers` |
16
16
  | Design | 7 | `deepen-architecture, define-language, define-success, design-interface, grill-me, grill-with-docs, model-domain` |
17
17
  | Plan | 9 | `assess-impact, change-request, plan-refactor, plan-release, plan-work, run-planning, scope-work, seed-conventions, slice-tasks` |
18
18
  | Build | 18 | `align-grid, build-epic, craft-skill, deploy, develop-tdd, execute-plan, guard-git, hook-commits, kickoff-branch, orchestrate-project, publish-package, quick-fix, setup-environment, smoke-test, spike-prototype, validate-contracts, wire-ci, wire-observability` |
19
19
  | Verify | 12 | `audit-code, diagnose-root, enforce-first, fix-bug, inspect-quality, investigate-bug, request-review, respond-review, run-evals, trace-requirement, validate-fix, verify-work` |
20
20
  | Release | 2 | `commit-message, release-branch` |
21
- | Sustain | 13 | `compose-workflow, delegate-task, dispatch-agents, edit-document, evolve-skill, migrate-spec, organize-workspace, reset-baseline, session-state, simulate-agents, stocktake-skills, terse-mode, write-document` |
22
- | **TOTAL** | **67** | |
21
+ | Sustain | 14 | `compose-workflow, delegate-task, dispatch-agents, edit-document, evolve-skill, migrate-spec, organize-workspace, reset-baseline, run-benchmark, session-state, simulate-agents, stocktake-skills, terse-mode, write-document` |
22
+ | **TOTAL** | **69** | |
23
23
 
24
24
  ---
25
25
 
@@ -27,75 +27,77 @@
27
27
 
28
28
  | # | Phase | Skill | Description | Status |
29
29
  |---|---|---|---|---|
30
- | 1 | Discover | `elaborate-spec` | Refine a rough idea into a clear, detailed specification through dialogue. Does | ✅ Active |
31
- | 2 | Discover | `map-codebase` | "Derives the tech-stack doc from scratch by scanning the codebase analyzes s | ✅ Active |
32
- | 3 | Discover | `research-first` | Look-before-build search registries, repo, existing skills, and web for prio | ✅ Active |
33
- | 4 | Discover | `search-skills` | Find the right bigpowers skill from natural-language intent using a local lexica | ✅ Active |
34
- | 5 | Discover | `survey-context` | Per-task context bootstrap reads existing specs/ and tech-architecture docs | ✅ Active |
35
- | 6 | Discover | `using-bigpowers` | One-time bootstrap that introduces the bigpowers skills system, the PMBOK lifecy | ✅ Active |
36
- | 7 | Design | `deepen-architecture` | Find deepening opportunities in a codebase, informed by the domain language in s | ✅ Active |
37
- | 8 | Design | `define-language` | Extract a DDD-style ubiquitous language glossary from the current conversation, | ✅ Active |
38
- | 9 | Design | `define-success` | Convert an imperative task statement into explicit "step verify: <cmd>" pair | ✅ Active |
39
- | 10 | Design | `design-interface` | Generate multiple radically different interface designs for a module using paral | ✅ Active |
40
- | 11 | Design | `grill-me` | Interactive assumption-surfacing Q&A that stress-tests a plan through relentless | ✅ Active |
41
- | 12 | Design | `grill-with-docs` | Doc-grounded variant of grill-me — stress-tests plan assumptions by fetching a | ✅ Active |
42
- | 13 | Design | `model-domain` | Grilling session that challenges your plan against the existing domain model, sh | ✅ Active |
43
- | 14 | Plan | `assess-impact` | Analyze the blast radius of a proposed change before any code is written. Maps d | ✅ Active |
44
- | 15 | Plan | `change-request` | Add a new requirement or reorder epics by WSJF against specs/release-plan.yaml a | ✅ Active |
45
- | 16 | Plan | `plan-refactor` | Create a detailed refactor plan with tiny commits via user interview, then save | ✅ Active |
46
- | 17 | Plan | `plan-release` | "RELEASE-INDEX BUILDER — Sequence elaborated epics into specs/release-plan.yam | ✅ Active |
47
- | 18 | Plan | `plan-work` | "PLANNING SPINE STEP 3 of 3 — Plan the work: write detailed implementation tas | ✅ Active |
48
- | 19 | Plan | `run-planning` | "DISCOVER-PHASE ADVANCER — Drive the discover-phase checklist (specs/planning- | ✅ Active |
49
- | 20 | Plan | `scope-work` | "PLANNING SPINE STEP 1 of 3 — Scope the work: define what is in and out of sco | ✅ Active |
50
- | 21 | Plan | `seed-conventions` | Generate CLAUDE.md and CONVENTIONS.md for a brand-new project through a brief in | ✅ Active |
51
- | 22 | Plan | `slice-tasks` | "PLANNING SPINE STEP 2 of 3 — Slice the work: break a scoped PRD into vertical | ✅ Active |
52
- | 23 | Build | `align-grid` | "Build editorial/magazine/report webpages on a GENUINE Müller-Brockmann modular | ✅ Active |
53
- | 24 | Build | `build-epic` | Eight-step epic build cycle reads state.yaml, execution-status.yaml, and one | ✅ Active |
54
- | 25 | Build | `craft-skill` | Create new bigpowers skills with proper structure, progressive disclosure, and b | ✅ Active |
55
- | 26 | Build | `deploy` | "Build → verify → artifact → deploy → wait → smoke deployment pipeline. Pl | ✅ Active |
56
- | 27 | Build | `develop-tdd` | Test-driven development with red-green-refactor loop using vertical slices. Use | ✅ Active |
57
- | 28 | Build | `execute-plan` | Batch-execute tasks from the active epic capsule sequentially, with a human chec | ✅ Active |
58
- | 29 | Build | `guard-git` | Block dangerous git commands (push, force push, reset --hard, clean, branch -D, | ✅ Active |
59
- | 30 | Build | `hook-commits` | Set up pre-commit hooks with lint-staged (Prettier), type checking, and tests in | ✅ Active |
60
- | 31 | Build | `kickoff-branch` | Create a git worktree and feature branch, then verify a clean test baseline befo | ✅ Active |
61
- | 32 | Build | `orchestrate-project` | Meta-skill that enforces the 6-phase core loop (discover elaborate plan | ✅ Active |
62
- | 33 | Build | `publish-package` | "Package registry publishing for npm, crates.io, PyPI, and Homebrew. Verifies pr | ✅ Active |
63
- | 34 | Build | `quick-fix` | "Streamlined fast-path for trivial data-only fixes no TDD, no branching cere | ✅ Active |
64
- | 35 | Build | `setup-environment` | Pre-install dependencies and configure tools before development work begins. Use | ✅ Active |
65
- | 36 | Build | `smoke-test` | "Post-deploy health-check against a live URL. Validates HTTP status, response co | ✅ Active |
66
- | 37 | Build | `spike-prototype` | Throw-away prototype for unknown problem spaces. Output is learning notes in spe | ✅ Active |
67
- | 38 | Build | `validate-contracts` | "Assert data shape consistency across system boundaries live API responses a | ✅ Active |
68
- | 39 | Build | `wire-ci` | "CI pipeline setup with pre-built templates and local validation. Generates GitH | ✅ Active |
69
- | 40 | Build | `wire-observability` | Add structured JSON logging, observability commands, and idempotent setup script | ✅ Active |
70
- | 41 | Verify | `audit-code` | Self-review checklist for the coding agent to run before dispatching a reviewer. | ✅ Active |
71
- | 42 | Verify | `diagnose-root` | Run 4-phase root cause analysis reproduce, isolate, hypothesize, verify. Use | ✅ Active |
72
- | 43 | Verify | `enforce-first` | Apply the F.I.R.S.T test quality rubric (Fast, Independent, Repeatable, Self-Val | ✅ Active |
73
- | 44 | Verify | `fix-bug` | Bug fix orchestrator active_flow fix_bug; reads specs/bugs/BUG-*.md; chains | ✅ Active |
74
- | 45 | Verify | `inspect-quality` | Interactive QA session where user reports bugs or issues conversationally, and t | ✅ Active |
75
- | 46 | Verify | `investigate-bug` | Investigate a bug or issue by exploring the codebase to find root cause, then wr | ✅ Active |
76
- | 47 | Verify | `request-review` | Dispatch a fresh reviewer agent with a clean context to critique the code after | ✅ Active |
77
- | 48 | Verify | `respond-review` | Act on a reviewer agent's feedback systematically categorize findings, apply | ✅ Active |
78
- | 49 | Verify | `run-evals` | Eval-Driven Development define capability and regression evals before buildi | ✅ Active |
79
- | 50 | Verify | `trace-requirement` | Link story IDs from specs/release-plan.yaml + epic capsule directories to the im | ✅ Active |
80
- | 51 | Verify | `validate-fix` | Prove a fix works before declaring done re-run the failing test, run the ful | ✅ Active |
81
- | 52 | Verify | `verify-work` | Multi-phase UAT gate — cold-start smoke, build, typecheck, lint, tests, step-b | ✅ Active |
82
- | 53 | Release | `commit-message` | Reviews working-tree changes, then drafts a Conventional Commits title/body and | ✅ Active |
83
- | 54 | Release | `release-branch` | Make the merge/PR/keep/discard decision for a feature branch, verify coverage ga | ✅ Active |
84
- | 55 | Sustain | `compose-workflow` | Chain multiple bigpowers skills into a custom workflow recipe saved in specs/. U | ✅ Active |
85
- | 56 | Sustain | `delegate-task` | Delegate one complex task to a single subagent, review its work in two stages be | ✅ Active |
86
- | 57 | Sustain | `dispatch-agents` | Dispatch multiple subagents in parallel on independent tasks. No waiting between | ✅ Active |
87
- | 58 | Sustain | `edit-document` | Edit and improve documents by restructuring sections, improving clarity, and tig | ✅ Active |
88
- | 59 | Sustain | `evolve-skill` | Benchmark-gated skill evolution consume bigpowers-benchmark report, propose | ✅ Active |
89
- | 60 | Sustain | `migrate-spec` | Detect GSD, spec-kit, or BMAD spec artifacts and transform them into bigpowers Y | ✅ Active |
90
- | 61 | Sustain | `organize-workspace` | Scans the active workspace for disposable artifacts—logs, caches, stale build | ✅ Active |
91
- | 62 | Sustain | `reset-baseline` | Restore the project to a known clean state between agent runs or experiments. Us | ✅ Active |
92
- | 63 | Sustain | `session-state` | Track implementation decisions and progress in specs/state.yaml to prevent conte | ✅ Active |
93
- | 64 | Sustain | `simulate-agents` | Run Mock User and Auditor agents against a feature in fresh contexts before huma | ✅ Active |
94
- | 65 | Sustain | `stocktake-skills` | Sequential subagent batch audit of the bigpowers skill catalog Quick Scan (c | ✅ Active |
95
- | 66 | Sustain | `terse-mode` | Fallback ultra-compressed communication mode. Cuts token usage ~75% by dropping | ✅ Active |
96
- | 67 | Sustain | `write-document` | Write, organize, and sync high-integrity technical documents using the BMAD meth | ✅ Active |
97
-
98
- **Total: 67 active skills.**
30
+ | 1 | Discover | `audit-plan` | Evaluate an incoming project plan against bigpowers principles and conventions, | ✅ Active |
31
+ | 2 | Discover | `elaborate-spec` | Refine a rough idea into a clear, detailed specification through dialogue. Does | ✅ Active |
32
+ | 3 | Discover | `map-codebase` | "Derives the tech-stack doc from scratch by scanning the codebase analyzes s | ✅ Active |
33
+ | 4 | Discover | `research-first` | Look-before-build search registries, repo, existing skills, and web for prio | ✅ Active |
34
+ | 5 | Discover | `search-skills` | Find the right bigpowers skill from natural-language intent using a local lexica | ✅ Active |
35
+ | 6 | Discover | `survey-context` | Per-task context bootstrap reads existing specs/ and tech-architecture docs | ✅ Active |
36
+ | 7 | Discover | `using-bigpowers` | One-time bootstrap that introduces the bigpowers skills system, the PMBOK lifecy | ✅ Active |
37
+ | 8 | Design | `deepen-architecture` | Find deepening opportunities in a codebase, informed by the domain language in s | ✅ Active |
38
+ | 9 | Design | `define-language` | Extract a DDD-style ubiquitous language glossary from the current conversation, | ✅ Active |
39
+ | 10 | Design | `define-success` | Convert an imperative task statement into explicit "step verify: <cmd>" pair | ✅ Active |
40
+ | 11 | Design | `design-interface` | Generate multiple radically different interface designs for a module using paral | ✅ Active |
41
+ | 12 | Design | `grill-me` | Interactive assumption-surfacing Q&A that stress-tests a plan through relentless | ✅ Active |
42
+ | 13 | Design | `grill-with-docs` | Doc-grounded variant of grill-me — stress-tests plan assumptions by fetching a | ✅ Active |
43
+ | 14 | Design | `model-domain` | Grilling session that challenges your plan against the existing domain model, sh | ✅ Active |
44
+ | 15 | Plan | `assess-impact` | Analyze the blast radius of a proposed change before any code is written. Maps d | ✅ Active |
45
+ | 16 | Plan | `change-request` | Add a new requirement or reorder epics by WSJF against specs/release-plan.yaml a | ✅ Active |
46
+ | 17 | Plan | `plan-refactor` | Create a detailed refactor plan with tiny commits via user interview, then save | ✅ Active |
47
+ | 18 | Plan | `plan-release` | "RELEASE-INDEX BUILDER — Sequence elaborated epics into specs/release-plan.yam | ✅ Active |
48
+ | 19 | Plan | `plan-work` | "PLANNING SPINE STEP 3 of 3 — Plan the work: write detailed implementation tas | ✅ Active |
49
+ | 20 | Plan | `run-planning` | "DISCOVER-PHASE ADVANCER — Drive the discover-phase checklist (specs/planning- | ✅ Active |
50
+ | 21 | Plan | `scope-work` | "PLANNING SPINE STEP 1 of 3 — Scope the work: define what is in and out of sco | ✅ Active |
51
+ | 22 | Plan | `seed-conventions` | Generate CLAUDE.md and CONVENTIONS.md for a brand-new project through a brief in | ✅ Active |
52
+ | 23 | Plan | `slice-tasks` | "PLANNING SPINE STEP 2 of 3 — Slice the work: break a scoped PRD into vertical | ✅ Active |
53
+ | 24 | Build | `align-grid` | "Build editorial/magazine/report webpages on a GENUINE Müller-Brockmann modular | ✅ Active |
54
+ | 25 | Build | `build-epic` | Eight-step epic build cycle reads state.yaml, execution-status.yaml, and one | ✅ Active |
55
+ | 26 | Build | `craft-skill` | Create new bigpowers skills with proper structure, progressive disclosure, and b | ✅ Active |
56
+ | 27 | Build | `deploy` | "Build → verify → artifact → deploy → wait → smoke deployment pipeline. Pl | ✅ Active |
57
+ | 28 | Build | `develop-tdd` | Test-driven development with red-green-refactor loop using vertical slices. Use | ✅ Active |
58
+ | 29 | Build | `execute-plan` | Batch-execute tasks from the active epic capsule sequentially, with a human chec | ✅ Active |
59
+ | 30 | Build | `guard-git` | Block dangerous git commands (push, force push, reset --hard, clean, branch -D, | ✅ Active |
60
+ | 31 | Build | `hook-commits` | Set up pre-commit hooks with lint-staged (Prettier), type checking, and tests in | ✅ Active |
61
+ | 32 | Build | `kickoff-branch` | Create a git worktree and feature branch, then verify a clean test baseline befo | ✅ Active |
62
+ | 33 | Build | `orchestrate-project` | Meta-skill that enforces the 6-phase core loop (discover elaborate plan | ✅ Active |
63
+ | 34 | Build | `publish-package` | "Package registry publishing for npm, crates.io, PyPI, and Homebrew. Verifies pr | ✅ Active |
64
+ | 35 | Build | `quick-fix` | "Streamlined fast-path for trivial data-only fixes no TDD, no branching cere | ✅ Active |
65
+ | 36 | Build | `setup-environment` | Pre-install dependencies and configure tools before development work begins. Use | ✅ Active |
66
+ | 37 | Build | `smoke-test` | "Post-deploy health-check against a live URL. Validates HTTP status, response co | ✅ Active |
67
+ | 38 | Build | `spike-prototype` | Throw-away prototype for unknown problem spaces. Output is learning notes in spe | ✅ Active |
68
+ | 39 | Build | `validate-contracts` | "Assert data shape consistency across system boundaries live API responses a | ✅ Active |
69
+ | 40 | Build | `wire-ci` | "CI pipeline setup with pre-built templates and local validation. Generates GitH | ✅ Active |
70
+ | 41 | Build | `wire-observability` | Add structured JSON logging, observability commands, and idempotent setup script | ✅ Active |
71
+ | 42 | Verify | `audit-code` | Self-review checklist for the coding agent to run before dispatching a reviewer. | ✅ Active |
72
+ | 43 | Verify | `diagnose-root` | Run 4-phase root cause analysis reproduce, isolate, hypothesize, verify. Use | ✅ Active |
73
+ | 44 | Verify | `enforce-first` | Apply the F.I.R.S.T test quality rubric (Fast, Independent, Repeatable, Self-Val | ✅ Active |
74
+ | 45 | Verify | `fix-bug` | Bug fix orchestrator active_flow fix_bug; reads specs/bugs/BUG-*.md; chains | ✅ Active |
75
+ | 46 | Verify | `inspect-quality` | Interactive QA session where user reports bugs or issues conversationally, and t | ✅ Active |
76
+ | 47 | Verify | `investigate-bug` | Investigate a bug or issue by exploring the codebase to find root cause, then wr | ✅ Active |
77
+ | 48 | Verify | `request-review` | Dispatch a fresh reviewer agent with a clean context to critique the code after | ✅ Active |
78
+ | 49 | Verify | `respond-review` | Act on a reviewer agent's feedback systematically categorize findings, apply | ✅ Active |
79
+ | 50 | Verify | `run-evals` | Eval-Driven Development define capability and regression evals before buildi | ✅ Active |
80
+ | 51 | Verify | `trace-requirement` | Link story IDs from specs/release-plan.yaml + epic capsule directories to the im | ✅ Active |
81
+ | 52 | Verify | `validate-fix` | Prove a fix works before declaring done re-run the failing test, run the ful | ✅ Active |
82
+ | 53 | Verify | `verify-work` | Multi-phase UAT gate — cold-start smoke, build, typecheck, lint, tests, step-b | ✅ Active |
83
+ | 54 | Release | `commit-message` | Reviews working-tree changes, then drafts a Conventional Commits title/body and | ✅ Active |
84
+ | 55 | Release | `release-branch` | Make the merge/PR/keep/discard decision for a feature branch, verify coverage ga | ✅ Active |
85
+ | 56 | Sustain | `compose-workflow` | Chain multiple bigpowers skills into a custom workflow recipe saved in specs/. U | ✅ Active |
86
+ | 57 | Sustain | `delegate-task` | Delegate one complex task to a single subagent, review its work in two stages be | ✅ Active |
87
+ | 58 | Sustain | `dispatch-agents` | Dispatch multiple subagents in parallel on independent tasks. No waiting between | ✅ Active |
88
+ | 59 | Sustain | `edit-document` | Edit and improve documents by restructuring sections, improving clarity, and tig | ✅ Active |
89
+ | 60 | Sustain | `evolve-skill` | Benchmark-gated skill evolution consume bigpowers-benchmark report, propose | ✅ Active |
90
+ | 61 | Sustain | `migrate-spec` | Detect GSD, spec-kit, or BMAD spec artifacts and transform them into bigpowers Y | ✅ Active |
91
+ | 62 | Sustain | `organize-workspace` | Scans the active workspace for disposable artifacts—logs, caches, stale build | ✅ Active |
92
+ | 63 | Sustain | `reset-baseline` | Restore the project to a known clean state between agent runs or experiments. Us | ✅ Active |
93
+ | 64 | Sustain | `run-benchmark` | Run skill quality benchmarks from specs/benchmarks/ definitions and write pass@k | ✅ Active |
94
+ | 65 | Sustain | `session-state` | Track implementation decisions and progress in specs/state.yaml to prevent conte | ✅ Active |
95
+ | 66 | Sustain | `simulate-agents` | Run Mock User and Auditor agents against a feature in fresh contexts before huma | ✅ Active |
96
+ | 67 | Sustain | `stocktake-skills` | Sequential subagent batch audit of the bigpowers skill catalog Quick Scan (c | ✅ Active |
97
+ | 68 | Sustain | `terse-mode` | Fallback ultra-compressed communication mode. Cuts token usage ~75% by dropping | ✅ Active |
98
+ | 69 | Sustain | `write-document` | Write, organize, and sync high-integrity technical documents using the BMAD meth | ✅ Active |
99
+
100
+ **Total: 69 active skills.**
99
101
 
100
102
  ---
101
103
 
@@ -0,0 +1,89 @@
1
+ ---
2
+ name: audit-plan
3
+ model: sonnet
4
+ description: Evaluate an incoming project plan against bigpowers principles and conventions, surface gaps, and produce a READY/NOT READY verdict before engagement begins. Use when a new project arrives, when adapting a foreign plan, or before running seed-conventions on an unfamiliar codebase.
5
+ ---
6
+
7
+ # Audit Plan
8
+
9
+ > **HARD GATE** — Do NOT start build skills (kickoff-branch, develop-tdd) until audit-plan returns a READY verdict. A plan missing test commands, scope boundaries, or success criteria will produce drift and rework downstream.
10
+
11
+ Assess an incoming project plan for alignment with bigpowers principles, identify what's missing, and produce a structured readiness report before any skill execution begins.
12
+
13
+ ## Three lenses
14
+
15
+ ### 1. Principles alignment
16
+ - Are stories vertical slices (not horizontal layers)?
17
+ - Is scope bounded — explicit in_scope + out_of_scope?
18
+ - Are success criteria defined (how do we know we're done)?
19
+ - Are HARD GATE candidates identifiable (critical decision points)?
20
+ - Is there a domain language / ubiquitous terminology?
21
+
22
+ ### 2. Conventions completeness
23
+ - Does `CLAUDE.md` or `AGENTS.md` exist?
24
+ - Does `CONVENTIONS.md` exist?
25
+ - Is the `specs/` directory layout in place?
26
+ - Are commit conventions documented (Conventional Commits)?
27
+ - Is the git workflow mode identified (`solo-git` | `team-pr`)?
28
+
29
+ ### 3. Bigpowers pre-flight (must all be answered before build)
30
+ | Question | Why |
31
+ |----------|-----|
32
+ | What is the **test command**? | `develop-tdd` verify steps require it |
33
+ | What is the **build command**? | `verify-work` mechanical gate |
34
+ | What is the **lint command**? | `audit-code` lint gate |
35
+ | What is the **typecheck command**? | `verify-work` typecheck gate |
36
+ | What **CI platform** is in use? | `wire-ci` configuration |
37
+ | **Solo or team**? | `release-branch` integration mode |
38
+ | Primary **language + framework**? | model routing + conventions |
39
+ | **Greenfield or existing** codebase? | determines whether to run `seed-conventions` or `migrate-spec` first |
40
+
41
+ ## Process
42
+
43
+ 1. **Ingest the plan** — accept a file path, pasted PRD text, or existing `specs/` artifacts. Read `CLAUDE.md` and `CONVENTIONS.md` if present.
44
+
45
+ 2. **Score each lens** — for every item above, mark:
46
+ - ✅ Present and adequate
47
+ - ⚠️ Present but incomplete — note what's missing
48
+ - ❌ Absent
49
+
50
+ 3. **Close gaps conversationally** — for each ❌ or ⚠️, ask one question at a time. Record each answer before moving to the next.
51
+
52
+ 4. **Write `specs/PLAN-AUDIT.md`**:
53
+
54
+ ```markdown
55
+ # Plan Audit — <project>
56
+ **Date:** YYYY-MM-DD · **Verdict:** READY | NOT READY
57
+
58
+ ## Principles Alignment
59
+ | Check | Status | Note |
60
+ | Vertical slices | ✅ | 4 stories, each shippable |
61
+ | Scope bounded | ⚠️ | in_scope present; out_of_scope missing |
62
+
63
+ ## Conventions Completeness
64
+ | Check | Status | Note |
65
+
66
+ ## Pre-flight Answers
67
+ | Command | Value |
68
+ | test | `npm test` |
69
+ | build | `npm run build` |
70
+
71
+ ## Open Gaps
72
+ - [ ] Add out_of_scope to scope definition (run scope-work)
73
+ - [ ] Create CLAUDE.md (run seed-conventions)
74
+
75
+ ## Verdict
76
+ READY — proceed with survey-context
77
+ NOT READY — N gaps remain; close before proceeding
78
+ ```
79
+
80
+ 5. **Recommend next skill**:
81
+ - READY → `survey-context`
82
+ - Needs bootstrapping → `seed-conventions`
83
+ - Needs spec elaboration → `elaborate-spec`
84
+ - Has foreign spec format → `migrate-spec`
85
+ - Plan assumptions need challenging → `grill-me`
86
+
87
+ ## Verify
88
+
89
+ → verify: `test -f specs/PLAN-AUDIT.md && grep -q 'Verdict' specs/PLAN-AUDIT.md && echo OK || echo FAIL`
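Reviewer note: the verify command above only confirms that `specs/PLAN-AUDIT.md` exists and contains the word "Verdict". A slightly stricter check could derive the verdict from the report itself. The sketch below is a hypothetical helper, not part of bigpowers: it assumes that open gaps are written as unchecked `- [ ]` items, as in the report template, and that a plan is READY only when none remain.

```python
import re

# Hypothetical helper: names and the READY rule are assumptions for illustration.
UNCHECKED_ITEM = re.compile(r"^[ \t]*- \[ \] (.+)$", re.MULTILINE)


def open_gaps(report: str) -> list[str]:
    """Return the text of every unchecked '- [ ]' item in the report."""
    return [match.group(1).strip() for match in UNCHECKED_ITEM.finditer(report)]


def verdict(report: str) -> str:
    """READY when no unchecked gap remains, otherwise NOT READY."""
    return "READY" if not open_gaps(report) else "NOT READY"


sample_report = """## Open Gaps
- [ ] Add out_of_scope to scope definition (run scope-work)
- [x] Create CLAUDE.md (run seed-conventions)
"""

print(verdict(sample_report), open_gaps(sample_report))
```

Checked items (`- [x]`) are ignored, so the verdict flips to READY once every gap in the list has been ticked off.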
@@ -72,7 +72,26 @@ Summarize your understanding in 3–5 bullet points aligned with [countable-stor
72
72
 
73
73
  Ask: "Is this an accurate summary? Anything missing or wrong?"
74
74
 
75
- ### 5. Suggest next skill
75
+ ### 5. Write specs/planning-context.yaml
76
+
77
+ After the user confirms the summary in step 4, persist the key decisions:
78
+
79
+ ```yaml
80
+ # specs/planning-context.yaml — written by elaborate-spec; consumed by scope-work and slice-tasks
81
+ feature_name: "<from step 1>"
82
+ problem_statement: "<one paragraph>"
83
+ constraints:
84
+   - "<constraint 1>"
85
+ out_of_scope:
86
+   - "<excluded item 1>"
87
+ key_decisions:
88
+   - decision: "<what was decided>"
89
+     rationale: "<why>"
90
+ ```
91
+
92
+ If `specs/planning-context.yaml` already exists, ask: `"Planning context from a prior session exists. Update it? [Y/n]"`. Overwrite on Y; leave unchanged on N.
93
+
94
+ ### 6. Suggest next skill
76
95
 
77
96
  Once the spec is clear, recommend the next step:
78
97
  - If domain model needs work → `model-domain`
@@ -10,15 +10,22 @@ model: opus
10
10
 
11
11
  ## Loop
12
12
 
13
- 1. Run `bigpowers-benchmark` (external repo); save report path in state.yaml.
14
- 2. Identify target skill + measurable gap from report.
15
- 3. `plan-work` — minimal change proposal with verify commands.
16
- 4. Edit via `craft-skill` / direct SKILL.md edit; run `sync-skills.sh`.
17
- 5. Re-run benchmark; compare scores.
18
- 6. Record decision in `specs/adr/` + `session-state`; revert if regression.
13
+ 1. **Establish baseline** — Run `run-benchmark <skill> --baseline`. If no definition exists at `specs/benchmarks/<skill>.yaml`, create one following `specs/benchmarks/SCHEMA.md` first. Save report path in `state.yaml`. If `specs/benchmarks/reports/BASELINE-<skill>.yaml` already exists, skip this step.
14
+
15
+ 2. **Identify gap** — Read the baseline report (`specs/benchmarks/reports/BASELINE-<skill>.yaml`). Find scenarios with `result: FAIL` or low `pass_at_k`. This is the measurable gap.
16
+
17
+ 3. **`plan-work`** — Write a minimal change proposal targeting the failing scenarios. Include verify commands.
18
+
19
+ 4. **Edit** via `craft-skill` / direct SKILL.md edit; run `bash scripts/sync-skills.sh`.
20
+
21
+ 5. **Re-run benchmark** — `run-benchmark <skill>`. Compare new `pass_at_k` against baseline.
22
+ - **IMPROVED or STABLE** → advance to step 6.
23
+ - **REGRESSION** (`new pass_at_k < baseline`) → revert the change and loop back to step 3.
24
+
25
+ 6. **Record decision** — Write `specs/adr/NNNN-evolve-<skill>.md` with before/after `pass_at_k` scores. Update `session-state`.
19
26
 
20
27
  ## Verify
21
28
 
22
- → verify: benchmark report shows post-change score ≥ baseline (document paths in state.yaml)
29
+ → verify: `grep -c 'run-benchmark\|pass_at_k\|BASELINE-' evolve-skill/SKILL.md | awk '{if($1>=2) print "OK"; else print "FAIL"}'`
23
30
 
24
31
  See [REFERENCE.md](REFERENCE.md) for ADR template.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "bigpowers",
3
- "version": "2.26.0",
3
+ "version": "2.28.0",
4
4
  "description": "61 agent skills for spec-driven, test-first software development by solo developers",
5
5
  "main": "index.js",
6
6
  "scripts": {
@@ -7,6 +7,8 @@ model: sonnet
7
7
 
8
8
  # Quick Fix
9
9
 
10
+ > **HARD GATE** — ALL entry criteria must pass before invoking quick-fix. If any guardrail triggers during execution, abort immediately and fall back to `investigate-bug`. Do NOT use quick-fix for logic changes, multi-file edits, or diffs > 5 lines.
11
+
10
12
  Fast-track for trivial data-only fixes that do not require the full bug-fix chain.
11
13
 
12
14
  When a bug fix is purely data — an add-missing-key, a typo correction, a config value update — the standard 6-skill chain (investigate-bug → diagnose-root → develop-tdd → kickoff-branch → verify-work → release-branch) is wasteful overhead. Quick-fix collapses it to 2 skills: **quick-fix** then **release-branch**.
@@ -0,0 +1,70 @@
1
+ ---
2
+ name: run-benchmark
3
+ model: haiku
4
+ description: Run skill quality benchmarks from specs/benchmarks/ definitions and write pass@k reports. Use before and after evolve-skill to prove quality changes are improvements, not regressions.
5
+ ---
6
+
7
+ # Run Benchmark
8
+
9
+ > **HARD GATE** — Do NOT use benchmark scores to declare a skill "good" or "bad" in isolation. Benchmarks measure relative quality vs. a baseline — they catch regressions, they do not certify correctness.
10
+
11
+ Reads benchmark definitions from `specs/benchmarks/`, executes each scenario's grader, and writes a structured `pass@k` report that `evolve-skill` consumes.
12
+
13
+ ## Usage
14
+
15
+ ```bash
16
+ # Benchmark a single skill
17
+ run-benchmark <skill-name>
18
+
19
+ # Benchmark all skills with definitions
20
+ run-benchmark --all
21
+
22
+ # Pin current results as baseline
23
+ run-benchmark <skill-name> --baseline
24
+ ```
25
+
26
+ ## Process
27
+
28
+ 1. **Locate definition** — Read `specs/benchmarks/<skill>.yaml`. If absent, report: `"No benchmark definition found for <skill>. Create specs/benchmarks/<skill>.yaml first."` and stop.
29
+
30
+ 2. **Run each scenario** — For each scenario in `scenarios[]`:
31
+ - **Code grader:** Run `grader.command` in repo root via `bash -c`. Exit 0 → PASS. Non-zero → FAIL. Timeout: 15 seconds.
32
+ - **Rubric grader:** Present each criterion to the agent as a yes/no question about the scenario output. ≥ 80% yes → PASS, else FAIL.
33
+
34
+ 3. **Calculate pass@k** — `pass@k = sum(weight of PASS scenarios) / sum(all weights)`. Round to 2 decimal places.
35
+
36
+ 4. **Write report** to `specs/benchmarks/reports/BENCHMARK-<skill>-<YYYY-MM-DD>.yaml`:
37
+
38
+ ```yaml
39
+ skill: survey-context
40
+ run_date: "2026-06-22"
41
+ pass_at_k: 0.83
42
+ total_scenarios: 3
43
+ passed: 2
44
+ failed: 1
45
+ scenarios:
46
+   - id: s01
47
+     name: "detects active epic from state.yaml"
48
+     result: PASS
49
+     weight: 1.0
50
+   - id: s02
51
+     name: "reads release-plan.yaml and reports next epic"
52
+     result: PASS
53
+     weight: 1.0
54
+   - id: s03
55
+     name: "handles missing state.yaml gracefully"
56
+     result: FAIL
57
+     weight: 0.5
58
+     failure_note: "crashed instead of suggesting state.yaml creation"
59
+ ```
60
+
61
+ 5. **Baseline mode** (`--baseline`) — Copy the report to `specs/benchmarks/reports/BASELINE-<skill>.yaml`. This is the reference point for regression checks in `evolve-skill`.
62
+
63
+ 6. **Compare to baseline** — If a `BASELINE-<skill>.yaml` exists, compare `pass_at_k`. Report:
64
+ - `IMPROVED: 0.67 → 0.83`
65
+ - `REGRESSION: 0.83 → 0.67 — do NOT ship this change`
66
+ - `STABLE: 0.83 = 0.83`
67
+
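Steps 2 and 3 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the function names `run_code_grader`, `grade_rubric`, and `pass_at_k` are hypothetical, and treating a timeout or an empty rubric as FAIL is an assumption the source does not spell out.

```python
import subprocess


def run_code_grader(command: str, cwd: str = ".", timeout_s: float = 15.0) -> str:
    """Run a grader command via `bash -c`; exit 0 is PASS, anything else FAIL."""
    try:
        proc = subprocess.run(
            ["bash", "-c", command],
            cwd=cwd,
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "FAIL"  # assumption: exceeding the 15-second limit counts as FAIL
    return "PASS" if proc.returncode == 0 else "FAIL"


def grade_rubric(answers: list, threshold: float = 0.8) -> str:
    """PASS when at least `threshold` of the yes/no criteria were answered yes."""
    if not answers:
        return "FAIL"  # assumption: a rubric with no criteria cannot pass
    return "PASS" if sum(answers) / len(answers) >= threshold else "FAIL"


def pass_at_k(scenarios: list) -> float:
    """Weighted pass rate: sum of PASS weights over sum of all weights."""
    total = sum(s["weight"] for s in scenarios)
    if total == 0:
        return 0.0  # assumption: no weighted scenarios means a score of zero
    passed = sum(s["weight"] for s in scenarios if s["result"] == "PASS")
    return round(passed / total, 2)
```

With the weights from the sample report (1.0, 1.0, and 0.5, where the 0.5-weight scenario fails), the formula gives 2.0 / 2.5 = 0.80.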
68
+ ## Verify
69
+
70
+ → verify: `test -f run-benchmark/SKILL.md && grep -q 'pass_at_k\|pass.at.k' run-benchmark/SKILL.md && echo OK || { echo FAIL; exit 1; }`
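The baseline comparison in step 6 of the Process list reduces to a three-way branch on two rounded scores. A minimal sketch, assuming both values have already been rounded to 2 decimal places as step 3 requires (`compare_to_baseline` is a hypothetical name; the message strings are copied from the skill text):

```python
def compare_to_baseline(baseline: float, current: float) -> str:
    """Return the report line for a baseline vs. current pass_at_k pair."""
    if current > baseline:
        return f"IMPROVED: {baseline:.2f} → {current:.2f}"
    if current < baseline:
        return f"REGRESSION: {baseline:.2f} → {current:.2f} — do NOT ship this change"
    return f"STABLE: {baseline:.2f} = {current:.2f}"
```

Because the inputs are pre-rounded, a plain equality check is enough here; comparing unrounded floats would need a tolerance instead.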
@@ -37,6 +37,21 @@ Each key maps to a skill invocation. Optional keys can be skipped; required keys
37
37
 
38
38
  2. **Find next step** — Find the first workflow key with `status: pending`. If the key is `optional`, check if the user wants to run it. If not, mark it `skipped`.
39
39
 
40
+ 2a. **Context capsule check** — Before invoking `elaborate-spec`, check whether a fresh `specs/planning-context.yaml` exists:
41
+ ```bash
42
+ test -f specs/planning-context.yaml && python3 -c "
43
+ import yaml, datetime
44
+ d = yaml.safe_load(open('specs/planning-context.yaml'))
45
+ written = d.get('written_at','')
46
+ if written:
47
+     age = (datetime.datetime.now(datetime.timezone.utc) - datetime.datetime.fromisoformat(written.replace('Z', '+00:00'))).total_seconds() / 3600
48
+     print(f'Context age: {age:.1f}h')
49
+ " 2>/dev/null || echo "No context or no written_at"
50
+ ```
51
+ - If context is **< 24h old**, ask: `"Planning context from Xh ago exists for '<feature_name>'. Re-run elaborate-spec? [y/N]"`. Skip elaborate-spec on N.
52
+ - If context is **≥ 24h old** or absent, run elaborate-spec normally.
53
+ - On planning cycle completion (all required keys done), clear the capsule: delete `specs/planning-context.yaml` and set `context_capsule: null` in `specs/planning-status.yaml`.
54
+
40
55
  3. **Invoke the matching skill** — Run the skill that matches the workflow key:
41
56
  - `survey-context` — where are we?
42
57
  - `scope-work` — what's in and out?
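The freshness rule in step 2a can be sketched as a small pure function. This is an illustration only: `context_age_hours` and `next_action` are hypothetical names, and treating a timestamp without an offset as UTC is an assumption. The `Z` suffix is normalised because `datetime.fromisoformat` only accepts it from Python 3.11 onward.

```python
from datetime import datetime, timezone
from typing import Optional

FRESHNESS_HOURS = 24.0  # threshold taken from the 2a rule


def context_age_hours(written_at: str, now: Optional[datetime] = None) -> float:
    """Hours elapsed since the ISO 8601 timestamp `written_at`."""
    written = datetime.fromisoformat(written_at.replace("Z", "+00:00"))
    if written.tzinfo is None:
        # assumption: timestamps without an offset are UTC
        written = written.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - written).total_seconds() / 3600


def next_action(written_at: Optional[str], now: Optional[datetime] = None) -> str:
    """'ask' when a fresh capsule exists, 'run' when it is stale or absent."""
    if not written_at:
        return "run"
    return "ask" if context_age_hours(written_at, now) < FRESHNESS_HOURS else "run"
```

Passing `now` explicitly keeps the function deterministic, which makes the 24-hour boundary easy to check in isolation.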
@@ -53,6 +68,10 @@ Each key maps to a skill invocation. Optional keys can be skipped; required keys
53
68
 
54
69
  In `specs/planning-status.yaml`:
55
70
  ```yaml
71
+ context_capsule: # written by elaborate-spec; cleared on cycle completion
72
+   written_at: "2026-06-22T03:00:00Z"
73
+   written_by: elaborate-spec
74
+   feature_name: "add dark mode"
56
75
  workflows:
57
76
  survey-context:
58
77
  required: true
@@ -18,6 +18,12 @@ Turn the current conversation into a bounded PRD at `specs/product/SCOPE_LATEST.
18
18
 
19
19
  ## Process
20
20
 
21
+ 0. **Read planning-context.yaml** — If `specs/planning-context.yaml` exists, read it before doing anything else:
22
+ ```bash
23
+ test -f specs/planning-context.yaml && echo "Context found" || echo "No context — starting fresh"
24
+ ```
25
+ Pre-populate `feature_name`, `constraints`, and `out_of_scope` from the file. Skip re-asking questions already answered by elaborate-spec. If the file is absent, proceed normally.
26
+
21
27
  1. **Gather context** — Read existing `specs/` artifacts (`release-plan.yaml`, `plans/TECH_STACK_LATEST.md`, `requirements/VISION_LATEST.yaml` if any). Understand what the project is building and why.
22
28
 
23
29
  2. **Interview (if needed)** — Clarify: What is the goal? Who are the users? What is definitely in scope? What is explicitly out of scope? What constraints exist (time, budget, tech)? How will success be measured?
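The pre-population described in step 0 amounts to splitting the three named fields into those already answered and those still to be asked. A minimal sketch, assuming the capsule has already been parsed into a mapping (for example with a YAML loader); `prefill_scope` and `PREFILL_KEYS` are hypothetical names, and treating empty values as unanswered is an assumption:

```python
from typing import Any, Mapping, Optional

# Fields that step 0 says to pre-populate from planning-context.yaml.
PREFILL_KEYS = ("feature_name", "constraints", "out_of_scope")


def prefill_scope(context: Optional[Mapping[str, Any]]):
    """Return (answered, still_to_ask) for the scope interview."""
    context = context or {}  # an absent file means nothing is pre-filled
    # assumption: empty strings and empty lists count as unanswered
    answered = {key: context[key] for key in PREFILL_KEYS if context.get(key)}
    still_to_ask = [key for key in PREFILL_KEYS if key not in answered]
    return answered, still_to_ask
```

When the file is absent, every key lands in `still_to_ask`, which matches the "proceed normally" branch.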
@@ -23,6 +23,7 @@ PHASE_MAP=(
23
23
  [using-bigpowers]="Discover"
24
24
  [map-codebase]="Discover"
25
25
  [elaborate-spec]="Discover"
26
+ [audit-plan]="Discover"
26
27
  # Elaborate / Design
27
28
  [model-domain]="Design"
28
29
  [define-language]="Design"
@@ -84,6 +85,7 @@ PHASE_MAP=(
84
85
  [reset-baseline]="Sustain"
85
86
  [stocktake-skills]="Sustain"
86
87
  [evolve-skill]="Sustain"
88
+ [run-benchmark]="Sustain"
87
89
  [terse-mode]="Sustain"
88
90
  [delegate-task]="Sustain"
89
91
  [dispatch-agents]="Sustain"
@@ -0,0 +1,57 @@
1
+ #!/usr/bin/env bash
2
+ # Run all SKILL.md → verify: commands and report PASS/FAIL/SKIP.
3
+ # Exit 0 only when zero FAILs.
4
+ # Usage: bash scripts/run-skill-verify.sh [skill-name]
5
+ # No args: runs all skills
6
+ # With arg: runs only the named skill
7
+
8
+ set -uo pipefail
9
+
10
+ REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
11
+ cd "$REPO_ROOT"
12
+
13
+ PASS=0; FAIL=0; SKIP=0
14
+ TARGET="${1:-}"
15
+
16
+ run_skill() {
17
+ local skill_md="$1"
18
+ local skill
19
+ skill=$(dirname "$skill_md")
20
+
21
+ local cmd
22
+ cmd=$(grep '→ verify:' "$skill_md" 2>/dev/null | head -1 | sed -e 's/.*→ verify: *//' -e 's/^`//' -e 's/`[[:space:]]*$//')
23
+
24
+ if [ -z "$cmd" ]; then
25
+ echo "SKIP: $skill"
26
+ SKIP=$((SKIP + 1))
27
+ return
28
+ fi
29
+
30
+ local output
31
+ if output=$(timeout 10 bash -c "$cmd" 2>&1); then
32
+ echo "PASS: $skill"
33
+ PASS=$((PASS + 1))
34
+ else
35
+ echo "FAIL: $skill — $cmd"
36
+ echo " output: $(echo "$output" | head -1)"
37
+ FAIL=$((FAIL + 1))
38
+ fi
39
+ }
40
+
41
+ if [ -n "$TARGET" ]; then
42
+ if [ -f "$TARGET/SKILL.md" ]; then
43
+ run_skill "$TARGET/SKILL.md"
44
+ else
45
+ echo "ERROR: $TARGET/SKILL.md not found"
46
+ exit 1
47
+ fi
48
+ else
49
+ for skill_md in */SKILL.md; do
50
+ run_skill "$skill_md"
51
+ done
52
+ fi
53
+
54
+ echo ""
55
+ echo "Results: $PASS PASS, $FAIL FAIL, $SKIP SKIP"
56
+
57
+ [ "$FAIL" -eq 0 ]
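The extraction step inside `run_skill` can be illustrated in Python. This sketch assumes the verify line is written as in the run-benchmark skill, with the command wrapped in Markdown backticks; `extract_verify_command` is a hypothetical helper, not part of the package. Stripping the backticks matters because a command passed to `bash -c` with them intact would be treated as command substitution.

```python
import re
from typing import Optional

# Everything after the marker on the first matching line is the command.
VERIFY_RE = re.compile(r"→ verify:\s*(.*)")


def extract_verify_command(skill_md_text: str) -> Optional[str]:
    """Return the first verify command in a SKILL.md body, or None if absent."""
    for line in skill_md_text.splitlines():
        match = VERIFY_RE.search(line)
        if match:
            cmd = match.group(1).strip()
            # assumption: the command is wrapped in one pair of backticks
            if len(cmd) >= 2 and cmd.startswith("`") and cmd.endswith("`"):
                cmd = cmd[1:-1]
            return cmd or None
    return None
```

A `None` result corresponds to the script's SKIP branch.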