@kennethsolomon/shipkit 3.10.2 → 3.11.0
- package/README.md +92 -4
- package/commands/sk/context-budget.md +5 -0
- package/commands/sk/eval.md +5 -0
- package/commands/sk/health.md +5 -0
- package/commands/sk/help.md +32 -8
- package/commands/sk/learn.md +5 -0
- package/commands/sk/resume-session.md +5 -0
- package/commands/sk/safety-guard.md +5 -0
- package/commands/sk/save-session.md +5 -0
- package/commands/sk/set-profile.md +8 -0
- package/package.json +1 -1
- package/skills/sk:brainstorming/SKILL.md +13 -0
- package/skills/sk:context-budget/SKILL.md +126 -0
- package/skills/sk:eval/SKILL.md +188 -0
- package/skills/sk:health/SKILL.md +146 -0
- package/skills/sk:learn/SKILL.md +138 -0
- package/skills/sk:resume-session/SKILL.md +95 -0
- package/skills/sk:safety-guard/SKILL.md +134 -0
- package/skills/sk:save-session/SKILL.md +84 -0
- package/skills/sk:setup-claude/SKILL.md +39 -2
- package/skills/sk:setup-claude/templates/.claude/settings.json.template +110 -26
- package/skills/sk:setup-claude/templates/CLAUDE.md.template +8 -1
- package/skills/sk:setup-claude/templates/hooks/config-protection.sh +71 -0
- package/skills/sk:setup-claude/templates/hooks/console-log-warning.sh +42 -0
- package/skills/sk:setup-claude/templates/hooks/cost-tracker.sh +26 -0
- package/skills/sk:setup-claude/templates/hooks/post-edit-format.sh +53 -0
- package/skills/sk:setup-claude/templates/hooks/safety-guard.sh +72 -0
- package/skills/sk:setup-claude/templates/hooks/suggest-compact.sh +35 -0
- package/skills/sk:setup-optimizer/SKILL.md +59 -8
- package/skills/sk:start/SKILL.md +25 -0
package/README.md
CHANGED
@@ -48,6 +48,44 @@ That's it. `/sk:setup-claude` creates your project scaffolding: planning files,
 
 `/sk:start` is the recommended entry point — it classifies your task and routes you to the optimal flow automatically. You can also jump directly to `/sk:brainstorm`, `/sk:debug`, or any other flow entry point.
 
+### Updating ShipKit
+
+```bash
+# Update the package
+npm install -g @kennethsolomon/shipkit && shipkit
+
+# Then in each project, update CLAUDE.md + deploy new hooks:
+/sk:setup-optimizer
+```
+
+`shipkit` re-installs all skills and commands globally. `/sk:setup-optimizer` updates each project's CLAUDE.md with new commands and deploys any missing hooks.
+
+---
+
+## Lifecycle Hooks
+
+`/sk:setup-claude` installs lifecycle hooks that automate common tasks. Core hooks are always installed; enhanced hooks are opt-in.
+
+**Core hooks (always installed):**
+| Hook | Event | What it does |
+|------|-------|-------------|
+| `session-start` | SessionStart | Loads branch, recent commits, tech debt, code health |
+| `session-stop` | Stop | Logs session accomplishments to `tasks/progress.md` |
+| `pre-compact` | PreCompact | Saves git state before context compression |
+| `validate-commit` | PreToolUse (git commit) | Validates conventional commit format, detects secrets |
+| `validate-push` | PreToolUse (git push) | Warns before pushing to protected branches |
+| `log-agent` | SubagentStart | Logs sub-agent invocations to `tasks/agent-audit.log` |
+
+**Enhanced hooks (opt-in via `/sk:setup-claude` or `/sk:setup-optimizer`):**
+| Hook | Event | What it does |
+|------|-------|-------------|
+| `config-protection` | PreToolUse (Edit/Write) | Blocks modifications to linter/formatter configs |
+| `post-edit-format` | PostToolUse (Edit) | Auto-formats with Biome/Prettier/Pint/gofmt after edits |
+| `console-log-warning` | Stop | Warns about `console.log`, `dd()`, `var_dump()` in modified files |
+| `suggest-compact` | PreToolUse (Edit/Write) | Suggests `/compact` after 50+ tool calls |
+| `cost-tracker` | Stop | Logs session metadata to `.claude/sessions/cost-log.jsonl` |
+| `safety-guard` | PreToolUse (Bash/Edit/Write) | Enforces `/sk:safety-guard` freeze/careful mode |
+
 ---
 
 ## Pick Your Flow
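The hook tables name `config-protection` as a PreToolUse hook on Edit/Write, but its script (`config-protection.sh`) is not part of this excerpt. As a rough illustration only, here is a Python sketch of the same idea. It assumes the hook contract described in the Claude Code hooks documentation (the event arrives as JSON on stdin with `tool_name` and `tool_input.file_path`; exit code 2 blocks the call and stderr is shown to Claude). The protected filename list and the `check_edit`/`run_hook` names are hypothetical.

```python
import json
import sys
from pathlib import PurePath

# Hypothetical example list; the real config-protection.sh may cover other files.
PROTECTED_NAMES = {".eslintrc.json", "eslint.config.js", ".prettierrc", "biome.json", "pint.json"}


def check_edit(event: dict) -> tuple[int, str]:
    """Decide on a PreToolUse event: exit code 2 blocks the call, 0 allows it."""
    if event.get("tool_name") not in ("Edit", "Write"):
        return 0, ""
    name = PurePath(event.get("tool_input", {}).get("file_path", "")).name
    if name in PROTECTED_NAMES:
        return 2, f"config-protection: {name} is protected; fix the code instead of the config."
    return 0, ""


def run_hook() -> None:
    """Entry point when the script is registered as a hook command."""
    code, message = check_edit(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)  # stderr is fed back to Claude when blocked
    sys.exit(code)
```

Registered as a hook command, the script would end with a call to `run_hook()`; keeping the decision in a pure function makes it testable without piping JSON through stdin.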
@@ -166,15 +204,56 @@ Pre-existing issues are logged to `tasks/tech-debt.md` — not fixed inline.
 
 Use these anytime — they're not part of any workflow.
 
+### Intelligence
+
+| Command | Usage | What it does |
+|---------|-------|-------------|
+| `/sk:learn` | `/sk:learn` | Extract reusable patterns from the session with confidence scoring (0.3-0.9) |
+| `/sk:learn` | `/sk:learn --list` | Show all learned patterns |
+| `/sk:context-budget` | `/sk:context-budget` | Audit token consumption across skills, agents, MCP tools, CLAUDE.md |
+| `/sk:context-budget` | `/sk:context-budget --verbose` | Per-file token breakdown |
+| `/sk:health` | `/sk:health` | Scorecard across 7 categories (0-70): tools, context, gates, memory, evals, security, cost |
+| `/sk:eval` | `/sk:eval define auth` | Define eval criteria before coding |
+| `/sk:eval` | `/sk:eval check auth` | Run evals during implementation |
+| `/sk:eval` | `/sk:eval report` | Summary of all eval results with pass@k metrics |
+
+### Session Management
+
+| Command | Usage | What it does |
+|---------|-------|-------------|
+| `/sk:save-session` | `/sk:save-session` | Save branch, task, progress, open questions to `.claude/sessions/` |
+| `/sk:save-session` | `/sk:save-session --name "auth-flow"` | Save with a custom name |
+| `/sk:resume-session` | `/sk:resume-session` | List saved sessions and pick one to restore |
+| `/sk:resume-session` | `/sk:resume-session --latest` | Auto-pick most recent session |
+| `/sk:context` | `/sk:context` | Load all project context (automatic via hooks on session start) |
+
+### Safety
+
+| Command | Usage | What it does |
+|---------|-------|-------------|
+| `/sk:safety-guard` | `/sk:safety-guard careful` | Block destructive commands (rm -rf, force push, etc.) |
+| `/sk:safety-guard` | `/sk:safety-guard freeze --dir src/` | Lock edits to `src/` only |
+| `/sk:safety-guard` | `/sk:safety-guard guard --dir src/` | Both careful + freeze combined |
+| `/sk:safety-guard` | `/sk:safety-guard off` | Disable all guards |
+| `/sk:safety-guard` | `/sk:safety-guard status` | Show current mode + blocked action count |
+
+### Code Quality
+
 | Command | When to use |
 |---------|------------|
 | `/sk:scope-check` | Mid-implementation — detect scope creep (On Track / Minor / Significant / Out of Control) |
 | `/sk:retro` | After shipping — analyze velocity, blockers, patterns, generate action items |
+| `/sk:seo-audit` | Web projects — SEO audit with source + dev server scanning |
+
+### Documentation & Setup
+
+| Command | When to use |
+|---------|------------|
 | `/sk:reverse-doc` | Inherited codebase — generate architecture/design docs from existing code |
+| `/sk:setup-optimizer` | Maintenance — diagnose, update workflow, deploy hooks, enrich CLAUDE.md |
+| `/sk:mvp` | New idea — generate a complete MVP app from a single prompt |
 | `/sk:status` | Quick view of workflow and task status |
 | `/sk:dashboard` | Visual Kanban board across all git worktrees |
-| `/sk:mvp` | Generate a complete MVP app from a single idea prompt |
-| `/sk:seo-audit` | SEO audit for web projects |
 
 ---
 
@@ -193,7 +272,7 @@ Use these anytime — they're not part of any workflow.
 ## All Commands
 
 <details>
-<summary><strong>
+<summary><strong>51 commands</strong> — click to expand</summary>
 
 | Command | Purpose |
 |---------|---------|
@@ -205,33 +284,42 @@ Use these anytime — they're not part of any workflow.
 | `/sk:change` | Handle mid-workflow requirement changes |
 | `/sk:config` | View/edit project config |
 | `/sk:context` | Load project context (automatic via hooks) |
+| `/sk:context-budget` | Audit context window token consumption |
 | `/sk:dashboard` | Live Kanban board — sk:dashboard across worktrees |
 | `/sk:debug` | Structured bug investigation |
 | `/sk:e2e` | E2E Tests — behavioral verification |
+| `/sk:eval` | Define, run, and report evals for agent reliability |
 | `/sk:execute-plan` | Execute plan checkboxes in batches |
 | `/sk:fast-track` | Small changes — skip planning, keep gates |
 | `/sk:features` | Sync feature specs with codebase |
 | `/sk:finish-feature` | Changelog + PR |
 | `/sk:frontend-design` | UI mockup + optional Pencil visual design |
 | `/sk:gates` | All quality gates in parallel batches |
+| `/sk:health` | Harness self-audit scorecard |
 | `/sk:help` | Show all commands |
 | `/sk:hotfix` | Emergency fix workflow |
 | `/sk:laravel-init` | Configure existing Laravel project |
 | `/sk:laravel-new` | Scaffold fresh Laravel app |
+| `/sk:learn` | Extract reusable patterns from sessions |
 | `/sk:lint` | Auto-detect and run all linters |
 | `/sk:mvp` | Generate MVP app from a prompt |
 | `/sk:perf` | Performance audit |
 | `/sk:plan` | Create/refresh planning files |
 | `/sk:release` | Version bump + tag (`--android` / `--ios` for store audit) |
+| `/sk:resume-session` | Resume a previously saved session |
 | `/sk:retro` | Post-ship retrospective |
 | `/sk:reverse-doc` | Generate docs from existing code |
 | `/sk:review` | 7-dimension code review |
+| `/sk:safety-guard` | Protect against destructive ops |
+| `/sk:save-session` | Save session state for continuity |
 | `/sk:schema-migrate` | Database schema change analysis |
 | `/sk:scope-check` | Detect scope creep mid-implementation |
 | `/sk:security-check` | OWASP security audit |
-| `/sk:seo-audit` |
+| `/sk:seo-audit` | SEO audit for web projects |
 | `/sk:set-profile` | Switch model routing profile |
 | `/sk:setup-claude` | Bootstrap project scaffolding |
+| `/sk:setup-optimizer` | Diagnose + update workflow + deploy hooks + enrich CLAUDE.md |
+| `/sk:skill-creator` | Create or improve skills |
 | `/sk:smart-commit` | Conventional commit with approval |
 | `/sk:start` | Smart entry point — classifies task, routes to optimal flow |
 | `/sk:status` | Show workflow + task status |
package/commands/sk/context-budget.md
ADDED
@@ -0,0 +1,5 @@
+---
+description: "Audit context window token consumption and find optimization opportunities."
+---
+
+Use the `sk:context-budget` skill to inventory all components consuming context tokens (agents, skills, rules, MCP tools, CLAUDE.md), classify usage frequency, detect bloat, and recommend top 3 optimizations with estimated token savings.
package/commands/sk/eval.md
ADDED
@@ -0,0 +1,5 @@
+---
+description: "Define, run, and report on evaluations for agent reliability and code quality."
+---
+
+Use the `sk:eval` skill to define eval criteria before coding (`define`), verify during implementation (`check`), and summarize results after shipping (`report`). Supports code-based, model-based, and human graders with pass@k and pass^k metrics.
package/commands/sk/health.md
ADDED
@@ -0,0 +1,5 @@
+---
+description: "Run harness self-audit and produce a health scorecard."
+---
+
+Use the `sk:health` skill to score your ShipKit setup across 7 categories (Tool Coverage, Context Efficiency, Quality Gates, Memory Persistence, Eval Coverage, Security Guardrails, Cost Efficiency). Produces a 0-70 scorecard with concrete findings and top 3 actions.
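`sk:health` is described as scoring seven categories into a 0-70 scorecard with the top 3 actions. Its SKILL.md is not shown in this excerpt, so this sketch assumes each category is scored 0-10 (seven categories of 0-10 add up to 70) and that the top actions target the three lowest-scoring categories; `scorecard` is a hypothetical helper, not ShipKit's implementation.

```python
CATEGORIES = (
    "Tool Coverage",
    "Context Efficiency",
    "Quality Gates",
    "Memory Persistence",
    "Eval Coverage",
    "Security Guardrails",
    "Cost Efficiency",
)


def scorecard(scores: dict[str, int]) -> tuple[int, list[str]]:
    """Total per-category scores (assumed 0-10 each) and list the three weakest categories."""
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    for category in CATEGORIES:
        if not 0 <= scores[category] <= 10:
            raise ValueError(f"{category}: score {scores[category]} is outside 0-10")
    total = sum(scores[c] for c in CATEGORIES)
    # sorted() is stable, so ties keep the CATEGORIES order.
    weakest = sorted(CATEGORIES, key=lambda c: scores[c])[:3]
    return total, weakest
```

For example, a setup scoring 10 everywhere except Eval Coverage (2), Memory Persistence (4), and Cost Efficiency (5) totals 51/70, and those three categories become the suggested actions.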
package/commands/sk/help.md
CHANGED
@@ -65,35 +65,55 @@ Requirements change mid-workflow? Run `/sk:change` — it classifies the scope a
 |---------|-------------|
 | `/sk:accessibility` | WCAG 2.1 AA audit on frontend code |
 | `/sk:api-design` | Design REST/GraphQL contracts before implementation |
-| `/sk:
+| `/sk:autopilot` | Hands-free workflow — auto-skip, auto-advance, auto-commit |
+| `/sk:brainstorm` | Explore requirements and design (includes search-first research) |
 | `/sk:branch` | Create branch from current task |
-| `/sk:change` | Handle mid-workflow requirement change
+| `/sk:change` | Handle mid-workflow requirement change |
+| `/sk:config` | View and edit project config |
+| `/sk:context` | Load project context (automatic via hooks) |
+| `/sk:context-budget` | Audit context window token consumption and find savings |
+| `/sk:dashboard` | Read-only workflow Kanban board |
 | `/sk:debug` | Structured bug investigation |
+| `/sk:e2e` | E2E behavioral verification |
+| `/sk:eval` | Define, run, and report evals for agent reliability |
 | `/sk:execute-plan` | Implement plan in batches |
+| `/sk:fast-track` | Small changes — skip planning, keep gates |
 | `/sk:features` | Sync docs/sk:features/ specs with codebase |
 | `/sk:finish-feature` | Changelog + PR creation |
-| `/sk:frontend-design` | UI mockup +
+| `/sk:frontend-design` | UI mockup + optional Pencil visual mockup |
+| `/sk:gates` | All quality gates in parallel batches |
+| `/sk:health` | Harness self-audit scorecard (7 categories, 0-70) |
 | `/sk:hotfix` | Emergency fix workflow (skips design/TDD) |
 | `/sk:laravel-init` | Configure existing Laravel project |
 | `/sk:laravel-new` | Scaffold new Laravel project |
+| `/sk:learn` | Extract reusable patterns from sessions |
 | `/sk:lint` | Auto-detect and run all linters |
+| `/sk:mvp` | Generate MVP app from a prompt |
 | `/sk:perf` | Performance audit |
 | `/sk:plan` | Create/refresh task planning files |
-| `/sk:release` |
-| `/sk:
+| `/sk:release` | Version bump + tag (`--android` / `--ios` for store audit) |
+| `/sk:resume-session` | Resume a previously saved session |
+| `/sk:retro` | Post-ship retrospective |
+| `/sk:reverse-doc` | Generate docs from existing code |
+| `/sk:review` | 7-dimension self-review of branch changes |
+| `/sk:safety-guard` | Protect against destructive ops (careful/freeze/guard) |
+| `/sk:save-session` | Save session state for cross-session continuity |
 | `/sk:schema-migrate` | Multi-ORM schema change analysis |
+| `/sk:scope-check` | Detect scope creep mid-implementation |
 | `/sk:security-check` | OWASP security audit |
+| `/sk:seo-audit` | SEO audit for web projects |
+| `/sk:set-profile` | Switch model routing profile |
 | `/sk:setup-claude` | Bootstrap project scaffolding |
-| `/sk:setup-optimizer` |
+| `/sk:setup-optimizer` | Diagnose + update workflow + enrich CLAUDE.md |
 | `/sk:skill-creator` | Create or improve skills |
 | `/sk:smart-commit` | Conventional commit with approval |
+| `/sk:start` | Smart entry point — classifies task, routes to optimal flow |
 | `/sk:status` | Show workflow and task status |
+| `/sk:team` | Parallel domain agents for full-stack tasks |
 | `/sk:test` | Auto-detect and verify all tests pass |
 | `/sk:update-task` | Mark task done, log completion |
 | `/sk:write-plan` | Write plan to `tasks/todo.md` |
 | `/sk:write-tests` | TDD: write failing tests first |
-| `/sk:config` | View and edit project config |
-| `/sk:set-profile` | Switch model routing profile |
 
 ---
 
@@ -113,9 +133,13 @@ ShipKit routes each skill to the right model automatically. Set once per project
 | brainstorm, write-plan, debug, execute-plan, review | opus | opus | sonnet | sonnet |
 | write-tests, frontend-design, api-design, security-check | opus | sonnet | sonnet | sonnet |
 | change | opus | sonnet | sonnet | sonnet |
+| autopilot, team | opus | opus | sonnet | sonnet |
 | perf, schema-migrate, accessibility | opus | sonnet | sonnet | haiku |
+| eval | sonnet | sonnet | sonnet | haiku |
 | lint, test | sonnet | sonnet | haiku | haiku |
 | smart-commit, branch, update-task | haiku | haiku | haiku | haiku |
+| start, learn, context-budget, health | haiku | haiku | haiku | haiku |
+| save-session, resume-session, safety-guard | haiku | haiku | haiku | haiku |
 
 `opus` = inherit (uses your current session model).
 Config lives in `.shipkit/config.json` — per project, gitignored by default.
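The routing table maps each skill group to a model per profile, and `opus` means "inherit the session model". The column order (`full-sail`, `quality`, `balanced`, `budget`) is inferred from the `sk:eval` and `sk:context-budget` routing tables elsewhere in this diff. A lookup sketch follows; it assumes a hypothetical `"profile"` key in `.shipkit/config.json` and a `balanced` fallback, since the diff states neither the key name nor the default.

```python
import json
from pathlib import Path

PROFILES = ("full-sail", "quality", "balanced", "budget")

# A subset of the table rows above: skill -> model per profile, in PROFILES order.
ROUTING = {
    "brainstorm": ("opus", "opus", "sonnet", "sonnet"),
    "change": ("opus", "sonnet", "sonnet", "sonnet"),
    "autopilot": ("opus", "opus", "sonnet", "sonnet"),
    "eval": ("sonnet", "sonnet", "sonnet", "haiku"),
    "lint": ("sonnet", "sonnet", "haiku", "haiku"),
    "health": ("haiku", "haiku", "haiku", "haiku"),
}


def read_profile(project_root: Path, default: str = "balanced") -> str:
    """Read the active profile; the "profile" key and the default are assumptions."""
    config_path = project_root / ".shipkit" / "config.json"
    if not config_path.is_file():
        return default
    profile = json.loads(config_path.read_text()).get("profile", default)
    return profile if profile in PROFILES else default


def resolve_model(skill: str, profile: str) -> str:
    model = ROUTING[skill][PROFILES.index(profile)]
    # "opus" in the table means inherit: keep whatever model the session is running.
    return "inherit" if model == "opus" else model
```

Under this sketch, `resolve_model("eval", read_profile(Path(".")))` returns `sonnet` unless the project is on the `budget` profile.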
package/commands/sk/learn.md
ADDED
@@ -0,0 +1,5 @@
+---
+description: "Extract reusable patterns from the current session into learned instincts."
+---
+
+Use the `sk:learn` skill to analyze the current session for extractable patterns (error resolutions, debugging techniques, workarounds, project conventions). Patterns are saved with confidence scoring and can be promoted from project-scoped to global.
package/commands/sk/resume-session.md
ADDED
@@ -0,0 +1,5 @@
+---
+description: "Resume a previously saved session with full context restoration."
+---
+
+Use the `sk:resume-session` skill to list available saved sessions from `.claude/sessions/`, select one, and restore its context (branch, task state, progress, open questions, next steps) into the current conversation.
package/commands/sk/safety-guard.md
ADDED
@@ -0,0 +1,5 @@
+---
+description: "Protect against destructive operations with careful, freeze, and guard modes."
+---
+
+Use the `sk:safety-guard` skill to activate protection modes: `careful` (block destructive commands), `freeze --dir <path>` (lock edits to a directory), `guard --dir <path>` (both), `off` (disable), or `status` (show current mode).
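The three protective modes combine two checks: `careful` screens Bash commands and `freeze` restricts Edit/Write targets to one directory, while `guard` applies both. The skill body and `safety-guard.sh` are not shown in this excerpt, so this sketch only illustrates that decision logic; the destructive-command patterns and the `is_blocked` name are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Hypothetical patterns; the real safety-guard.sh may match a different set.
DESTRUCTIVE = [
    re.compile(r"\brm\s+-[a-z]*r[a-z]*f|\brm\s+-[a-z]*f[a-z]*r", re.IGNORECASE),  # rm -rf / -fr
    re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"),  # force push
    re.compile(r"\bgit\s+reset\s+--hard\b"),
]


def is_blocked(mode: str, tool: str, tool_input: dict, frozen_dir: str = "") -> bool:
    """Return True if the tool call should be blocked under the given guard mode."""
    careful = mode in ("careful", "guard")
    freeze = mode in ("freeze", "guard")
    if careful and tool == "Bash":
        command = tool_input.get("command", "")
        if any(pattern.search(command) for pattern in DESTRUCTIVE):
            return True
    if freeze and tool in ("Edit", "Write"):
        target = PurePosixPath(tool_input.get("file_path", ""))
        # Lexical check only; a real hook should resolve paths (and "..") first.
        if not target.is_relative_to(PurePosixPath(frozen_dir)):
            return True
    return False
```

With `mode="off"` both checks are skipped, which matches the `off` command disabling all guards.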
package/commands/sk/save-session.md
ADDED
@@ -0,0 +1,5 @@
+---
+description: "Save current session state for cross-session continuity."
+---
+
+Use the `sk:save-session` skill to persist the current session state (branch, task, progress, findings, open questions) to `.claude/sessions/` for resumption in a future conversation. Essential for EPIC-scope multi-session workflows.
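`sk:save-session` writes state to `.claude/sessions/` (optionally under `--name`), and `sk:resume-session --latest` auto-picks the most recent save. The file layout is not shown in this diff, so this sketch assumes one Markdown file per session, named by `--name` or a UTC timestamp, and treats "most recent" as the latest modification time. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def sessions_dir(root: Path) -> Path:
    return root / ".claude" / "sessions"


def save_session(root: Path, body: str, name: Optional[str] = None) -> Path:
    """Write session notes to .claude/sessions/<name>.md (file layout is an assumption)."""
    directory = sessions_dir(root)
    directory.mkdir(parents=True, exist_ok=True)
    stem = name or datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = directory / f"{stem}.md"
    path.write_text(body, encoding="utf-8")
    return path


def latest_session(root: Path) -> Optional[Path]:
    """Mimic --latest: the most recently modified *.md file, or None if nothing is saved."""
    directory = sessions_dir(root)
    if not directory.is_dir():
        return None
    # Only *.md files count, so other files such as cost-log.jsonl are ignored.
    saved = sorted(directory.glob("*.md"), key=lambda p: p.stat().st_mtime)
    return saved[-1] if saved else None
```

Filtering on `*.md` matters here because the `cost-tracker` hook also writes `cost-log.jsonl` into the same directory.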
package/commands/sk/set-profile.md
CHANGED
@@ -30,6 +30,10 @@ Valid profiles: `full-sail` · `quality` · `balanced` · `budget`
 | smart-commit, branch, update-task | haiku | haiku | haiku | haiku |
 | autopilot, team | opus | opus | sonnet | sonnet |
 | start | haiku | haiku | haiku | haiku |
+| learn, context-budget, health | haiku | haiku | haiku | haiku |
+| save-session, resume-session | haiku | haiku | haiku | haiku |
+| safety-guard | haiku | haiku | haiku | haiku |
+| eval | sonnet | sonnet | sonnet | haiku |
 
 Note: `opus` = inherit (uses the current session model). Switch to Opus 4.5 in your session to get the full benefit.
 
@@ -70,6 +74,10 @@ Model assignments for this project:
 smart-commit, branch, update-task → haiku
 autopilot, team → <model>
 start → haiku
+learn, context-budget, health → haiku
+save-session, resume-session → haiku
+safety-guard → haiku
+eval → <model>
 
 Run /sk:config to see all settings or make further changes.
 ```
package/package.json
CHANGED
package/skills/sk:brainstorming/SKILL.md
CHANGED
@@ -74,6 +74,19 @@ digraph brainstorming {
 - Only one question per message - if a topic needs more exploration, break it into multiple questions
 - Focus on understanding: purpose, constraints, success criteria
 
+**Search-First Research (before proposing approaches):**
+Before proposing custom solutions, check if the problem is already solved:
+1. **Grep codebase** — does similar functionality already exist in this repo?
+2. **Check package registries** — is there a well-maintained package for this? (npm, PyPI, Packagist, crates.io)
+3. **Check existing skills** — does a ShipKit skill or MCP server already handle this?
+
+Decision matrix:
+- **Adopt** — existing solution covers 90%+ of requirements → use it directly
+- **Extend** — existing solution covers 60-90% → extend or wrap it
+- **Build custom** — nothing suitable exists → build from scratch (informed by what was found)
+
+If a suitable package or existing solution is found, include it as one of the approaches.
+
 **Exploring approaches:**
 - Propose 2-3 different approaches with trade-offs
 - Present options conversationally with your recommendation and reasoning
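The search-first decision matrix turns an estimate of requirement coverage into a recommendation. A small sketch, assuming coverage is expressed as a fraction (0.0-1.0) and that "nothing suitable" means anything below the 60% Extend threshold; `search_first_decision` is a hypothetical name.

```python
def search_first_decision(coverage: float) -> str:
    """Map an existing solution's requirement coverage (0.0-1.0) to the decision matrix."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0.0 and 1.0")
    if coverage >= 0.9:
        return "adopt"   # 90%+: use it directly
    if coverage >= 0.6:
        return "extend"  # 60-90%: extend or wrap it
    return "build"       # below 60%: build custom, informed by what was found
```

When several candidates turn up, applying this to the best-covering one gives the approach to list first.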
package/skills/sk:context-budget/SKILL.md
ADDED
@@ -0,0 +1,126 @@
+---
+name: sk:context-budget
+description: "Audit context window token consumption and find optimization opportunities."
+---
+
+# /sk:context-budget — Token Consumption Audit
+
+Audits all components that consume context window tokens — agents, skills, rules, MCP tools, CLAUDE.md — and identifies optimization opportunities.
+
+## Usage
+
+```
+/sk:context-budget # standard audit
+/sk:context-budget --verbose # per-file breakdown
+```
+
+## Model Routing
+
+Read `.shipkit/config.json` from the project root if it exists.
+
+| Profile | Model |
+|---------|-------|
+| `full-sail` | haiku |
+| `quality` | haiku |
+| `balanced` | haiku |
+| `budget` | haiku |
+
+> Counting and classification is lightweight — haiku is sufficient.
+
+## How It Works
+
+### Phase 1: Inventory
+
+Scan and count token estimates for every loaded component:
+
+| Component | Location | Token Estimation |
+|-----------|----------|------------------|
+| CLAUDE.md | `CLAUDE.md` | `words * 1.3` |
+| Global CLAUDE.md | `~/.claude/CLAUDE.md` | `words * 1.3` |
+| Skills | `skills/*/SKILL.md` | `words * 1.3` |
+| Commands | `commands/**/*.md` | `words * 1.3` |
+| Agents | `.claude/agents/*.md` | `words * 1.3` |
+| Rules | `.claude/rules/*.md` | `words * 1.3` |
+| MCP tool schemas | count tools * ~500 tokens each | `tool_count * 500` |
+| Hooks | `.claude/hooks/*.sh` (minimal overhead) | `words * 1.3` |
+
+**Token estimation formula:**
+- Prose/markdown: `word_count * 1.3`
+- Code blocks: `char_count / 4`
+- MCP tool schemas: ~500 tokens per tool definition
+
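Phase 1's formulas are heuristics rather than tokenizer counts. This sketch applies them to a single Markdown file, assuming lines between triple-backtick fences are the "code blocks" and everything else is prose; `estimate_tokens` and `estimate_mcp_tokens` are hypothetical names.

```python
def estimate_tokens(markdown: str) -> int:
    """Apply the skill's heuristics: prose words * 1.3 plus code characters / 4."""
    prose_words = 0
    code_chars = 0
    in_code = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_code = not in_code  # fence lines toggle code mode and are not counted
            continue
        if in_code:
            code_chars += len(line) + 1  # +1 for the newline
        else:
            prose_words += len(line.split())
    return round(prose_words * 1.3 + code_chars / 4)


def estimate_mcp_tokens(tool_count: int, per_tool: int = 500) -> int:
    """MCP tool schemas: a flat ~500 tokens per tool definition."""
    return tool_count * per_tool
```

For instance, 30 MCP tools come to 15,000 tokens, the figure used for the "MCP tools (3 servers)" line in the sample report.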
+### Phase 2: Classify Usage Frequency
+
+For each component, classify how often it's actually needed:
+
+| Classification | Meaning | Action |
+|---------------|---------|--------|
+| **Always** | Loaded every session, always relevant | Keep as-is |
+| **Sometimes** | Relevant to specific task types | Consider conditional loading |
+| **Rarely** | Edge case, rarely triggered | Candidate for removal/extraction |
+
+Classification heuristics:
+- Skills used in the workflow (brainstorm, write-tests, gates, etc.) → Always
+- Skills triggered by keywords (frontend-design, api-design) → Sometimes
+- Niche skills (seo-audit, schema-migrate) → Rarely
+- MCP tools: if >20 tools on one server → flag as over-subscribed
+
+### Phase 3: Detect Issues
+
+Flag these common problems:
+
+1. **Bloated agents** — agent descriptions >200 lines
+2. **Bloated skills** — skill definitions >400 lines
+3. **Bloated rules** — rule files >100 lines
+4. **MCP over-subscription** — servers with >20 tools (each costs ~500 tokens)
+5. **CLI-wrapping MCPs** — MCP servers that just wrap CLI tools (overhead > benefit)
+6. **Duplicate content** — same instructions in CLAUDE.md AND skill files
+7. **CLAUDE.md bloat** — CLAUDE.md >200 lines (the target)
+8. **Unused components** — skills/agents never referenced in workflow
+
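The size-based checks in Phase 3 (agents >200 lines, skills >400, rules >100, CLAUDE.md >200, MCP servers >20 tools) are mechanical enough to sketch. This assumes components arrive as `(kind, name, line_count)` tuples and leaves out the judgment-based checks (CLI-wrapping MCPs, duplicates, unused components); `detect_bloat` and `Finding` are hypothetical names.

```python
from dataclasses import dataclass

LINE_LIMITS = {"agent": 200, "skill": 400, "rule": 100, "claude_md": 200}
MCP_TOOL_LIMIT = 20
TOKENS_PER_MCP_TOOL = 500


@dataclass
class Finding:
    kind: str
    name: str
    detail: str


def detect_bloat(files: list[tuple[str, str, int]], mcp_servers: dict[str, int]) -> list[Finding]:
    """Flag files over their line limit and MCP servers with too many tools."""
    findings = []
    for kind, name, lines in files:
        limit = LINE_LIMITS.get(kind)
        if limit is not None and lines > limit:
            findings.append(Finding(kind, name, f"{lines} lines (limit {limit})"))
    for server, tools in mcp_servers.items():
        if tools > MCP_TOOL_LIMIT:
            cost = tools * TOKENS_PER_MCP_TOOL
            findings.append(Finding("mcp", server, f"{tools} tools (~{cost:,} tokens)"))
    return findings
```

With these limits, a 380-line skill such as the `sk:frontend-design` entry in the sample report stays under the 400-line skill threshold, while a 28-tool server is flagged at ~14,000 tokens.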
+### Phase 4: Report
+
+Output a structured report:
+
+```
+=== Context Budget Audit ===
+
+Component Breakdown:
+CLAUDE.md ~1,200 tokens
+Global CLAUDE.md ~800 tokens
+Skills (42 files) ~18,000 tokens
+Commands (35 files) ~8,000 tokens
+Agents (8 files) ~3,200 tokens
+Rules (5 files) ~1,500 tokens
+MCP tools (3 servers) ~15,000 tokens (30 tools)
+─────────────────────────────────
+Total overhead: ~47,700 tokens
+
+Context window: 200,000 tokens
+Overhead: 47,700 tokens (23.8%)
+Available for work: 152,300 tokens
+
+Issues Found:
+[HIGH] MCP server "playwright" has 28 tools (~14,000 tokens)
+[MEDIUM] Skill sk:frontend-design is 380 lines (~500 tokens)
+[LOW] Agent perf-auditor has 220 lines (~290 tokens)
+
+Top 3 Optimizations:
+1. Remove unused MCP tools from playwright (save ~7,000 tokens)
+2. Consolidate duplicate workflow instructions (save ~1,200 tokens)
+3. Trim agent descriptions to <150 lines (save ~400 tokens)
+
+Potential savings: ~8,600 tokens (18% reduction)
+```
+
+### --verbose Mode
+
+Adds per-file token breakdown:
+
+```
+Skills Breakdown:
+sk:autopilot/SKILL.md ~620 tokens
+sk:brainstorming/SKILL.md ~480 tokens
+sk:gates/SKILL.md ~440 tokens
+...
+```
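The totals in the sample Phase 4 report of `sk:context-budget` follow from simple arithmetic over the component estimates and a 200,000-token window. This sketch reproduces them from the report's own numbers; `budget_summary` is an illustrative name.

```python
def budget_summary(components: dict[str, int], window: int = 200_000) -> dict[str, float]:
    """Total the component estimates and express them against the context window."""
    total = sum(components.values())
    return {
        "total": total,
        "overhead_pct": 100 * total / window,
        "available": window - total,
    }


# Component estimates copied from the sample report.
SAMPLE = {
    "CLAUDE.md": 1_200,
    "Global CLAUDE.md": 800,
    "Skills": 18_000,
    "Commands": 8_000,
    "Agents": 3_200,
    "Rules": 1_500,
    "MCP tools": 15_000,
}

summary = budget_summary(SAMPLE)
savings_pct = 100 * 8_600 / summary["total"]  # "Potential savings: ~8,600 tokens"
```

This yields 47,700 tokens of overhead (23.85%, which the sample prints as 23.8%), 152,300 tokens available, and savings of about 18% of the overhead.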
package/skills/sk:eval/SKILL.md
ADDED
@@ -0,0 +1,188 @@
+---
+name: sk:eval
+description: "Define, run, and report on evaluations for agent reliability and code quality."
+---
+
+# /sk:eval — Eval-Driven Development
+
+A formal evaluation framework for measuring agent reliability and code quality. Define evals before coding, check during implementation, and report after shipping.
+
+## Usage
+
+```
+/sk:eval define <feature> # create eval definition
+/sk:eval check <feature> # run evals against current state
+/sk:eval report # summary of all eval results
+/sk:eval list # show all defined evals
+```
+
+## Model Routing
+
+Read `.shipkit/config.json` from the project root if it exists.
+
+| Profile | Model |
+|---------|-------|
+| `full-sail` | sonnet |
+| `quality` | sonnet |
+| `balanced` | sonnet |
+| `budget` | haiku |
+
+> Eval analysis needs reasoning for model-based graders — sonnet for balanced+.
+
+## Eval Types
+
+### Capability Evals
+
+Test whether Claude can accomplish something new:
+
+- "Can it generate a valid migration from a schema description?"
+- "Can it write a test that covers all edge cases?"
+- "Can it refactor without changing behavior?"
+
+### Regression Evals
+
+Ensure changes don't break existing behavior:
+
+- "Does the login flow still work after auth refactor?"
+- "Do all API endpoints still return correct status codes?"
+- "Are all existing tests still passing?"
+
+## Grader Types
+
+### Code-Based (Deterministic)
+
+Graded by running commands — pass/fail:
+
+```yaml
+grader: code
+checks:
+- command: "npm test"
+  expect: exit_code_0
+- command: "grep -r 'TODO' src/"
+  expect: no_output
+- command: "npx tsc --noEmit"
+  expect: exit_code_0
+```
+
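A code-based grader runs each command and compares the outcome with its `expect` value. This minimal sketch handles only the two expectations that appear in the YAML example (`exit_code_0` and `no_output`) and takes the checks as already-parsed dictionaries, since reading YAML would need a third-party parser; `run_check` and `run_code_grader` are hypothetical names.

```python
import subprocess


def run_check(command: str, expect: str, timeout: int = 300) -> bool:
    """Run one shell command and grade it against a supported expectation."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=timeout)
    if expect == "exit_code_0":
        return result.returncode == 0
    if expect == "no_output":
        # Only stdout is inspected; grep exits 1 when nothing matches, which is fine here.
        return result.stdout.strip() == ""
    raise ValueError(f"unsupported expectation: {expect}")


def run_code_grader(checks: list[dict]) -> bool:
    """One attempt passes only if every check passes."""
    return all(run_check(c["command"], c["expect"]) for c in checks)
```

Ignoring the exit code for `no_output` is a deliberate choice: `grep -r 'TODO' src/` returns a non-zero status precisely in the passing case.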
+### Model-Based (LLM-as-Judge)
+
+Graded by an LLM against a rubric — scored 1-5:
+
+```yaml
+grader: model
+rubric: |
+  Score the implementation on:
+  1. Correctness — does it solve the stated problem?
+  2. Completeness — are all edge cases handled?
+  3. Code quality — is it readable and maintainable?
+  4. Security — are there any vulnerabilities?
+  5. Performance — any obvious inefficiencies?
+threshold: 4.0
+```
+
+### Human (Manual Review)
+
+Flagged for human review — generates a checklist:
+
+```yaml
+grader: human
+checklist:
+- "UI renders correctly on mobile"
+- "Error messages are user-friendly"
+- "Animation feels smooth (60fps)"
+```
+
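The model-based grader scores five rubric criteria on a 1-5 scale against `threshold: 4.0`. The skill does not say how the criterion scores combine, so this sketch assumes their mean is compared with the threshold and leaves the LLM judge call out; `grade_rubric` is a hypothetical name.

```python
from statistics import mean


def grade_rubric(scores: dict[str, int], threshold: float = 4.0) -> tuple[float, bool]:
    """Average 1-5 criterion scores and compare the mean with the threshold."""
    if not scores:
        raise ValueError("no rubric scores given")
    for criterion, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{criterion}: score {score} is outside 1-5")
    average = mean(list(scores.values()))
    return average, average >= threshold
```

A stricter reading, requiring every criterion to reach the threshold, would replace the mean with `min`.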
+## Metrics
+
+### pass@k
+
+At least 1 success in k attempts. Used for capability evals where some variance is expected.
+
+```
+pass@3: Run the eval 3 times. Pass if at least 1 succeeds.
+```
+
+### pass^k
+
+ALL k attempts must succeed. Used for regression evals where consistency is required.
+
+```
+pass^3: Run the eval 3 times. Pass only if all 3 succeed.
+```
+
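As defined in the Metrics section, pass@k and pass^k reduce to "any" and "all" over the outcomes of k attempts. This sketch implements those plain definitions; it is not the combinatorial pass@k estimator some benchmarks use when more than k samples are drawn. The function names are hypothetical.

```python
from typing import Callable


def run_attempts(attempt: Callable[[], bool], k: int) -> list[bool]:
    """Run one eval attempt k times and collect pass/fail outcomes."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [attempt() for _ in range(k)]


def pass_at_k(outcomes: list[bool]) -> bool:
    return any(outcomes)  # pass@k: at least one success (capability evals)


def pass_pow_k(outcomes: list[bool]) -> bool:
    return all(outcomes)  # pass^k: every attempt succeeds (regression evals)
```

The same three outcomes can pass one metric and fail the other: `[False, True, False]` satisfies pass@3 but not pass^3.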
+## Storage
+
+### Eval Definition
+
+Stored in `.claude/evals/[feature].md`:
+
+```markdown
+---
+feature: user-authentication
+type: capability
+grader: code
+created: 2026-03-25
+pass_metric: pass@1
+---
+
+## Description
+Verify the OAuth2 login flow works end-to-end.
+
+## Checks
+- [ ] `npm test -- --grep "auth"` passes
+- [ ] `curl -s localhost:3000/auth/google` returns 302
+- [ ] `grep -r "hardcoded.*secret" src/` returns nothing
+
+## History
+| Date | Result | Score | Notes |
+|------|--------|-------|-------|
+```
+
+### Eval Results
+
+Appended to `.claude/evals/[feature].log`:
+
+```
+[2026-03-25T10:30:00Z] PASS — pass@1 (1/1 succeeded)
+check_1: npm test (exit 0) ✓
+check_2: curl auth redirect (302) ✓
+check_3: no hardcoded secrets ✓
+```
+
+## Workflow Integration
+
+### Before Coding (define)
+
+```
+/sk:eval define user-authentication
+```
+
+Creates the eval definition with checks derived from the task requirements.
+
+### During Implementation (check)
+
+```
+/sk:eval check user-authentication
+```
+
+Runs all checks and reports pass/fail. Use during step 5 (Write Tests + Implement) to verify progress.
+
+### After Shipping (report)
+
+```
+/sk:eval report
+```
+
+Summary of all evals:
+
+```
+=== Eval Report ===
+
+user-authentication PASS pass@1 (3 checks, 3 passed)
+api-v2-endpoints PASS pass^3 (5 checks, 5 passed x3)
+queue-reliability FAIL pass@3 (2 checks, 0/3 succeeded)
+
+Overall: 2/3 passing (67%)
+
+Action: queue-reliability needs investigation
+```
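The sample `/sk:eval report` output rolls per-feature results into an overall pass rate and an action line. This sketch reproduces that roll-up, assuming each feature's latest result is already known as a `(metric, passed)` pair; the data layout and `eval_report` name are hypothetical.

```python
def eval_report(results: dict[str, tuple[str, bool]]) -> str:
    """results maps feature -> (metric, passed); returns text shaped like the sample report."""
    lines = ["=== Eval Report ===", ""]
    for feature, (metric, passed) in results.items():
        lines.append(f"{feature:<22}{'PASS' if passed else 'FAIL'}  {metric}")
    passing = sum(passed for _, passed in results.values())
    total = len(results)
    pct = round(100 * passing / total) if total else 0
    lines += ["", f"Overall: {passing}/{total} passing ({pct}%)"]
    failing = [feature for feature, (_, passed) in results.items() if not passed]
    if failing:
        lines += ["", f"Action: {', '.join(failing)} needs investigation"]
    return "\n".join(lines)
```

With the three features from the sample, two of three pass, and 66.7% rounds to the 67% shown in the report.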