claude-code-kit 0.7.0__py3-none-any.whl
- claude_code_kit-0.7.0.dist-info/METADATA +384 -0
- claude_code_kit-0.7.0.dist-info/RECORD +209 -0
- claude_code_kit-0.7.0.dist-info/WHEEL +4 -0
- claude_code_kit-0.7.0.dist-info/entry_points.txt +4 -0
- claude_code_kit-0.7.0.dist-info/licenses/LICENSE +21 -0
- claude_kit/__init__.py +10 -0
- claude_kit/__main__.py +8 -0
- claude_kit/_payload/agents/acceptance-reviewer.md +60 -0
- claude_kit/_payload/agents/auditor.md +76 -0
- claude_kit/_payload/agents/dependency-scanner.md +84 -0
- claude_kit/_payload/agents/developer.md +187 -0
- claude_kit/_payload/agents/devils-advocate.md +62 -0
- claude_kit/_payload/agents/devops-engineer.md +134 -0
- claude_kit/_payload/agents/e2e-tester.md +152 -0
- claude_kit/_payload/agents/em-reviewer.md +105 -0
- claude_kit/_payload/agents/incident-responder.md +64 -0
- claude_kit/_payload/agents/merge-reviewer.md +194 -0
- claude_kit/_payload/agents/observability-engineer.md +94 -0
- claude_kit/_payload/agents/orchestrator.md +551 -0
- claude_kit/_payload/agents/owasp-reviewer.md +76 -0
- claude_kit/_payload/agents/policy-validator.md +63 -0
- claude_kit/_payload/agents/pr-raiser.md +138 -0
- claude_kit/_payload/agents/risk-classifier.md +50 -0
- claude_kit/_payload/agents/sdlc-code-reviewer.md +196 -0
- claude_kit/_payload/agents/secret-scanner.md +70 -0
- claude_kit/_payload/agents/security-reviewer.md +80 -0
- claude_kit/_payload/agents/senior-backend-dev.md +199 -0
- claude_kit/_payload/agents/senior-frontend-dev.md +181 -0
- claude_kit/_payload/agents/senior-tester.md +206 -0
- claude_kit/_payload/agents/spec-doc-writer.md +331 -0
- claude_kit/_payload/agents/story-planner.md +56 -0
- claude_kit/_payload/agents/technical-architect.md +139 -0
- claude_kit/_payload/agents/tester.md +193 -0
- claude_kit/_payload/agents/ui-designer.md +73 -0
- claude_kit/_payload/agents/unit-tester.md +119 -0
- claude_kit/_payload/catalog/mcp.yaml +54 -0
- claude_kit/_payload/catalog/org.yaml +145 -0
- claude_kit/_payload/catalog/profiles.yaml +96 -0
- claude_kit/_payload/catalog/stacks.yaml +96 -0
- claude_kit/_payload/commands/init.md +36 -0
- claude_kit/_payload/commands/sdlc.md +18 -0
- claude_kit/_payload/commands/status.md +20 -0
- claude_kit/_payload/hooks/hooks.json +58 -0
- claude_kit/_payload/hooks/scripts/audit-log.sh +18 -0
- claude_kit/_payload/hooks/scripts/guard-secrets.sh +26 -0
- claude_kit/_payload/hooks/scripts/lint-fix.sh +38 -0
- claude_kit/_payload/hooks/scripts/load-continuity.sh +32 -0
- claude_kit/_payload/hooks/scripts/load-learnings.sh +40 -0
- claude_kit/_payload/hooks/scripts/type-check.sh +23 -0
- claude_kit/_payload/hooks/scripts/validate-frontmatter.sh +34 -0
- claude_kit/_payload/hooks/scripts/validate-settings.sh +21 -0
- claude_kit/_payload/hooks/scripts/warn-large-edits.sh +24 -0
- claude_kit/_payload/hooks/scripts/warn-missing-tests.sh +24 -0
- claude_kit/_payload/hooks/scripts/warn-sensitive-files.sh +30 -0
- claude_kit/_payload/hooks/scripts/warn-shared-modules.sh +33 -0
- claude_kit/_payload/rules/agent-guardrails.md +83 -0
- claude_kit/_payload/rules/agent-memory.md +106 -0
- claude_kit/_payload/rules/agent-resilience.md +61 -0
- claude_kit/_payload/rules/autonomy-levels.md +30 -0
- claude_kit/_payload/rules/code-organization.md +312 -0
- claude_kit/_payload/rules/continuity.md +84 -0
- claude_kit/_payload/rules/design-patterns.md +422 -0
- claude_kit/_payload/rules/devops-observability.md +57 -0
- claude_kit/_payload/rules/documentation.md +326 -0
- claude_kit/_payload/rules/evals.md +62 -0
- claude_kit/_payload/rules/frontend-best-practices.md +157 -0
- claude_kit/_payload/rules/goal-setting-and-monitoring.md +72 -0
- claude_kit/_payload/rules/human-in-the-loop.md +64 -0
- claude_kit/_payload/rules/linting-and-formatting.md +220 -0
- claude_kit/_payload/rules/mandatory-workflow.md +309 -0
- claude_kit/_payload/rules/model-tiers.md +34 -0
- claude_kit/_payload/rules/quality-gates.md +107 -0
- claude_kit/_payload/rules/rarv-cycle.md +31 -0
- claude_kit/_payload/rules/reasoning-techniques.md +62 -0
- claude_kit/_payload/rules/responsive-and-accessibility.md +353 -0
- claude_kit/_payload/rules/risk-classification.md +36 -0
- claude_kit/_payload/rules/testing.md +417 -0
- claude_kit/_payload/rules/tool-design.md +66 -0
- claude_kit/_payload/skills/_references/accessibility-checklist.md +160 -0
- claude_kit/_payload/skills/_references/orchestration-patterns.md +405 -0
- claude_kit/_payload/skills/_references/performance-checklist.md +153 -0
- claude_kit/_payload/skills/_references/security-checklist.md +134 -0
- claude_kit/_payload/skills/_references/testing-patterns.md +236 -0
- claude_kit/_payload/skills/accessibility-review/SKILL.md +56 -0
- claude_kit/_payload/skills/api-and-interface-design/SKILL.md +294 -0
- claude_kit/_payload/skills/api-integration/SKILL.md +348 -0
- claude_kit/_payload/skills/archive-sprint/SKILL.md +31 -0
- claude_kit/_payload/skills/backlog/SKILL.md +41 -0
- claude_kit/_payload/skills/backlog/item-template.md +20 -0
- claude_kit/_payload/skills/browser-testing-with-devtools/SKILL.md +302 -0
- claude_kit/_payload/skills/ci-cd-and-automation/SKILL.md +402 -0
- claude_kit/_payload/skills/code-review-and-quality/SKILL.md +347 -0
- claude_kit/_payload/skills/code-simplification/SKILL.md +331 -0
- claude_kit/_payload/skills/component-design/SKILL.md +171 -0
- claude_kit/_payload/skills/consolidate-learnings/SKILL.md +55 -0
- claude_kit/_payload/skills/context-engineering/SKILL.md +321 -0
- claude_kit/_payload/skills/debugging-and-error-recovery/SKILL.md +300 -0
- claude_kit/_payload/skills/decision/SKILL.md +46 -0
- claude_kit/_payload/skills/decision/adr-template.md +36 -0
- claude_kit/_payload/skills/deprecation-and-migration/SKILL.md +207 -0
- claude_kit/_payload/skills/documentation-and-adrs/SKILL.md +299 -0
- claude_kit/_payload/skills/doubt-driven-development/SKILL.md +243 -0
- claude_kit/_payload/skills/execute/SKILL.md +27 -0
- claude_kit/_payload/skills/frontend-ui-engineering/SKILL.md +328 -0
- claude_kit/_payload/skills/git-workflow-and-versioning/SKILL.md +300 -0
- claude_kit/_payload/skills/idea-refine/SKILL.md +178 -0
- claude_kit/_payload/skills/idea-refine/examples.md +238 -0
- claude_kit/_payload/skills/idea-refine/frameworks.md +99 -0
- claude_kit/_payload/skills/idea-refine/refinement-criteria.md +113 -0
- claude_kit/_payload/skills/idea-refine/scripts/idea-refine.sh +15 -0
- claude_kit/_payload/skills/incident-postmortem/SKILL.md +74 -0
- claude_kit/_payload/skills/incremental-implementation/SKILL.md +245 -0
- claude_kit/_payload/skills/interview-me/SKILL.md +221 -0
- claude_kit/_payload/skills/load-testing/SKILL.md +83 -0
- claude_kit/_payload/skills/manual-test/SKILL.md +516 -0
- claude_kit/_payload/skills/performance-optimization/SKILL.md +277 -0
- claude_kit/_payload/skills/planning-and-task-breakdown/SKILL.md +223 -0
- claude_kit/_payload/skills/playwright-verification/SKILL.md +205 -0
- claude_kit/_payload/skills/refresh-docs/SKILL.md +63 -0
- claude_kit/_payload/skills/remember/SKILL.md +96 -0
- claude_kit/_payload/skills/scope/SKILL.md +52 -0
- claude_kit/_payload/skills/scope/scope-template.md +82 -0
- claude_kit/_payload/skills/sdlc/SKILL.md +83 -0
- claude_kit/_payload/skills/security-and-hardening/SKILL.md +368 -0
- claude_kit/_payload/skills/security-verification/SKILL.md +209 -0
- claude_kit/_payload/skills/shipping-and-launch/SKILL.md +309 -0
- claude_kit/_payload/skills/smoke-test/SKILL.md +78 -0
- claude_kit/_payload/skills/source-driven-development/SKILL.md +195 -0
- claude_kit/_payload/skills/spec-driven-development/SKILL.md +200 -0
- claude_kit/_payload/skills/sprint/SKILL.md +67 -0
- claude_kit/_payload/skills/sprint/sprint-template.md +90 -0
- claude_kit/_payload/skills/test-driven-development/SKILL.md +383 -0
- claude_kit/_payload/skills/threat-model/SKILL.md +60 -0
- claude_kit/_payload/skills/triage/SKILL.md +87 -0
- claude_kit/_payload/skills/ui-ux-design/SKILL.md +71 -0
- claude_kit/_payload/skills/unit-test/SKILL.md +237 -0
- claude_kit/_payload/skills/using-agent-skills/SKILL.md +180 -0
- claude_kit/_payload/templates/CLAUDE.md +238 -0
- claude_kit/_payload/templates/CLAUDE.stack.md.tmpl +53 -0
- claude_kit/_payload/templates/CONTINUITY.template.md +35 -0
- claude_kit/_payload/templates/README.claude-sdlc.md.tmpl +219 -0
- claude_kit/_payload/templates/agent-memory/MEMORY.md +30 -0
- claude_kit/_payload/templates/agent-memory/api/.gitkeep +0 -0
- claude_kit/_payload/templates/agent-memory/architecture/.gitkeep +0 -0
- claude_kit/_payload/templates/agent-memory/debugging/.gitkeep +0 -0
- claude_kit/_payload/templates/agent-memory/gotchas/.gitkeep +0 -0
- claude_kit/_payload/templates/agent-memory/patterns/.gitkeep +0 -0
- claude_kit/_payload/templates/agent-memory/performance/.gitkeep +0 -0
- claude_kit/_payload/templates/artifacts/adr.md +18 -0
- claude_kit/_payload/templates/artifacts/feature-spec.md +29 -0
- claude_kit/_payload/templates/artifacts/release-plan.md +23 -0
- claude_kit/_payload/templates/artifacts/runbook.md +24 -0
- claude_kit/_payload/templates/artifacts/security-review.md +23 -0
- claude_kit/_payload/templates/artifacts/test-plan.md +22 -0
- claude_kit/_payload/templates/org/README.md +53 -0
- claude_kit/_payload/templates/org/agents/data-workflow-agent.md +59 -0
- claude_kit/_payload/templates/org/agents/founder-prototype-agent.md +61 -0
- claude_kit/_payload/templates/org/agents/internal-tools-builder.md +63 -0
- claude_kit/_payload/templates/org/agents/pm-copilot.md +60 -0
- claude_kit/_payload/templates/org/agents/support-ticket-engineer.md +63 -0
- claude_kit/_payload/templates/org/packs/devops-and-release/README.md +46 -0
- claude_kit/_payload/templates/org/packs/devops-and-release/pack.yaml +32 -0
- claude_kit/_payload/templates/org/packs/engineering-core/README.md +46 -0
- claude_kit/_payload/templates/org/packs/engineering-core/pack.yaml +44 -0
- claude_kit/_payload/templates/org/packs/non-engineer-builder/README.md +53 -0
- claude_kit/_payload/templates/org/packs/non-engineer-builder/pack.yaml +39 -0
- claude_kit/_payload/templates/org/packs/onboarding-and-docs/README.md +49 -0
- claude_kit/_payload/templates/org/packs/onboarding-and-docs/pack.yaml +26 -0
- claude_kit/_payload/templates/org/packs/product-to-code/README.md +50 -0
- claude_kit/_payload/templates/org/packs/product-to-code/pack.yaml +34 -0
- claude_kit/_payload/templates/org/packs/quality-and-review/README.md +53 -0
- claude_kit/_payload/templates/org/packs/quality-and-review/pack.yaml +40 -0
- claude_kit/_payload/templates/org/packs/security-and-compliance/README.md +50 -0
- claude_kit/_payload/templates/org/packs/security-and-compliance/pack.yaml +36 -0
- claude_kit/_payload/templates/org/rules/ai-working-agreement.md +45 -0
- claude_kit/_payload/templates/org/rules/ambiguity-resolution.md +36 -0
- claude_kit/_payload/templates/org/rules/branch-and-pr-policy.md +41 -0
- claude_kit/_payload/templates/org/rules/compliance-policy.md +50 -0
- claude_kit/_payload/templates/org/rules/non-engineer-safe-coding.md +37 -0
- claude_kit/_payload/templates/org/rules/pii-policy.md +46 -0
- claude_kit/_payload/templates/org/rules/production-data-policy.md +35 -0
- claude_kit/_payload/templates/org/rules/prompt-to-task-conversion.md +30 -0
- claude_kit/_payload/templates/org/rules/prototype-boundaries.md +40 -0
- claude_kit/_payload/templates/org/rules/secrets-policy.md +34 -0
- claude_kit/_payload/templates/org/skills/customer-issue-to-fix/SKILL.md +61 -0
- claude_kit/_payload/templates/org/skills/feature-from-idea/SKILL.md +56 -0
- claude_kit/_payload/templates/org/skills/prompt-to-safe-task/SKILL.md +59 -0
- claude_kit/_payload/templates/org/skills/prototype-to-production/SKILL.md +61 -0
- claude_kit/_payload/templates/org/skills/repo-onboarding/SKILL.md +60 -0
- claude_kit/_payload/templates/settings.json +53 -0
- claude_kit/_payload/templates/stacks/backend/python/fastapi/rules/fastapi-patterns.md +64 -0
- claude_kit/_payload/templates/stacks/db/mongodb/agents/migration-specialist.md +61 -0
- claude_kit/_payload/templates/stacks/db/mongodb/agents/mongodb-specialist.md +59 -0
- claude_kit/_payload/templates/stacks/db/mongodb/rules/mongodb-patterns.md +39 -0
- claude_kit/_payload/templates/stacks/db/postgres/agents/db-performance-reviewer.md +66 -0
- claude_kit/_payload/templates/stacks/db/postgres/agents/migration-specialist.md +56 -0
- claude_kit/_payload/templates/stacks/db/postgres/agents/postgres-specialist.md +58 -0
- claude_kit/_payload/templates/stacks/db/postgres/rules/database-performance.md +64 -0
- claude_kit/_payload/templates/stacks/db/postgres/rules/postgres-patterns.md +43 -0
- claude_kit/_payload/templates/stacks/frontend/react/rules/react-patterns.md +63 -0
- claude_kit/catalog.py +476 -0
- claude_kit/cli.py +327 -0
- claude_kit/hooks.py +246 -0
- claude_kit/models.py +205 -0
- claude_kit/prompts.py +209 -0
- claude_kit/render.py +146 -0
- claude_kit/scaffold.py +492 -0
- claude_kit/upgrader.py +294 -0
- claude_kit/validator.py +197 -0
|
@@ -0,0 +1,74 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: incident-postmortem
|
|
3
|
+
description: Write a blameless postmortem after a production incident. Use when an incident is resolved and needs a root-cause writeup, when conducting an RCA, or when the user says "postmortem", "RCA", "writeup the incident", or "what went wrong". Produces a timeline, 5-whys root cause, contributing factors, and tracked action items.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Incident Postmortem (blameless)
|
|
7
|
+
|
|
8
|
+
Turn a resolved incident into durable learning. **Blameless** means the writeup attacks the system and process, never a person — "the deploy lacked a migration gate," not "X forgot the migration." Blame kills the honest reporting that prevents the next outage.
|
|
9
|
+
|
|
10
|
+
## When to Use
|
|
11
|
+
|
|
12
|
+
- An incident (SEV1–SEV3) is resolved and needs a writeup.
|
|
13
|
+
- A near-miss worth capturing.
|
|
14
|
+
- The user asks for an RCA / postmortem / "what went wrong."
|
|
15
|
+
|
|
16
|
+
Pairs with the `incident-responder` agent (it runs the incident; this documents it afterward).
|
|
17
|
+
|
|
18
|
+
## Process
|
|
19
|
+
|
|
20
|
+
1. **Reconstruct the timeline** from the incident log (`docs/incidents/...`), git history, deploy records, and the project's structured logs and error-tracking / monitoring tooling. Use UTC; include detection, each action, mitigation, and resolution.
|
|
21
|
+
2. **Find the root cause with 5 Whys** — keep asking "why" past the proximate trigger to the systemic cause. Usually it lands on a missing guardrail/test/alert, not a typo.
|
|
22
|
+
3. **Separate trigger from cause.** The trigger is what set it off; the root cause is why the system allowed it.
|
|
23
|
+
4. **List contributing factors** — what made it worse or slower to detect/fix (no alert, noisy logs, unclear ownership).
|
|
24
|
+
5. **Action items** — each concrete, owned, dated, and ideally a *systemic* fix (a test, an alert, a hook, a gate) so this class of incident can't recur silently. File them to the backlog.
|
|
25
|
+
6. **Promote the lesson** — add a durable entry to `.claude/agent-memory/gotchas/` via `remember` if it's a reusable pitfall (e.g., "migrations must be a separate gated step in prod").
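The requirement in step 5 that every action item be concrete, owned, and dated can be checked mechanically. The following is a minimal TypeScript sketch, assuming a hypothetical `ActionItem` shape that mirrors the columns of the template's Action Items table. None of these names come from claude-kit, and the sample data is invented for illustration.

```typescript
// Hypothetical shape mirroring the Action Items table columns (an assumption, not a claude-kit type).
type ActionType = "detect" | "prevent";

interface ActionItem {
  action: string;
  owner: string; // empty string means unassigned
  due: string;   // ISO date (YYYY-MM-DD); empty string means undated
  type: ActionType;
}

// Returns one human-readable problem per rule violation; an empty array means the list passes.
function findActionItemGaps(items: ActionItem[]): string[] {
  const problems: string[] = [];
  if (items.length === 0) {
    problems.push("postmortem has no action items");
  }
  items.forEach((item, index) => {
    const label = `item ${index + 1}`;
    if (item.owner.trim() === "") {
      problems.push(`${label}: missing owner`);
    }
    if (!/^\d{4}-\d{2}-\d{2}$/.test(item.due)) {
      problems.push(`${label}: missing or malformed due date`);
    }
    if (/be more careful/i.test(item.action)) {
      problems.push(`${label}: not a systemic fix`);
    }
  });
  return problems;
}

// Invented sample: the first item is acceptable, the second breaks all three rules.
const sampleGaps = findActionItemGaps([
  { action: "Add alert for elevated 5xx rate", owner: "platform-team", due: "2025-01-31", type: "detect" },
  { action: "Be more careful during deploys", owner: "", due: "", type: "prevent" },
]);
```

A check like this only catches empty fields and one tell-tale phrase. Whether an action is genuinely systemic still needs human review.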
|
|
26
|
+
|
|
27
|
+
## Template — `docs/incidents/{date}-{slug}-postmortem.md`
|
|
28
|
+
|
|
29
|
+
```markdown
|
|
30
|
+
# Postmortem: {title}
|
|
31
|
+
**Date:** {date} · **Severity:** SEV{n} · **Duration:** {detect→resolve} · **Author:** {who}
|
|
32
|
+
|
|
33
|
+
## Summary
|
|
34
|
+
{2–3 sentences: what broke, who was affected, how it was fixed.}
|
|
35
|
+
|
|
36
|
+
## Impact
|
|
37
|
+
- Users/tenants affected: {scope} · Duration: {time} · Data: {none/at-risk/lost}
|
|
38
|
+
- SLA/SLO breached: {which}
|
|
39
|
+
|
|
40
|
+
## Timeline (UTC)
|
|
41
|
+
| Time | Event |
|
|
42
|
+
|------|-------|
|
|
43
|
+
| {ts} | {detection — how we found out} |
|
|
44
|
+
| {ts} | {mitigation} |
|
|
45
|
+
| {ts} | {resolved} |
|
|
46
|
+
|
|
47
|
+
## Root Cause (5 Whys)
|
|
48
|
+
1. Why did it break? …
|
|
49
|
+
2. Why? … 3. Why? … 4. Why? … 5. (systemic) …
|
|
50
|
+
|
|
51
|
+
**Trigger:** {what set it off} · **Root cause:** {why the system allowed it}
|
|
52
|
+
|
|
53
|
+
## Contributing Factors
|
|
54
|
+
- {slow detection / missing alert / unclear ownership / …}
|
|
55
|
+
|
|
56
|
+
## What Went Well / Poorly
|
|
57
|
+
- Well: {fast rollback, good logs}
|
|
58
|
+
- Poorly: {no alert, noisy signal}
|
|
59
|
+
|
|
60
|
+
## Action Items
|
|
61
|
+
| # | Action (systemic fix preferred) | Owner | Due | Type |
|
|
62
|
+
|---|---------------------------------|-------|-----|------|
|
|
63
|
+
| 1 | Add alert for {symptom} | | | detect |
|
|
64
|
+
| 2 | Add {test/hook/gate} so this can't recur silently | | | prevent |
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
## Rules
|
|
68
|
+
|
|
69
|
+
1. **Blameless** — systems and processes, never individuals.
|
|
70
|
+
2. **Every postmortem ends in tracked action items**; a writeup with no follow-ups is theater.
|
|
71
|
+
3. Prefer **systemic** fixes (alert, test, hook, gate) over "be more careful."
|
|
72
|
+
4. Keep it honest about detection/response gaps — that's where the value is.
|
|
73
|
+
|
|
74
|
+
> Adapted from a portfolio project's incident skill; generalized to be stack-agnostic for claude-kit.
|
|
@@ -0,0 +1,245 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: incremental-implementation
|
|
3
|
+
description: Delivers changes incrementally. Use when implementing any feature or change that touches more than one file. Use when you're about to write a large amount of code at once, or when a task feels too big to land in one step.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Incremental Implementation
|
|
7
|
+
|
|
8
|
+
## Overview
|
|
9
|
+
|
|
10
|
+
Build in thin vertical slices — implement one piece, test it, verify it, then expand. Avoid implementing an entire feature in one pass. Each increment should leave the system in a working, testable state. This is the execution discipline that makes large features manageable.
|
|
11
|
+
|
|
12
|
+
## When to Use
|
|
13
|
+
|
|
14
|
+
- Implementing any multi-file change
|
|
15
|
+
- Building a new feature from a task breakdown
|
|
16
|
+
- Refactoring existing code
|
|
17
|
+
- Any time you're tempted to write more than ~100 lines before testing
|
|
18
|
+
|
|
19
|
+
**When NOT to use:** Single-file, single-function changes where the scope is already minimal.
|
|
20
|
+
|
|
21
|
+
## The Increment Cycle
|
|
22
|
+
|
|
23
|
+
```
|
|
24
|
+
┌──────────────────────────────────────┐
|
|
25
|
+
│ │
|
|
26
|
+
│ Implement ──→ Test ──→ Verify ──┐ │
|
|
27
|
+
│ ▲ │ │
|
|
28
|
+
│ └───── Commit ◄─────────────┘ │
|
|
29
|
+
│ │ │
|
|
30
|
+
│ ▼ │
|
|
31
|
+
│ Next slice │
|
|
32
|
+
│ │
|
|
33
|
+
└──────────────────────────────────────┘
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
For each slice:
|
|
37
|
+
|
|
38
|
+
1. **Implement** the smallest complete piece of functionality
|
|
39
|
+
2. **Test** — run the project's test suite (or write a test if none exists)
|
|
40
|
+
3. **Verify** — confirm the slice works as expected (tests pass, build succeeds, manual check)
|
|
41
|
+
4. **Commit** -- save your progress with a descriptive message (see `git-workflow-and-versioning` for atomic commit guidance)
|
|
42
|
+
5. **Move to the next slice** — carry forward, don't restart
|
|
43
|
+
|
|
44
|
+
## Slicing Strategies
|
|
45
|
+
|
|
46
|
+
### Vertical Slices (Preferred)
|
|
47
|
+
|
|
48
|
+
Build one complete path through the stack:
|
|
49
|
+
|
|
50
|
+
```
|
|
51
|
+
Slice 1: Create a task (DB + API + basic UI)
|
|
52
|
+
→ Tests pass, user can create a task via the UI
|
|
53
|
+
|
|
54
|
+
Slice 2: List tasks (query + API + UI)
|
|
55
|
+
→ Tests pass, user can see their tasks
|
|
56
|
+
|
|
57
|
+
Slice 3: Edit a task (update + API + UI)
|
|
58
|
+
→ Tests pass, user can modify tasks
|
|
59
|
+
|
|
60
|
+
Slice 4: Delete a task (delete + API + UI + confirmation)
|
|
61
|
+
→ Tests pass, full CRUD complete
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
Each slice delivers working end-to-end functionality.
|
|
65
|
+
|
|
66
|
+
### Contract-First Slicing
|
|
67
|
+
|
|
68
|
+
When backend and frontend need to develop in parallel:
|
|
69
|
+
|
|
70
|
+
```
|
|
71
|
+
Slice 0: Define the API contract (types, interfaces, API spec)
|
|
72
|
+
Slice 1a: Implement backend against the contract + API tests
|
|
73
|
+
Slice 1b: Implement frontend against mock data matching the contract
|
|
74
|
+
Slice 2: Integrate and test end-to-end
|
|
75
|
+
```
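One way to picture Slice 0 and Slice 1b is a shared interface plus an in-memory mock that the frontend builds against. This is a sketch under stated assumptions: `TaskRecord`, `TaskApi`, and `MockTaskApi` are hypothetical names, and the API is kept synchronous for brevity where a real client would likely return Promises.

```typescript
// Slice 0: the contract both sides agree on (illustrative names only).
interface TaskRecord {
  id: string;
  title: string;
  done: boolean;
}

interface TaskApi {
  createTask(title: string): TaskRecord;
  listTasks(): TaskRecord[];
}

// Slice 1b: an in-memory mock matching the contract, usable until the real backend (Slice 1a) lands.
class MockTaskApi implements TaskApi {
  private tasks: TaskRecord[] = [];
  private nextId = 1;

  createTask(title: string): TaskRecord {
    const task: TaskRecord = { id: String(this.nextId++), title, done: false };
    this.tasks.push(task);
    return task;
  }

  listTasks(): TaskRecord[] {
    // Return a copy so callers cannot mutate the mock's internal state.
    return this.tasks.slice();
  }
}
```

Because the mock and the real backend both implement `TaskApi`, the integration in Slice 2 should mostly come down to swapping one implementation for the other.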
|
|
76
|
+
|
|
77
|
+
### Risk-First Slicing
|
|
78
|
+
|
|
79
|
+
Tackle the riskiest or most uncertain piece first:
|
|
80
|
+
|
|
81
|
+
```
|
|
82
|
+
Slice 1: Prove the WebSocket connection works (highest risk)
|
|
83
|
+
Slice 2: Build real-time task updates on the proven connection
|
|
84
|
+
Slice 3: Add offline support and reconnection
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
If Slice 1 fails, you discover it before investing in Slices 2 and 3.
|
|
88
|
+
|
|
89
|
+
## Implementation Rules
|
|
90
|
+
|
|
91
|
+
### Rule 0: Simplicity First
|
|
92
|
+
|
|
93
|
+
Before writing any code, ask: "What is the simplest thing that could work?"
|
|
94
|
+
|
|
95
|
+
After writing code, review it against these checks:
|
|
96
|
+
- Can this be done in fewer lines?
|
|
97
|
+
- Are these abstractions earning their complexity?
|
|
98
|
+
- Would a staff engineer look at this and say "why didn't you just..."?
|
|
99
|
+
- Am I building for hypothetical future requirements, or the current task?
|
|
100
|
+
|
|
101
|
+
```
|
|
102
|
+
SIMPLICITY CHECK:
|
|
103
|
+
✗ Generic EventBus with middleware pipeline for one notification
|
|
104
|
+
✓ Simple function call
|
|
105
|
+
|
|
106
|
+
✗ Abstract factory pattern for two similar components
|
|
107
|
+
✓ Two straightforward components with shared utilities
|
|
108
|
+
|
|
109
|
+
✗ Config-driven form builder for three forms
|
|
110
|
+
✓ Three form components
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
Three similar lines of code are better than a premature abstraction. Implement the naive, obviously-correct version first. Optimize only after correctness is proven with tests.
|
|
114
|
+
|
|
115
|
+
### Rule 0.5: Scope Discipline
|
|
116
|
+
|
|
117
|
+
Touch only what the task requires.
|
|
118
|
+
|
|
119
|
+
Do NOT:
|
|
120
|
+
- "Clean up" code adjacent to your change
|
|
121
|
+
- Refactor imports in files you're not modifying
|
|
122
|
+
- Remove comments you don't fully understand
|
|
123
|
+
- Add features not in the spec because they "seem useful"
|
|
124
|
+
- Modernize syntax in files you're only reading
|
|
125
|
+
|
|
126
|
+
If you notice something worth improving outside your task scope, note it — don't fix it:
|
|
127
|
+
|
|
128
|
+
```
|
|
129
|
+
NOTICED BUT NOT TOUCHING:
|
|
130
|
+
- src/utils/format.ts has an unused import (unrelated to this task)
|
|
131
|
+
- The auth middleware could use better error messages (separate task)
|
|
132
|
+
→ Want me to create tasks for these?
|
|
133
|
+
```
|
|
134
|
+
|
|
135
|
+
### Rule 1: One Thing at a Time
|
|
136
|
+
|
|
137
|
+
Each increment changes one logical thing. Don't mix concerns:
|
|
138
|
+
|
|
139
|
+
**Bad:** One commit that adds a new component, refactors an existing one, and updates the build config.
|
|
140
|
+
|
|
141
|
+
**Good:** Three separate commits — one for each change.
|
|
142
|
+
|
|
143
|
+
### Rule 2: Keep It Compilable
|
|
144
|
+
|
|
145
|
+
After each increment, the project must build and existing tests must pass. Don't leave the codebase in a broken state between slices.
|
|
146
|
+
|
|
147
|
+
### Rule 3: Feature Flags for Incomplete Features
|
|
148
|
+
|
|
149
|
+
If a feature isn't ready for users but you need to merge increments:
|
|
150
|
+
|
|
151
|
+
```typescript
|
|
152
|
+
// Feature flag for work-in-progress
|
|
153
|
+
const ENABLE_TASK_SHARING = process.env.FEATURE_TASK_SHARING === 'true';
|
|
154
|
+
|
|
155
|
+
if (ENABLE_TASK_SHARING) {
|
|
156
|
+
// New sharing UI
|
|
157
|
+
}
|
|
158
|
+
```
|
|
159
|
+
|
|
160
|
+
This lets you merge small increments to the main branch without exposing incomplete work.
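A flag check is easier to test when the environment is passed in rather than read globally. The sketch below assumes a hypothetical `isFeatureEnabled` helper and a `FEATURE_` prefix convention taken from the example above. Neither is an API of any particular library.

```typescript
// An environment-like record; in Node this could be process.env (assumption, not required by the sketch).
type EnvLike = { [key: string]: string | undefined };

// Hypothetical helper: only the exact string "true" enables a feature,
// so missing or misspelled values leave it off.
function isFeatureEnabled(env: EnvLike, flagName: string): boolean {
  return env[`FEATURE_${flagName}`] === "true";
}

function renderTaskActions(env: EnvLike): string[] {
  const actions = ["edit", "delete"];
  if (isFeatureEnabled(env, "TASK_SHARING")) {
    actions.push("share"); // work-in-progress UI, hidden unless the flag is on
  }
  return actions;
}
```

Passing `env` explicitly means a test can exercise both states of the flag without touching the real process environment.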
|
|
161
|
+
|
|
162
|
+
### Rule 4: Safe Defaults
|
|
163
|
+
|
|
164
|
+
New code should default to safe, conservative behavior:
|
|
165
|
+
|
|
166
|
+
```typescript
|
|
167
|
+
// Safe: disabled by default, opt-in
|
|
168
|
+
export function createTask(data: TaskInput, options?: { notify?: boolean }) {
|
|
169
|
+
const shouldNotify = options?.notify ?? false;
|
|
170
|
+
// ...
|
|
171
|
+
}
|
|
172
|
+
```
|
|
173
|
+
|
|
174
|
+
### Rule 5: Rollback-Friendly
|
|
175
|
+
|
|
176
|
+
Each increment should be independently revertable:
|
|
177
|
+
|
|
178
|
+
- Additive changes (new files, new functions) are easy to revert
|
|
179
|
+
- Modifications to existing code should be minimal and focused
|
|
180
|
+
- Database migrations should have corresponding rollback migrations
|
|
181
|
+
- Avoid deleting something and adding its replacement in the same commit; split the removal and the replacement into separate commits
|
|
182
|
+
|
|
183
|
+
## Working with Agents
|
|
184
|
+
|
|
185
|
+
When directing an agent to implement incrementally:
|
|
186
|
+
|
|
187
|
+
```
|
|
188
|
+
"Let's implement Task 3 from the plan.
|
|
189
|
+
|
|
190
|
+
Start with just the database schema change and the API endpoint.
|
|
191
|
+
Don't touch the UI yet — we'll do that in the next increment.
|
|
192
|
+
|
|
193
|
+
After implementing, run the project's tests and build to verify
|
|
194
|
+
nothing is broken."
|
|
195
|
+
```
|
|
196
|
+
|
|
197
|
+
Be explicit about what's in scope and what's NOT in scope for each increment.
|
|
198
|
+
|
|
199
|
+
## Increment Checklist
|
|
200
|
+
|
|
201
|
+
After each increment, verify:
|
|
202
|
+
|
|
203
|
+
- [ ] The change does one thing and does it completely
|
|
204
|
+
- [ ] All existing tests still pass (run the project's test runner)
|
|
205
|
+
- [ ] The build succeeds (run the project's build)
|
|
206
|
+
- [ ] Type checking passes (run the project's type checker)
|
|
207
|
+
- [ ] Linting passes (run the project's linter)
|
|
208
|
+
- [ ] The new functionality works as expected
|
|
209
|
+
- [ ] The change is committed with a descriptive message
|
|
210
|
+
|
|
211
|
+
**Note:** Run each verification command after a change that could affect it. After a successful run, don't repeat the same command unless the code has changed since — re-running on unchanged code adds no information.
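The "don't re-run on unchanged code" rule can be sketched as a small record of which code state each command last passed against. `VerificationLog` and the `codeFingerprint` string are hypothetical. The fingerprint is assumed to be any value that changes whenever the code changes, such as a hash of the working tree.

```typescript
// Hypothetical bookkeeping for verification commands (build, test, lint, type check).
class VerificationLog {
  // Maps a command name to the fingerprint of the code it last passed against.
  private lastPassed: { [command: string]: string } = {};

  // True when the command has not yet passed against this exact code state.
  needsRun(command: string, codeFingerprint: string): boolean {
    return this.lastPassed[command] !== codeFingerprint;
  }

  // Call only after the command succeeded; failures should not be recorded as passes.
  recordPass(command: string, codeFingerprint: string): void {
    this.lastPassed[command] = codeFingerprint;
  }
}
```

Each command is tracked separately, so a passing build does not excuse skipping the tests, and any code change invalidates every earlier pass.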
|
|
212
|
+
|
|
213
|
+
## Common Rationalizations
|
|
214
|
+
|
|
215
|
+
| Rationalization | Reality |
|
|
216
|
+
|---|---|
|
|
217
|
+
| "I'll test it all at the end" | Bugs compound. A bug in Slice 1 makes Slices 2-5 wrong. Test each slice. |
|
|
218
|
+
| "It's faster to do it all at once" | It *feels* faster until something breaks and you can't find which of 500 changed lines caused it. |
|
|
219
|
+
| "These changes are too small to commit separately" | Small commits are free. Large commits hide bugs and make rollbacks painful. |
|
|
220
|
+
| "I'll add the feature flag later" | If the feature isn't complete, it shouldn't be user-visible. Add the flag now. |
|
|
221
|
+
| "This refactor is small enough to include" | Refactors mixed with features make both harder to review and debug. Separate them. |
|
|
222
|
+
| "Let me run the build command again just to be sure" | After a successful run, repeating the same command adds nothing unless the code has changed since. Run it again after subsequent edits, not as reassurance. |
|
|
223
|
+
|
|
224
|
+
## Red Flags
|
|
225
|
+
|
|
226
|
+
- More than 100 lines of code written without running tests
|
|
227
|
+
- Multiple unrelated changes in a single increment
|
|
228
|
+
- "Let me just quickly add this too" scope expansion
|
|
229
|
+
- Skipping the test/verify step to move faster
|
|
230
|
+
- Build or tests broken between increments
|
|
231
|
+
- Large uncommitted changes accumulating
|
|
232
|
+
- Building abstractions before the third use case demands them
|
|
233
|
+
- Touching files outside the task scope "while I'm here"
|
|
234
|
+
- Creating new utility files for one-time operations
|
|
235
|
+
- Running the same build/test command twice in a row without any intervening code change
|
|
236
|
+
|
|
237
|
+
## Verification
|
|
238
|
+
|
|
239
|
+
After completing all increments for a task:
|
|
240
|
+
|
|
241
|
+
- [ ] Each increment was individually tested and committed
|
|
242
|
+
- [ ] The full test suite passes
|
|
243
|
+
- [ ] The build is clean
|
|
244
|
+
- [ ] The feature works end-to-end as specified
|
|
245
|
+
- [ ] No uncommitted changes remain
|
|
@@ -0,0 +1,221 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: interview-me
|
|
3
|
+
description: Extracts what the user actually wants instead of what they think they should want. Achieves this through a one-question-at-a-time interview that continues until you reach ~95% confidence about the underlying intent. Use when an ask is underspecified ("build me X" without "for whom" or "why now"), when the user explicitly invokes it ("interview me", "grill me", "are we sure?", "stress-test my thinking"), or when you catch yourself silently filling in ambiguous requirements before any plan, spec, or code exists.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Interview Me
|
|
7
|
+
|
|
8
|
+
## Overview
|
|
9
|
+
|
|
10
|
+
What people ask for and what they actually want are different things. They ask for "a dashboard" because that's what one asks for, not because a dashboard solves their problem. They say "make it faster" without a number to hit.
|
|
11
|
+
|
|
12
|
+
The cheapest moment to find this gap is before any plan, spec, or code exists. Once you've started building, switching costs are real, and the user will rationalize the wrong thing into a "good enough" thing. The misfit gets locked in.
|
|
13
|
+
|
|
14
|
+
This skill closes the gap before it costs anything. The other Define-phase skills assume you already know roughly what you want: `idea-refine` generates variations from an idea, `spec-driven-development` writes the requirements down, `doubt-driven-development` stress-tests a plan after you've drafted one. Interview-me is the part before all of those, where you ask one question at a time, with your best guess attached, until you can predict what the user is going to say before they say it.
|
|
15
|
+
|
|
16
|
+
## When to Use
|
|
17
|
+
|
|
18
|
+
Apply this skill when:
|
|
19
|
+
|
|
20
|
+
- The ask is missing at least one of: **who** the user is, **why** they want it, what **success** looks like, what the binding **constraint** is
|
|
21
|
+
- The request is conventional rather than specific ("build me X", "make it faster") and you can't unpack the convention without guessing
|
|
22
|
+
- You're tempted to start with assumptions you haven't surfaced
|
|
23
|
+
- The user hasn't said which value they're optimizing for when two reasonable ones are in tension (simplicity vs. flexibility, cost vs. speed)
|
|
24
|
+
- The user explicitly invokes: "interview me", "grill me", "before we start, are we sure?", "stress-test my thinking"
|
|
25
|
+
|
|
26
|
+
**When NOT to use:**
|
|
27
|
+
|
|
28
|
+
- The ask is unambiguous and self-contained ("rename this variable", "fix this typo")
|
|
29
|
+
- The user has explicitly asked for speed over verification
|
|
30
|
+
- Pure information requests ("how does X work?", "what does this code do?")
|
|
31
|
+
- Mechanical operations (renames, formats, file moves)
|
|
32
|
+
- You already have ≥95% confidence; re-read the stop condition below before assuming you don't
|
|
33
|
+
|
|
34
|
+
## Loading Constraints
|
|
35
|
+
|
|
36
|
+
This skill needs a live, responsive user. **Do not invoke in non-interactive contexts** like CI pipelines, scheduled runs, `/loop`, or autonomous-loop. If you're in one of those and the ask is underspecified, flag that as a blocker for the user instead of guessing.
|
|
37
|
+
|
|
38
|
+
## The Process
|
|
39
|
+
|
|
40
|
+
### Step 1: Hypothesize, with a confidence number
|
|
41
|
+
|
|
42
|
+
Before asking anything, write down your current best read of what the user wants in **one sentence**, plus an honest confidence number (0–100%):
|
|
43
|
+
|
|
44
|
+
```
HYPOTHESIS: You want a way to answer "how are we doing?" in standup, and "dashboard" was the convention that came to mind.
CONFIDENCE: ~30%
```
|
|
48
|
+
|
|
49
|
+
The number forces honesty. If you wrote down a high number but can't actually predict the user's reactions to the next three questions you'd ask, the number is wrong. Start at the confidence level you can defend.
|
|
50
|
+
|
|
51
|
+
### Step 2: Ask one question at a time, each with a guess attached
|
|
52
|
+
|
|
53
|
+
Format:
|
|
54
|
+
|
|
55
|
+
```
Q: <one focused question>
GUESS: <your hypothesis for the answer, with the reasoning that produced it>
```
|
|
59
|
+
|
|
60
|
+
Wait for the user to react before asking the next question.
|
|
61
|
+
|
|
62
|
+
**Why one at a time, not a batch:**
|
|
63
|
+
|
|
64
|
+
- The user can't react to your hypotheses if you bury them in a list
|
|
65
|
+
- Batches encourage skim-reading and surface answers
|
|
66
|
+
- The third question often depends on the answer to the first; asking them all at once locks in the wrong framing
|
|
67
|
+
- The user's energy for thinking carefully is finite; spend it one question at a time
|
|
68
|
+
|
|
69
|
+
**Why attach a guess:**
|
|
70
|
+
|
|
71
|
+
- The user reacts faster to a wrong guess than they generate an answer from scratch
|
|
72
|
+
- It commits you to a hypothesis you can be visibly wrong about, which keeps you honest
|
|
73
|
+
- It surfaces *your* assumptions, which is what the interview is meant to expose
|
|
74
|
+
|
|
75
|
+
The risk here is a polite user agreeing with your guess to be agreeable. Mitigate by being visibly willing to be wrong, and occasionally guess in a direction you expect the user to push back on.
|
|
76
|
+
|
|
77
|
+
### Step 3: Listen for "want vs. should want"
|
|
78
|
+
|
|
79
|
+
The most dangerous answers are the ones where the user says what a thoughtful answer *sounds like* rather than what they actually want. Watch for:
|
|
80
|
+
|
|
81
|
+
- Answers that pattern-match best-practice talk ("I want it to be scalable", "clean architecture") without specifics
|
|
82
|
+
- Answers that defer to convention ("the way most apps do it", "the standard approach")
|
|
83
|
+
- Phrases like "I should probably…", "I think I'm supposed to…", "good engineering practice says…"
|
|
84
|
+
- Buzzwords as goals — when "modern", "scalable", "robust" are the answer instead of a specific outcome
|
|
85
|
+
|
|
86
|
+
When you hear these, the question to ask is:
|
|
87
|
+
|
|
88
|
+
> *"If you didn't have to justify this to anyone, what would you actually want?"*
|
|
89
|
+
|
|
90
|
+
That single question often does more work than the previous five.
|
|
91
|
+
|
|
92
|
+
### Step 4: Restate intent in the user's own words
|
|
93
|
+
|
|
94
|
+
When your confidence is high, write back what you now think the user wants. Keep it tight (5–8 lines), use their language where possible, and structure it so the user can confirm or correct line by line:
|
|
95
|
+
|
|
96
|
+
```
Here's what I now think you want:

- Outcome: <one line>
- User: <one line: who benefits>
- Why now: <one line: what changed>
- Success: <one line: how we know it worked>
- Constraint: <one line: the binding limit>
- Out of scope: <one line: what we're explicitly not doing>

Yes / no / refine?
```
|
|
108
|
+
|
|
109
|
+
Including "Out of scope" is non-negotiable. Half of misalignment is silent disagreement about what is *not* being built.
|
|
110
|
+
|
|
111
|
+
### Step 5: Confirm — explicit yes, not "whatever you think"
|
|
112
|
+
|
|
113
|
+
The gate is an explicit "yes." The following are **not** yes:
|
|
114
|
+
|
|
115
|
+
- "Whatever you think is best." → The user is delegating, which means they don't have 95% confidence either. Re-ask with two concrete options framed as a choice.
|
|
116
|
+
- "Sounds good." → Ambiguous. Ask: "Anything you'd refine?" Silence isn't confirmation.
|
|
117
|
+
- "Sure, let's go." → Often a polite exit, not an endorsement. Same follow-up.
|
|
118
|
+
- Silence followed by "okay let's start." → The user has given up on the interview, not converged. Stop and ask whether you've missed something.
|
|
119
|
+
|
|
120
|
+
If they correct you, fold the correction in and restate. Loop until you get an explicit yes.
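
The gate above can be sketched as a small decision rule. This is only an illustration, assuming plain keyword matching on the replies quoted in this step; `classifyReply` and the `NOT_YES` table are hypothetical names, and a real agent would judge the reply in context rather than by regex.

```javascript
// Hypothetical sketch of the Step 5 gate: only an explicit "yes" passes.
// The patterns mirror the non-yes replies listed above; they are not exhaustive.
const NOT_YES = [
  { pattern: /whatever you think/i, kind: 'delegation',
    next: 'Re-ask with two concrete options framed as a choice.' },
  { pattern: /sounds good/i, kind: 'ambiguous',
    next: "Ask: \"Anything you'd refine?\"" },
  { pattern: /sure,? let['’]?s go/i, kind: 'polite-exit',
    next: "Ask: \"Anything you'd refine?\"" },
  { pattern: /ok(ay)?,? let['’]?s start/i, kind: 'gave-up',
    next: 'Stop and ask whether you have missed something.' },
];

function classifyReply(reply) {
  const text = reply.trim();
  // Check the known non-confirmations first.
  for (const rule of NOT_YES) {
    if (rule.pattern.test(text)) {
      return { confirmed: false, kind: rule.kind, next: rule.next };
    }
  }
  // Only a bare "yes" counts; "yes, but ..." is treated as a correction.
  if (/^yes[.!]?$/i.test(text)) {
    return { confirmed: true, kind: 'explicit-yes', next: 'Stop interviewing and hand off the confirmed intent.' };
  }
  return { confirmed: false, kind: 'correction', next: 'Fold the correction in and restate.' };
}

console.log(classifyReply('Whatever you think is best.'));
```

Anything that is neither a listed non-yes nor a bare "yes" falls through to the correction branch, which matches the loop described above: restate and ask again.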
|
|
121
|
+
|
|
122
|
+
### The 95% Confidence Stop
|
|
123
|
+
|
|
124
|
+
You're done when you can answer yes to this:
|
|
125
|
+
|
|
126
|
+
> *Can I predict the user's reaction to the next three questions I would ask?*
|
|
127
|
+
|
|
128
|
+
If yes, you have shared understanding. Stop interviewing and produce the restate. If no, you're not done; ask the next question.
|
|
129
|
+
|
|
130
|
+
This is a checkable test, not a vibe. It also has a floor: if you've gone several rounds and still can't predict, that's information about the ask, not a reason to keep grinding. Stop and tell the user: "I've asked X questions and I still can't predict your reactions. Something foundational is missing. Want to step back?"
|
|
131
|
+
|
|
132
|
+
## Output
|
|
133
|
+
|
|
134
|
+
The output of this skill is a **confirmed statement of intent**: the restate from Step 4, with an explicit yes from Step 5. That's the deliverable. Specs, plans, and task lists are downstream; they consume the intent this skill produces.
|
|
135
|
+
|
|
136
|
+
If the user wants the intent to persist (a multi-session project, a handoff to another collaborator), offer to save it to `docs/intent/[topic].md`. Only save if they confirm.
|
|
137
|
+
|
|
138
|
+
## Example
|
|
139
|
+
|
|
140
|
+
A short before-and-after.
|
|
141
|
+
|
|
142
|
+
**Without `interview-me`:**
|
|
143
|
+
|
|
144
|
+
```
User: build me a dashboard for our metrics
Agent: <starts proposing chart libraries and layouts>
```
|
|
148
|
+
|
|
149
|
+
The agent has assumed who it's for, which metrics matter, what "our" covers, and what success looks like. None of these are stated. Every assumption is a place the user can later say "that's not what I meant."
|
|
150
|
+
|
|
151
|
+
**With `interview-me`:**
|
|
152
|
+
|
|
153
|
+
```
User: build me a dashboard for our metrics

Agent:
HYPOTHESIS: You want a way to answer "how are we doing?" in standup, and "dashboard" came to mind because that's the conventional answer.
CONFIDENCE: ~30%

Q: When you say "how are we doing?", who's asking: you alone, the engineering team in standup, or up the chain (e.g., your manager, an exec, a board)?
GUESS: engineering team in standup, because "we" usually scopes that way and standups are where this question gets asked. If it's for execs the metrics and the framing change a lot.

User: It's actually for me. I keep losing track of which experiments are running and what their early signal looks like.

Agent:
Updated read: not a team dashboard at all. It's a personal experiment tracker. Confidence: ~60%.

Q: Is the gap that you don't know which experiments exist, or that you can't see their results in one place?
GUESS: the second one. You have a list of experiments somewhere, but their results live in five different tools, and reconciling them by hand is what you're tired of.

User: First one actually. I literally don't have a list. They're spread across different docs.

Agent: <continues>
```
|
|
175
|
+
|
|
176
|
+
Two questions in, the agent has discovered the actual ask isn't "a dashboard." It's "a list." Different artifact, different scope, different work. The dashboard would have been wrong.
|
|
177
|
+
|
|
178
|
+
## Interaction with Other Skills
|
|
179
|
+
|
|
180
|
+
- **`idea-refine`**: downstream. If the confirmed intent is "I want X but I don't know how to scope it," hand off to `idea-refine` to generate variations against the now-explicit intent.
|
|
181
|
+
- **`spec-driven-development`**: downstream. If the confirmed intent is concrete ("I want X for Y users with Z success criteria"), hand off to `spec-driven-development` to write it down.
|
|
182
|
+
- **`planning-and-task-breakdown`**: two hops downstream of this skill (after the spec).
|
|
183
|
+
- **`doubt-driven-development`**: opposite end of the timeline. Interview-me is pre-decision intent extraction; doubt-driven is post-decision artifact review. Both catch divergence, but at different moments.
|
|
184
|
+
- **`source-driven-development`**: orthogonal. Interview-me clarifies what the user wants; `source-driven-development` verifies framework facts. They don't compete.
|
|
185
|
+
|
|
186
|
+
## Common Rationalizations
|
|
187
|
+
|
|
188
|
+
| Rationalization | Reality |
|---|---|
| "The ask is clear enough" | If you can't write the user's desired outcome in one sentence right now, the ask isn't clear. Run Step 1 before deciding. |
| "Asking too many questions wastes their time" | Time wasted by 4–6 targeted questions is small. Time wasted by building the wrong thing is enormous, and the user is the one bearing that cost. |
| "I'll figure it out as I build" | Switching costs after code exists are 10x what they are now. Discovery during implementation is rework. |
| "They said 'whatever you think,' so I should just decide" | "Whatever you think" is delegation, not decision. Re-ask with two concrete options as a choice. |
| "I should give them several options to pick from" | Options work when the user knows what they want and is choosing between trade-offs. They don't know what they want yet. Listing options widens the search; asking narrows it. |
| "If I attach my guess, I'm leading them" | Leading is the point. Reacting is faster than generating from scratch. The risk is sycophancy, not leading; mitigate by being visibly willing to be wrong. |
| "We've talked enough, I get it" | Test it: can you predict their reaction to the next three questions? If not, you don't get it yet. |
| "The user said yes, we're done" | If the yes followed a vague restate or an open-ended "sounds good," the yes is hollow. Restate concretely and re-confirm. |
|
|
198
|
+
|
|
199
|
+
## Red Flags
|
|
200
|
+
|
|
201
|
+
- Three or more questions in a single message: that's batching, not interviewing
|
|
202
|
+
- A question without your hypothesis attached: that's surveying, not committing
|
|
203
|
+
- Accepting "whatever you think is best" as a terminal answer
|
|
204
|
+
- Producing a spec, plan, or task list before the user has explicitly confirmed your restate
|
|
205
|
+
- Questions framed as "what would be best practice?" instead of "what do you actually want?"
|
|
206
|
+
- The user gives a sophistication-signaling answer ("scalable", "clean", "modern") and you accept it without probing whether it's what they actually want
|
|
207
|
+
- Three or more rounds without your confidence visibly rising: you're asking the wrong questions, step back and reframe
|
|
208
|
+
- Saving the intent doc before the user has confirmed (the doc itself implies a yes the user didn't give)
|
|
209
|
+
- Skipping the "Out of scope" line in the restate (silent disagreement about non-goals is half of misalignment)
|
|
210
|
+
|
|
211
|
+
## Verification
|
|
212
|
+
|
|
213
|
+
After applying interview-me:
|
|
214
|
+
|
|
215
|
+
- [ ] An explicit hypothesis with a confidence number was stated in the first turn
|
|
216
|
+
- [ ] Questions were asked one at a time, each with the agent's guess attached
|
|
217
|
+
- [ ] At least one "what would you actually want if you didn't have to justify it?" probe ran when the user gave a sophistication-signaling or convention-signaling answer
|
|
218
|
+
- [ ] A concrete restate (Outcome / User / Why now / Success / Constraint / Out of scope) was written back to the user
|
|
219
|
+
- [ ] The user confirmed the restate with an explicit yes (not "whatever you think," not "sounds good," not silence)
|
|
220
|
+
- [ ] At the stop point, the agent could predict reactions to the next three questions it would ask
|
|
221
|
+
- [ ] Any handoff to a downstream skill (`idea-refine`, `spec-driven-development`) was framed in terms of the confirmed intent, not the original underspecified ask
|
|
@@ -0,0 +1,83 @@
|
|
|
1
|
+
---
name: load-testing
description: Load and stress test API endpoints under concurrency. Use when measuring throughput/latency under load, validating an SLO before launch, sizing the connection pool, or hunting a performance cliff. Covers ramp profiles, thresholds (p95/p99 latency, error rate), authenticated + rate-limited endpoints, and reading results. Distinct from frontend performance-optimization (web vitals/bundles).
---
|
|
5
|
+
|
|
6
|
+
# Load Testing
|
|
7
|
+
|
|
8
|
+
Measure how a service behaves under concurrency, before users find the cliff. This is the missing half of the kit's performance story: `performance-optimization` covers the **frontend** (web vitals, bundles, rendering); this skill drives **concurrent load against an API** to validate the SLOs that `.claude/rules/devops-observability.md` and the `observability-engineer` define. Most cliffs are data-layer bound, so if your stack ships a DB-performance reviewer or rule, cross-check it.
|
|
13
|
+
|
|
14
|
+
## When to Use
|
|
15
|
+
|
|
16
|
+
- Before a launch, or after a change to a hot endpoint.
|
|
17
|
+
- To validate an SLO ("p95 `POST /api/...` < 200ms at 50 rps").
|
|
18
|
+
- To size the service's connection-pool and worker settings.
|
|
19
|
+
- To reproduce a latency/timeout complaint under controlled load.
|
|
20
|
+
|
|
21
|
+
## Tool
|
|
22
|
+
|
|
23
|
+
Use whatever load tool the project standardizes on. **k6** (lightweight, scriptable, good thresholds) and **Locust** (scenarios in Python) are common choices. The methodology below is tool-agnostic; the example uses k6. Adapt the routes, auth, and tool to your stack.
|
|
26
|
+
|
|
27
|
+
```javascript
// example (k6): k6 run load/login-and-list.js (adapt routes/auth to your API)
import http from 'k6/http';
import { check, sleep } from 'k6';

const BASE = __ENV.BASE || 'http://localhost:8000';

export const options = {
  scenarios: {
    ramp: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '30s', target: 50 }, // ramp up
        { duration: '2m', target: 50 },  // hold
        { duration: '30s', target: 0 },  // ramp down
      ],
    },
  },
  thresholds: {
    http_req_failed: ['rate<0.01'],                // < 1% errors
    'http_req_duration{name:list}': ['p(95)<200'], // p95 < 200ms on the list call
  },
};

export function setup() {
  // If the API uses cookie/session or token auth, authenticate ONCE here and reuse per VU.
  const res = http.post(`${BASE}/api/login`,
    JSON.stringify({ email: __ENV.EMAIL, password: __ENV.PASSWORD }),
    { headers: { 'Content-Type': 'application/json' } });
  check(res, { 'login 200': (r) => r.status === 200 });
  // setup() data is serialized and passed to every VU; flatten to name -> value.
  const cookies = {};
  for (const name of Object.keys(res.cookies)) {
    cookies[name] = res.cookies[name][0].value;
  }
  return { cookies };
}

export default function (data) {
  // VUs do not share setup()'s cookie jar, so send the session cookies explicitly.
  const res = http.get(`${BASE}/api/items`, { cookies: data.cookies, tags: { name: 'list' } });
  check(res, { 'list 200': (r) => r.status === 200 });
  sleep(1);
}
```
|
|
60
|
+
|
|
61
|
+
## Method
|
|
62
|
+
|
|
63
|
+
1. **Pick the target + SLO.** One endpoint/flow per test; define the pass threshold up front (p95 latency, error rate, target rps).
2. **Use realistic data + auth.** Authenticate once and reuse the session/token. Seed realistic data so any tenant-scoped or filtered queries hit realistic row counts.
3. **Mind rate limits.** For throttled endpoints (e.g. auth), either test below the limit, or test the limiter itself deliberately and expect 429s.
4. **Ramp, don't spike** (unless spike is the test). Warm up, hold, ramp down.
5. **Watch the service while it runs:** tail the service's logs, the database's active-connection count, and CPU. The bottleneck is usually the data layer (N+1 queries, missing index, pool exhaustion), so cross-check your stack's DB-performance guidance.
6. **Read results:** p95/p99 (not just avg), error rate, throughput plateau. A latency knee as concurrency climbs signals saturation (pool/CPU/lock).
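
One way to make "throughput plateau" in step 6 concrete is to compare throughput between successive load levels. The sketch below is plain JavaScript for post-processing results, not part of k6; `findPlateau`, the step shape (`vus`, `rps`), and the 5% cutoff are assumptions chosen for illustration, and the sample numbers are invented.

```javascript
// Each step is one held load level: virtual users and the measured requests/sec.
// Returns the first step whose throughput gain over the previous step falls
// below `minGain` (a fraction, 0.05 = 5%), or null if throughput kept scaling.
// Assumes steps are ordered by increasing load and every rps is greater than 0.
function findPlateau(steps, minGain = 0.05) {
  for (let i = 1; i < steps.length; i++) {
    const prev = steps[i - 1];
    const cur = steps[i];
    const gain = (cur.rps - prev.rps) / prev.rps;
    if (gain < minGain) {
      return { vus: cur.vus, rps: cur.rps, gain };
    }
  }
  return null;
}

// Illustrative numbers only, not measurements.
const exampleSteps = [
  { vus: 10, rps: 100 },
  { vus: 20, rps: 198 },
  { vus: 40, rps: 380 },
  { vus: 80, rps: 390 },
];
console.log(findPlateau(exampleSteps));
```

The step it returns is where adding load stopped buying throughput; that is the concurrency level to hand to the data-layer/dev lane along with the p95/p99 figures from the same run.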
|
|
69
|
+
|
|
70
|
+
## Thresholds to start from
|
|
71
|
+
|
|
72
|
+
- Error rate < 1% under target load.
|
|
73
|
+
- p95 within the endpoint's SLO; p99 no more than ~2–3× p95 (bigger gap = tail problems).
|
|
74
|
+
- Throughput scales with concurrency until a plateau — find where it flattens.
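
As a rough sketch of how these starting thresholds could be checked against raw samples outside the load tool: the helper below uses the nearest-rank percentile method and a 3× tail ratio. `percentile`, `evaluateRun`, and the field names are hypothetical, and k6's own percentile calculation may differ slightly from nearest-rank.

```javascript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Applies the three starting thresholds: error rate < 1%, p95 under the SLO,
// and p99 no more than `maxTailRatio` times p95.
function evaluateRun({ latenciesMs, errors, requests, sloP95Ms, maxTailRatio = 3 }) {
  const p95 = percentile(latenciesMs, 95);
  const p99 = percentile(latenciesMs, 99);
  const errorRate = errors / requests;
  const checks = {
    errorRate: errorRate < 0.01,
    p95: p95 < sloP95Ms,
    tail: p99 <= maxTailRatio * p95,
  };
  return { p95, p99, errorRate, checks, pass: Object.values(checks).every(Boolean) };
}

// Illustrative input: 98 fast requests and 2 slow ones, no errors.
const sample = Array(98).fill(100).concat([1000, 1000]);
console.log(evaluateRun({ latenciesMs: sample, errors: 0, requests: 100, sloP95Ms: 200 }));
```

In the illustrative input the p95 stays inside the SLO while the tail check fails, which is the "bigger gap = tail problems" case: the average and p95 look fine, but a small share of requests is far slower.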
|
|
75
|
+
|
|
76
|
+
## Rules
|
|
77
|
+
|
|
78
|
+
1. **Never load-test production** without explicit approval — test a staging/local stack. See `.claude/rules/human-in-the-loop.md`.
|
|
79
|
+
2. Tie every run to an SLO; a load test without a pass/fail threshold is just noise.
|
|
80
|
+
3. When you find a cliff, hand the specifics to the data-layer/dev lane — don't guess the fix here.
|
|
81
|
+
4. Keep scripts in `load/`; record the run + result in `docs/performance/`.
|
|
82
|
+
|
|
83
|
+
> Adapted from a portfolio project's load-testing skill; generalized to be stack- and tool-agnostic for claude-kit.
|