@ai-agent-lead/skills 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (48) hide show
  1. package/README.md +37 -0
  2. package/bin/install.js +272 -0
  3. package/package.json +34 -0
  4. package/skills/LANGUAGE.md +72 -0
  5. package/skills/README.md +156 -0
  6. package/skills/SKILL-TEMPLATE.md +120 -0
  7. package/skills/TRIGGERS.md +64 -0
  8. package/skills/WORKFLOWS.md +369 -0
  9. package/skills/bench/SKILL.md +40 -0
  10. package/skills/bench/templates/benchmark-report.md +26 -0
  11. package/skills/bootstrap/BOOTSTRAP.md +13 -0
  12. package/skills/bootstrap/SKILL.md +47 -0
  13. package/skills/code-hygiene/SKILL.md +92 -0
  14. package/skills/debug/SKILL.md +122 -0
  15. package/skills/design/DEEP-MODULES.md +76 -0
  16. package/skills/design/FUNCTIONAL-CORE.md +121 -0
  17. package/skills/design/ILLEGAL-STATES.md +102 -0
  18. package/skills/design/OBSERVABILITY.md +49 -0
  19. package/skills/design/PERSONAS.md +41 -0
  20. package/skills/design/SKILL.md +139 -0
  21. package/skills/design/TESTABILITY.md +84 -0
  22. package/skills/feature-doc/SKILL.md +113 -0
  23. package/skills/feature-doc/templates/feature-template.md +52 -0
  24. package/skills/formats/ADR-FORMAT.md +51 -0
  25. package/skills/formats/CONTEXT-FORMAT.md +109 -0
  26. package/skills/formats/CONTEXT-MAP-FORMAT.md +6 -0
  27. package/skills/grill-plan/SKILL.md +112 -0
  28. package/skills/improve-codebase-architecture/DEEPENING.md +37 -0
  29. package/skills/improve-codebase-architecture/INTERFACE-DESIGN.md +41 -0
  30. package/skills/improve-codebase-architecture/SKILL.md +115 -0
  31. package/skills/investigate/SKILL.md +97 -0
  32. package/skills/investigate/templates/research-note.md +84 -0
  33. package/skills/pr-review/SKILL.md +197 -0
  34. package/skills/prod-ready/SKILL.md +88 -0
  35. package/skills/security-review/SKILL.md +145 -0
  36. package/skills/simplify/SKILL.md +105 -0
  37. package/skills/sync-check/SKILL.md +69 -0
  38. package/skills/system-design/SKILL.md +160 -0
  39. package/skills/tdd/SKILL.md +121 -0
  40. package/skills/tdd/TESTS.md +93 -0
  41. package/skills/tdd-rounds/COMMITS.md +122 -0
  42. package/skills/tdd-rounds/SKILL.md +96 -0
  43. package/skills/tdd-rounds/templates/builder-brief.md +73 -0
  44. package/skills/tdd-rounds/templates/builder-report.md +21 -0
  45. package/skills/verify-real-deps/MOTIVATION.md +18 -0
  46. package/skills/verify-real-deps/SKILL.md +118 -0
  47. package/skills/verify-real-deps/templates/known-issues.md +45 -0
  48. package/skills/zoom-out/SKILL.md +104 -0
@@ -0,0 +1,121 @@
1
+ ---
2
+ name: tdd
3
+ description: Test-driven development with the red-green-refactor loop. Use when implementing a feature, fixing a bug, changing core logic, or when the user mentions "TDD", "test-first", "red-green-refactor", or "integration tests". Skip for trivial UI glue, config changes, or pure docs edits.
4
+ complexity: medium
5
+ expected_duration: 20 minutes
6
+ ---
7
+
8
+ # Test-Driven Development
9
+
10
+ ## When to use
11
+
12
+ - Implementing a feature, fixing a bug, or changing core logic.
13
+ - Any `tdd-rounds` Builder invocation (mandatory every round).
14
+ - The user mentions "TDD", "test-first", "red-green-refactor", "integration tests".
15
+
16
+ ## When to skip
17
+
18
+ - Trivial UI glue, framework wiring, config changes, pure docs edits.
19
+ - Trivial getters / setters with no behavior.
20
+ - Bug whose root cause isn't yet known — run [`debug`](../debug/SKILL.md) first; the reproduction crystallises into the failing test.
21
+
22
+ ## Philosophy
23
+
24
+ Tests verify **behavior through public interfaces**, not implementation details. Code can change entirely; tests shouldn't.
25
+
26
+ - **Good tests** read like specifications: "user can checkout with valid cart". They use the public API and survive refactors.
27
+ - **Bad tests** are coupled to internals: they mock internal collaborators, assert on call order, or peek at private state. The warning sign — the test breaks during a refactor when behavior didn't change.
28
+
29
+ See [TESTS.md](TESTS.md) for concrete good/bad examples and a pre-commit checklist.
30
+
31
+ ## Anti-pattern: Horizontal Slicing
32
+
33
+ **Do not write all tests first, then all implementation.** This is the most common TDD failure mode for AI agents — you write five tests in one shot, then five matching functions. The tests describe *imagined* behavior, become insensitive to real bugs, and lock in the wrong shape.
34
+
35
+ ```
36
+ WRONG (horizontal):
37
+ RED: test1, test2, test3, test4, test5
38
+ GREEN: impl1, impl2, impl3, impl4, impl5
39
+
40
+ RIGHT (vertical):
41
+ RED → GREEN: test1 → impl1
42
+ RED → GREEN: test2 → impl2
43
+ ...
44
+ ```
45
+
46
+ One test → minimum code to pass → next test. Each cycle informs the next.
47
+
48
+ ## Workflow
49
+
50
+ ### 1. Red — Write a failing test
51
+
52
+ - Pick one acceptance criterion from the feature doc.
53
+ - Write a single test that asserts the behavior through the public interface.
54
+ - Name the test after the behavior, not the function: `returns_empty_list_when_no_users`, not `test_getUsers`.
55
+ - Run it. Confirm it fails for the **right reason** (assertion failed, not import error).
56
+
57
+ ### 2. Green — Minimum code to pass
58
+
59
+ - Smallest amount of code that makes the test pass.
60
+ - Hardcoding a return value is acceptable on the first test — the next test forces generalization.
61
+ - Resist adding features the test does not require.
62
+
63
+ ### 3. Refactor — Clean up with the test as a safety net
64
+
65
+ - Remove duplication, improve names, extract functions.
66
+ - Run tests after every change.
67
+ - Do not add new behavior during refactor.
68
+ - **Never refactor while red.** Get to green first.
69
+
70
+ ### 4. Simplify pass — end-of-round, after green
71
+
72
+ Before declaring the round done, run the [`simplify`](../simplify/SKILL.md) skill. It walks every changed file with four lenses — reuse, quality, efficiency, test relevance — and is a single sweep, not a license to refactor everything. If you find structural issues that need a dedicated round, surface them as `Open questions for parent` instead.
73
+
74
+ ## Rules
75
+
76
+ - One behavior per test.
77
+ - No production code without a failing test first (for core logic).
78
+ - Test through public interfaces only — don't peek at private state or internal calls.
79
+ - If a test is hard to write, the design is probably wrong — fix the design, not the test.
80
+ - Test-after is acceptable for: UI wiring, framework glue, trivial getters/setters.
81
+
82
+ ## Test Pyramid (default targets)
83
+
84
+ - ~70% unit tests — fast, isolated.
85
+ - ~20% integration tests — real dependencies where it matters (DB, HTTP).
86
+ - ~10% end-to-end — only the critical user paths.
87
+
88
+ ## Anti-patterns
89
+
90
+ - Writing tests after the code "to get coverage" — defeats the design feedback loop.
91
+ - Mocking everything — tests pass but the system is broken. Mock at boundaries only (external APIs, time, randomness).
92
+ - One giant test asserting many things — failures become hard to diagnose.
93
+ - Asserting on internal call counts or order — couples the test to implementation.
94
+
95
+ ## When invoked as a Builder for a multi-round project
96
+
97
+ If the parent agent dispatched you with a `tdd-rounds`-shaped brief listing "Skills (in order): design, tdd, simplify": **execute all three sequentially in this single invocation. Do not return to the parent between skills.** "In order" means run them in that order WITHIN this run — not handed off across separate invocations. Returning early after only `design` is the most common Builder failure mode and forces the parent to re-dispatch.
98
+
99
+ The `tdd-rounds` skill captures the full orchestration pattern (Builder brief schema, structured report shape, parent verification ritual). Read it if your brief references it.
100
+
101
+ ## Pairing with other skills
102
+
103
+ - **[`feature-doc`](../feature-doc/SKILL.md)** — runs *before*. ACs become the test list.
104
+ - **[`debug`](../debug/SKILL.md)** — runs *before* for non-trivial bugs. Reproduction → failing test.
105
+ - **[`design`](../design/SKILL.md)** — runs *alongside*. If TDD feels painful, the design is wrong.
106
+ - **[`simplify`](../simplify/SKILL.md)** — runs *after green*. End-of-slice / end-of-round sweep.
107
+ - **[`prod-ready`](../prod-ready/SKILL.md)** — runs *after green* (single feature) before opening the PR.
108
+ - **[`tdd-rounds`](../tdd-rounds/SKILL.md)** — wraps `tdd` for multi-round projects.
109
+
110
+ ## Done when
111
+
112
+ - All ACs from the feature doc are green and the tests are committed.
113
+ - Tests assert behavior through public interfaces, not internals.
114
+ - The simplify pass has run.
115
+ - For single-feature flow: `prod-ready` is queued. For `tdd-rounds`: the structured Builder report is emitted.
116
+
117
+ ## Handoff
118
+
119
+ When all acceptance criteria are green:
120
+ - For a single-feature project: run `prod-ready` before opening the PR. Catches operational, infrastructure, and consistency issues tests don't surface.
121
+ - For a multi-round project orchestrated under `tdd-rounds`: emit the structured Builder report (per `tdd-rounds/templates/builder-report.md`) and stop. The parent runs `prod-ready` and `verify-real-deps` at the project level.
@@ -0,0 +1,93 @@
1
+ # Good and Bad Tests
2
+
3
+ Companion to [SKILL.md](SKILL.md). Concrete examples of the philosophy "test behavior, not implementation."
4
+
5
+ Examples use TypeScript-ish syntax for readability — the principles apply to any language and any test framework (pytest, JUnit, Go's `testing`, Jest, RSpec, etc.). Replace `expect(x).toBe(y)` with the equivalent assertion in your stack; the structure (Arrange / Act / Assert through the public interface) is what matters.
6
+
7
+ ## Good tests
8
+
9
+ Test through the public interface. Read like a specification.
10
+
11
+ ```ts
12
+ test("user can checkout with valid cart", async () => {
13
+ const cart = createCart();
14
+ cart.add(product);
15
+ const result = await checkout(cart, paymentMethod);
16
+ expect(result.status).toBe("confirmed");
17
+ });
18
+ ```
19
+
20
+ Properties:
21
+
22
+ - Describes *what* the system does, not *how*.
23
+ - Uses only public APIs.
24
+ - Survives internal refactors.
25
+ - One logical assertion per test.
26
+
27
+ ## Bad tests
28
+
29
+ ### 1. Asserting on internal calls
30
+
31
+ ```ts
32
+ // BAD — couples the test to the implementation
33
+ test("checkout calls paymentService.process", async () => {
34
+ const mockPayment = jest.mock(paymentService);
35
+ await checkout(cart, payment);
36
+ expect(mockPayment.process).toHaveBeenCalledWith(cart.total);
37
+ });
38
+ ```
39
+
40
+ If you rename `process` or split it into two calls, the test breaks even though behavior is identical. Test the *outcome*, not the call.
41
+
42
+ ### 2. Bypassing the interface to verify
43
+
44
+ ```ts
45
+ // BAD — peeks into the database directly
46
+ test("createUser saves to database", async () => {
47
+ await createUser({ name: "Alice" });
48
+ const row = await db.query("SELECT * FROM users WHERE name = ?", ["Alice"]);
49
+ expect(row).toBeDefined();
50
+ });
51
+ ```
52
+
53
+ ```ts
54
+ // GOOD — verifies through the public interface
55
+ test("createUser makes user retrievable", async () => {
56
+ const user = await createUser({ name: "Alice" });
57
+ const retrieved = await getUser(user.id);
58
+ expect(retrieved.name).toBe("Alice");
59
+ });
60
+ ```
61
+
62
+ If the storage layer changes (different table, new ORM, in-memory cache), the good version still works.
63
+
64
+ ### 3. Testing the shape, not the behavior
65
+
66
+ ```ts
67
+ // BAD — asserts structure, not meaning
68
+ test("getUser returns object with id and name fields", async () => {
69
+ const user = await getUser("123");
70
+ expect(user).toHaveProperty("id");
71
+ expect(user).toHaveProperty("name");
72
+ });
73
+ ```
74
+
75
+ ```ts
76
+ // GOOD — asserts the behavior the caller depends on
77
+ test("getUser returns the user matching the requested id", async () => {
78
+ const created = await createUser({ name: "Alice" });
79
+ const fetched = await getUser(created.id);
80
+ expect(fetched.name).toBe("Alice");
81
+ });
82
+ ```
83
+
84
+ ## Quick checklist
85
+
86
+ Before committing a test, ask:
87
+
88
+ - [ ] Would this test still pass if I rewrote the implementation while keeping behavior the same?
89
+ - [ ] Does the test name describe a behavior, not a function?
90
+ - [ ] Does it use only public APIs (no private fields, no DB peeking, no internal mocks)?
91
+ - [ ] Is there exactly one logical thing being asserted?
92
+
93
+ If any answer is no, revise before merging.
@@ -0,0 +1,122 @@
1
+ # Commit cadence and message style
2
+
3
+ The Builder's commits are the parent's review surface. A round that ships as one big commit forces the parent to read everything at once; a round that ships as a clean per-slice sequence is reviewable one diff at a time. The shape below was load-bearing across 14 rounds in the project this skill was distilled from.
4
+
5
+ ## The rules
6
+
7
+ ### 1. Every code commit gets an `R<N>:` prefix
8
+
9
+ `git log --oneline` then reads as round history. A reviewer or future contributor scanning the log can see "this is round 5; this is the simplify pass at end of round 6.7; this is the final round before tag" without opening any commit.
10
+
11
+ ### 2. Bug-fix commits also reference the issue ledger
12
+
13
+ Format: `R<N>: #<X> <title>`. Each bug-fix commit ties one-to-one to an entry in `docs/known-issues.md`. The ledger entry doubles as the brief; the commit closes it.
14
+
15
+ ```
16
+ R12: #3 strip models/ prefix from parsed model id
17
+ R14: #8 forwarder rewrites body's project field to chosen pool account
18
+ ```
19
+
20
+ ### 3. One commit per AC slice when slices are independent
21
+
22
+ Round delivers 5 ACs with no dependency between them → 5 commits + 1 simplify-pass commit + 1 doc tick commit. Each commit is a clean unit a parent reviews in isolation.
23
+
24
+ ```
25
+ R12: #1 add g1-pro-tier to routing tier order
26
+ R12: #2 seed quota cache on accounts add
27
+ R12: #3 strip models/ prefix from parsed model id
28
+ R12: #4 add POST /v1/accounts/{handle}/refresh-quota endpoint
29
+ R12: #5 dual-mode audit parser dispatches on Content-Type
30
+ R12: simplify — factor applyFrame, extractModel; close known-issues
31
+ ```
32
+
33
+ A single mono-commit "R12: smoke-test fixes" of the same content forces the parent to mentally separate five unrelated changes from one diff. Avoid.
34
+
35
+ ### 4. Simplify pass is its own commit
36
+
37
+ End-of-round refactoring (extracting helpers, removing dead branches, tightening names) is mechanically separate from the AC work that motivated it. Bundling them hides the "what changed because the AC required it" from "what changed because we cleaned up after." Reviewers care about the distinction; a separate commit preserves it.
38
+
39
+ ### 5. Doc-only commits stay separate from code
40
+
41
+ `docs/STATE.md`, `docs/known-issues.md`, ADR patches — never bundle into a code commit. Mixing makes both diffs harder to read. Doc-only commits get a `docs:` prefix (no `R<N>:`):
42
+
43
+ ```
44
+ docs: append actual R14 entry to STATE.md
45
+ docs: add public README
46
+ ```
47
+
48
+ ### 6. Single-commit rounds are OK when justified
49
+
50
+ A refactor round (no AC slices) or a single-issue fix can land in one commit if splitting would create non-buildable intermediate states. The Builder's report MUST justify the lump in `Deviations`:
51
+
52
+ > Single commit instead of multiple `R13:` slices. The change is small (one interface method + one Service-level helper + four targeted tests) and the slices are tightly coupled (interface change forces Stub method to land in the same commit); splitting would create a non-buildable intermediate state.
53
+
54
+ If you can't justify it, don't lump it.
55
+
56
+ ### 7. Commit messages are honest
57
+
58
+ The commit message states what the commit actually does, not what you wished it did. The lesson from this skill's source project: an R14 commit message claimed "STATE.md updated" but the Builder had forgotten that file — a follow-up commit was needed to restore truth. **Cost of an honest message: one extra line of typing. Cost of a lying message: a follow-up commit, a smudged history, and a lost minute the next time someone wonders why the round entry is missing.**
59
+
60
+ If a step you described in the brief didn't actually land, say so in `Deviations`. If a commit's diff doesn't match its message, fix the message before committing — or split into two commits if the message's intent really was two things.
61
+
62
+ ## Message body shape
63
+
64
+ The first line is the subject (~50–72 chars). The body explains:
65
+
66
+ - **What** the commit does (one or two sentences).
67
+ - **Why** the change was needed (the AC, the bug, the constraint).
68
+ - **Tradeoff being accepted** if the change picked one path over another.
69
+
70
+ Keep it short. Three short paragraphs > one long one. Bullets are fine.
71
+
72
+ Real example from R8:
73
+
74
+ ```
75
+ R8: AC-CLI1/3/4/5 + AC-OP1/3 — cobra CLI + start subcommand
76
+
77
+ Wraps the admin API with a cobra-driven command tree:
78
+ - accounts {list, add, remove, cooldown, rename}
79
+ - stats / usage with --since, --until, --group_by, --json
80
+ - start subcommand wires app.NewDaemon + admin.NewServer under
81
+ shared ctx with SIGINT/SIGTERM cancel.
82
+
83
+ Read commands precheck /v1/healthz; daemon-down surfaces the
84
+ documented "Run gemini-proxy start first" remediation instead of a
85
+ raw connection-refused stack.
86
+
87
+ --group_by uses snake_case to match the AC text. table-by-default,
88
+ --json for machine-readable; same data both formats.
89
+ ```
90
+
91
+ Counter-example that fails the rules:
92
+
93
+ ```
94
+ R8: stuff
95
+ ```
96
+
97
+ (no scope, no why, no shape).
98
+
99
+ ## Footer
100
+
101
+ Co-authored-by line for agent contributions:
102
+
103
+ ```
104
+ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
105
+ ```
106
+
107
+ Use whatever model line matches the actual driver. Match exactly the convention the project's CLAUDE.md (or root agent doc) prescribes — don't invent variations.
108
+
109
+ ## Anti-patterns
110
+
111
+ - **Mono-commit at end of round.** "R<N>: round done." Defeats the per-slice review story.
112
+ - **Bundling docs with code.** STATE.md updates ride along with the code change. Diffs become harder to read.
113
+ - **Imperative-mood squashing.** Four sequential "wip" commits eventually amended into one. The journey isn't the commit; the destination is. Don't force-amend; commit cleanly the first time.
114
+ - **Empty bodies on non-trivial commits.** "fix bug" with no body. The body is where the *why* lives.
115
+ - **Lying about scope.** A commit message saying "also updates X" when the diff has no X. Worse than no message.
116
+ - **Skipping the prefix.** A round commit without `R<N>:` is a hidden commit — the log loses its round structure.
117
+
118
+ ## When to skip these rules
119
+
120
+ - **Pre-round scaffolding** (initial repo setup, before round 1) — no `R<N>:` prefix yet, no AC contract. Plain conventional commits work.
121
+ - **Hot-fix to a tagged release** (post-v1, urgent) — the `R<N>:` cadence is for active development; emergency fixes can use `fix:` or `hotfix:` prefixes per the project's release-engineering norms. Document in the project's CLAUDE.md.
122
+ - **Squash merges to main** — if the project's branching model squashes feature branches, the per-slice commits live on the feature branch only. The squashed commit message should still preserve the per-slice summary in its body.
@@ -0,0 +1,96 @@
1
+ ---
2
+ name: tdd-rounds
3
+ description: Multi-round TDD orchestration. Use when delivering a feature larger than one TDD slice — typically 5-15 acceptance criteria across multiple packages — by dispatching Builder sub-agents per round, with the parent maintaining state and verifying. Triggered when the user mentions "drive the sub-agent team", "multi-round TDD", "orchestrate rounds", "Builder agents", or when a plan from `feature-doc` lists more ACs than one agent should reasonably tackle in a single invocation. Pairs with `tdd` (Builders invoke that skill per round) and `prod-ready` (final round before tag).
4
+ complexity: high
5
+ expected_duration: 1 hour
6
+ ---
7
+
8
+ # Multi-Round TDD Orchestration
9
+
10
+ Distills the pattern of a parent agent driving Builder sub-agents through a feature's acceptance criteria, one round per AC slice. The parent does not write code; the parent writes briefs, reviews diffs, runs verification, and keeps the running summary.
11
+
12
+ ## When to use
13
+
14
+ - A feature doc has more ACs (say, ≥10) than one agent invocation can hold cleanly.
15
+ - Multiple packages are involved; serial rounds let each one land cleanly before the next builds on top.
16
+ - Cross-round invariants matter (test green every round, not just at the end).
17
+
18
+ ## When to skip
19
+
20
+ - Single-AC bug fix or single-package feature — invoke `tdd` directly, no round overhead.
21
+ - Pure refactor with no AC changes — use the `improve-codebase-architecture` flow instead.
22
+
23
+ ## The parent's contract
24
+
25
+ - **Plan once.** The plan file lists rounds, each round's ACs, the dependency order, and the skills cadence per round (which rounds need `design`; when to invoke `improve-codebase-architecture` mid-project; when to invoke `prod-ready`).
26
+ - **Brief per round.** A self-contained brief — the Builder shouldn't need conversation history.
27
+ - **Briefing Sub-agent**: For complex rounds, invoke a sub-agent to autonomously generate the brief by analyzing `docs/STATE.md`, the `feature-doc`, and the results of the previous round. See `templates/builder-brief.md` for the schema.
28
+ - **Verify after each round.** Run the test command independently (don't trust the Builder's pasted output), read the diff, tick AC checkboxes in the feature doc with the test name, append a round summary to `docs/STATE.md`.
29
+ - **Never write code yourself.** If you find yourself doing it directly under time pressure, that's a signal the round was misscoped — split it.
30
+
31
+ ## The Builder's contract
32
+
33
+ When dispatched for a round, the Builder:
34
+
35
+ 1. Reads the brief, the plan file, `docs/STATE.md`, and any ADRs / feature docs the brief cites.
36
+ 2. **Executes the listed skills sequentially in this single invocation.** When a brief says "Skills (in order): design, tdd, simplify" — that means run all three in this run, not return to the parent between them. This rule is non-obvious; the brief template makes it explicit.
37
+ 3. Commits per AC slice (or per behavior slice for refactor rounds), prefix `R<N>:`. **Read [`COMMITS.md`](COMMITS.md) before the first commit** — it captures the seven rules (`R<N>:` prefix, `#<X>` for bug fixes, per-AC slicing, separate simplify-pass commit, separate doc commits, single-commit-with-justification, honest messages) and the message-body shape. The parent reads commits as the review surface; a clean sequence is reviewable one diff at a time, a mono-commit isn't.
38
+ 4. Returns the structured report from `templates/builder-report.md`.
39
+ 5. Does NOT push, does NOT touch files outside the brief's allowlist.
40
+
41
+ ## Skills cadence
42
+
43
+ | Skill | Invoked by | When |
44
+ |---|---|---|
45
+ | [`design`](../design/SKILL.md) | Builder | Rounds introducing a new package or non-trivial interface. Skip on rounds that only extend existing modules. |
46
+ | [`tdd`](../tdd/SKILL.md) | Builder | **Every** code-writing round. Mandatory. |
47
+ | [`simplify`](../simplify/SKILL.md) | Builder | End of every round, after green. Single sweep — reuse / quality / efficiency / test relevance. Lands as its own commit ([COMMITS.md rule 4](COMMITS.md)). |
48
+ | [`improve-codebase-architecture`](../improve-codebase-architecture/SKILL.md) | **Parent** | Once mid-project, after the load-bearing seams exist. Surfaced opportunities become dedicated refactor rounds (no behavior change; ACs are "all existing tests still green"). |
49
+ | [`prod-ready`](../prod-ready/SKILL.md) | Builder, final round only | Pre-tag operational checklist. Output the filled checklist verbatim in the Builder's report. |
50
+ | [`verify-real-deps`](../verify-real-deps/SKILL.md) | **Parent** | After `prod-ready` clean, before tagging. Catches what unit/integration tests can't see — wire-shape mismatches against real third-party APIs. |
51
+
52
+ ## State tracking
53
+
54
+ `docs/STATE.md` is the parent-maintained running summary, append-only, ~one entry per round. The Builder reads it as pre-flight context for the round; the parent appends after each round closes. Single source of truth for "what we have so far" between Builder invocations.
55
+
56
+ Format per round:
57
+
58
+ ```
59
+ ## Round N — <title> (DONE YYYY-MM-DD)
60
+ Delivered: AC-NN, AC-NN
61
+ New packages: <list>
62
+ Public types: <one line>
63
+ Invariants now enforced: <one line>
64
+ Known follow-ups: <one line>
65
+ ```
66
+
67
+ ## Failure modes and recovery
68
+
69
+ - **Builder stuck (test won't go green / design feels wrong).** Builder reports a *blocking* open question instead of thrashing. Hard rule: **Builder must not silently descope an AC.**
70
+ - **Concrete escalation signals** (any one fires → stop and surface):
71
+ - 3 consecutive failed attempts at making the same test green with the same approach.
72
+ - 2 design-level questions that the brief + STATE.md + cited ADRs don't answer.
73
+ - The Builder finds itself wanting to modify a file in the "must NOT touch" allowlist to make progress.
74
+ - A new test would require mocking >3 internal collaborators (smell: design is wrong, not the test).
75
+ - **Parent's response**: shrink scope (split the round), or invoke `design` explicitly with the friction described, or run `grill-plan` if a load-bearing decision is wobbling. Don't push the Builder to keep trying.
76
+ - **Round breaks earlier rounds' tests.** Parent-review failure. Fix is a follow-up round, scope = "restore green for tests X, Y; explain what regressed; adjust ADR if needed." Never amend prior commits.
77
+ - **ADR turns out wrong mid-round.** Builder stops with a blocking open question. Parent runs `grill-plan` or writes a superseding ADR. The current round restarts with the new ADR as input.
78
+ - **User redirects mid-round.** Cancel the in-flight Task. No partial-round commit. Re-brief and re-run. Sub-agent state is ephemeral by design.
79
+ - **Brief was wrong.** Parent rewrites the brief, re-dispatches. Don't try to "fix it up" through follow-up messages — sub-agents work better from a clean self-contained brief.
80
+
81
+ ## Templates
82
+
83
+ - [`templates/builder-brief.md`](templates/builder-brief.md) — the self-contained brief shape the parent fills in per round.
84
+ - [`templates/builder-report.md`](templates/builder-report.md) — the structured report shape the Builder returns.
85
+
86
+ ## Supporting docs
87
+
88
+ - [`COMMITS.md`](COMMITS.md) — commit cadence and message style (per-AC slicing, `R<N>:` prefix, when single-commit is OK, honesty rule). Builders read this before the first commit.
89
+
90
+ ## Handoff
91
+
92
+ When the final round completes:
93
+
94
+ 1. Run `verify-real-deps`. Capture surfaced bugs into `docs/known-issues.md`.
95
+ 2. Iterate fix-rounds until clean, or document deferrals to vN.1 with rationale.
96
+ 3. Tag and publish via whatever release / distribution channel applies.
@@ -0,0 +1,73 @@
1
+ # ROUND N — <title>
2
+
3
+ You are a Builder sub-agent in a multi-round TDD project. Read the orchestration plan and `docs/STATE.md` first. **Execute design + tdd + simplify in THIS invocation. Do not stop after design.**
4
+
5
+ ## Scope
6
+
7
+ <one paragraph in plain English: what behavior ships this round, what it does NOT cover>
8
+
9
+ ## ACs in scope (verbatim from feature doc)
10
+
11
+ - **AC-XX1** Given ..., when ..., then ....
12
+ - **AC-XX2** ...
13
+
14
+ ## ACs explicitly out of scope this round
15
+
16
+ - AC-YYY (defer to Round M).
17
+ - All other ACs.
18
+
19
+ ## ADR refs (load-bearing constraints, restated in 1 line each)
20
+
21
+ - **ADR-NNN** (`docs/adr/NNN-*.md`): <one-line summary of the constraint this round must respect>.
22
+ - **ADR-NNN** (...): ...
23
+
24
+ ## Files you may create
25
+
26
+ - `<path>/*.go` and `*_test.go`
27
+ - ...
28
+
29
+ ## Files you may modify
30
+
31
+ - `<path>` (specific reason)
32
+ - ...
33
+
34
+ ## Files you must NOT touch
35
+
36
+ - closed packages: `<list>`
37
+ - `cli/`, `creds/`, `.claude/`, `docs/STATE.md` (parent owns)
38
+ - ...
39
+
40
+ ## Skills (in order — execute ALL in this invocation)
41
+
42
+ 1. **`design`** — REQUIRED if introducing a new package or non-trivial interface. Decisions to make: <list>.
43
+ 2. **`tdd`** — REQUIRED. Vertical slicing per AC; commit prefix `R<N>:`. Read [`tdd-rounds/COMMITS.md`](../COMMITS.md) before the first commit — it captures the per-AC slicing rule, the simplify-pass-gets-its-own-commit rule, the doc-commits-stay-separate rule, when single-commit is justified, and the message-body shape.
44
+ 3. **[`simplify`](../../simplify/SKILL.md)** — REQUIRED end-of-round. Walk every changed file with four lenses: reuse, quality, efficiency, test relevance. Land as its own commit (`R<N>: simplify — <summary>`).
45
+
46
+ ## Pre-flight reading (in order)
47
+
48
+ 1. `docs/STATE.md` — what previous rounds delivered.
49
+ 2. `docs/features/<feature>.md` — the AC contract.
50
+ 3. The ADRs cited above.
51
+ 4. <specific files the Builder needs to read before starting>
52
+
53
+ ## Concrete deliverables
54
+
55
+ <per-AC or per-component, what the Builder must produce — function signatures, schema, etc.>
56
+
57
+ ## Success criteria
58
+
59
+ - `make test` (or equivalent) exits 0. **All previously-ticked ACs remain green.**
60
+ - `make lint` clean.
61
+ - New tests cover: <specific behaviors>.
62
+ - Per-AC commits prefixed `R<N>:`.
63
+ - AC-XX1, AC-XX2, ... ticked in `docs/features/<feature>.md` with the test names.
64
+
65
+ ## Output (REQUIRED — paste verbatim shape from `templates/builder-report.md`)
66
+
67
+ ## Reminders
68
+
69
+ - **Use the project's dev-environment commands** (e.g., Docker via `make`). Do not pollute the host.
70
+ - **No `time.Sleep` in tests.**
71
+ - **Don't `git push`.**
72
+ - **Don't break existing tests.**
73
+ - **Don't silently descope an AC.** If blocked, surface as a blocking open question and stop.
@@ -0,0 +1,21 @@
1
+ ## Summary
2
+ <one paragraph — what changed, why, what's now enforced>
3
+
4
+ ## ACs covered
5
+ AC-XX1: <test name(s)> — green
6
+ AC-XX2: <test name(s)> — green
7
+ ...
8
+
9
+ ## Files changed
10
+ <path> — created|modified — <one-line purpose>
11
+ <path> — created|modified — <one-line purpose>
12
+ ...
13
+
14
+ ## Test run
15
+ <paste last 30 lines of the test command output>
16
+
17
+ ## Deviations
18
+ <anything done that the brief did not explicitly authorize, with reason; or "none">
19
+
20
+ ## Open questions for parent
21
+ <blocking | non-blocking, each with a concrete decision needed; or "none">
@@ -0,0 +1,18 @@
1
+ # Why verify-real-deps exists — a worked example
2
+
3
+ A motivating example from a real session: a proxy for a third-party AI API shipped **25 / 25 acceptance criteria green**, lint clean, prod-ready clean. First end-to-end run against the live upstream surfaced **eight bugs** across four follow-up sessions:
4
+
5
+ - Body shape was `{models: [...]}`; the real API takes `{project: "..."}`.
6
+ - Tier enum had 3 values; the real API returns more.
7
+ - Cache wasn't populated at credential-add time → first request returned 503.
8
+ - Response body had `"models/X"` prefix; cache stored bare names → cache miss.
9
+ - Non-streaming responses bypassed the SSE parser → all-zero token counts.
10
+ - *(Three more in the same class — wire-shape mismatches the fake accepted but the upstream rejected.)*
11
+
12
+ Every one of those passed unit + integration tests because the fake harness accepted whatever the code sent. None survived contact with the real API.
13
+
14
+ ## The lesson, generalized
15
+
16
+ A fake server is a *hypothesis* about the contract; the real upstream **is** the contract. Until you run code against the real upstream at least once, every test that uses the fake is testing your hypothesis — not the contract.
17
+
18
+ Whatever the domain — payments, AI inference, search, geolocation, telephony, OAuth — the same class of wire-shape bug exists. `verify-real-deps` is the discipline that finds it before users do.