npm - @ai-agent-lead/skills - Versions diffs - 1.4.0 → 1.5.0 - Mend

@ai-agent-lead/skills 1.4.0 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (36) hide show

package/README.md +30 -0
package/package.json +1 -1
package/skills/README.md +11 -4
package/skills/SKILL-TEMPLATE.md +4 -2
package/skills/WORKFLOWS.md +11 -8
package/skills/bench/SKILL.md +20 -2
package/skills/bootstrap/SKILL.md +24 -13
package/skills/caveman/SKILL.md +1 -1
package/skills/design/SKILL.md +10 -1
package/skills/design/templates/design-note.md +47 -0
package/skills/feature-doc/SKILL.md +11 -17
package/skills/formats/CODE-HYGIENE.md +1 -1
package/skills/formats/CONTEXT-FORMAT.md +5 -5
package/skills/formats/CONTEXT-MAP-FORMAT.md +14 -3
package/skills/formats/CONVENTION-FORMAT.md +40 -0
package/skills/formats/DOC-DRIFT-AUDIT.md +2 -1
package/skills/formats/OKF.md +13 -9
package/skills/formats/{STYLE-comments.md → STYLE-COMMENTS.md} +1 -1
package/skills/grill-plan/SKILL.md +4 -0
package/skills/improve-codebase-architecture/INTERFACE-DESIGN.md +1 -1
package/skills/improve-codebase-architecture/SKILL.md +6 -15
package/skills/investigate/SKILL.md +14 -0
package/skills/pr-review/SKILL.md +3 -3
package/skills/prod-ready/SKILL.md +9 -8
package/skills/security-review/SKILL.md +12 -2
package/skills/simplify/SKILL.md +4 -4
package/skills/system-design/SKILL.md +28 -16
package/skills/tdd/SKILL.md +5 -4
package/skills/tdd-rounds/COMMITS.md +4 -4
package/skills/tdd-rounds/SKILL.md +33 -7
package/skills/tdd-rounds/templates/builder-brief.md +4 -4
package/skills/verify-real-deps/SKILL.md +16 -2
package/skills/verify-real-deps/templates/known-issues.md +33 -28
package/skills/zoom-out/SKILL.md +1 -1
/package/skills/{design → formats}/OBSERVABILITY.md +0 -0
/package/skills/{design → formats}/PERSONAS.md +0 -0

package/skills/improve-codebase-architecture/SKILL.md CHANGED Viewed

@@ -20,26 +20,17 @@ Surface architectural friction and propose **deepening opportunities** — refac
 - Single-module local refactor with no cross-module impact — use `design` or refactor inline.
 - Line-level cleanup (renames, dead-code removal) — use the [`code-hygiene`](../formats/CODE-HYGIENE.md) lens + [`simplify`](../simplify/SKILL.md).
-## Glossary
+## Vocabulary
-Use these terms exactly in every suggestion. Consistent language is the point — don't drift into "component," "service," "API," or "boundary." Full definitions in [skills/LANGUAGE.md](../LANGUAGE.md).
+Use the shared architecture terms **exactly** — **module**, **interface**, **implementation**, **depth** (deep / shallow), **seam**, **adapter**, **leverage**, **locality** — and don't drift into "component," "service," "API," or "boundary." Definitions and principles are canonical in [`LANGUAGE.md`](../LANGUAGE.md); this skill links rather than restating them.
-- **Module** — anything with an interface and an implementation (function, class, package, slice).
-- **Interface** — everything a caller must know to use the module: types, invariants, error modes, ordering, config. Not just the type signature.
-- **Implementation** — the code inside.
-- **Depth** — leverage at the interface: a lot of behaviour behind a small interface. **Deep** = high leverage. **Shallow** = interface nearly as complex as the implementation.
-- **Seam** — where an interface lives; a place behaviour can be altered without editing in place. (Use this, not "boundary.")
-- **Adapter** — a concrete thing satisfying an interface at a seam.
-- **Leverage** — what callers get from depth.
-- **Locality** — what maintainers get from depth: change, bugs, knowledge concentrated in one place.
+Three principles from there do the heavy lifting here:
-Key principles (see [skills/LANGUAGE.md](../LANGUAGE.md) for the full list):
-- **Deletion test**: imagine deleting the module. If complexity vanishes, it was a pass-through. If complexity reappears across N callers, it was earning its keep.
+- **Deletion test** — imagine deleting the module. If complexity vanishes, it was a pass-through; if it reappears across N callers, it was earning its keep. This is the primary filter for the candidates below.
 - **The interface is the test surface.**
-- **One adapter = hypothetical seam. Two adapters = real seam.**
+- **One adapter = hypothetical seam; two adapters = real seam.**
-This skill is _informed_ by the project's domain model. The domain language gives names to good seams; ADRs record decisions the skill should not re-litigate.
+This skill is _informed_ by the project's domain model: the domain language gives names to good seams; ADRs record decisions the skill should not re-litigate.
 ## Process

package/skills/investigate/SKILL.md CHANGED Viewed

@@ -85,6 +85,20 @@ The note must include:
 - Do **not** skip the artifact for "small" investigations. The discipline of writing it is the value; the durable trail is the bonus.
 - Do **not** add a Recommendation that just lists the options again. Pick one.
+## Pairing with other skills
+- **[`feature-doc`](../feature-doc/SKILL.md)** — runs *after*. The "Decided" recommendation becomes the feature contract.
+- **[`grill-plan`](../grill-plan/SKILL.md)** — runs *after* when the chosen option is load-bearing and needs stress-testing into an ADR.
+- **[`system-design`](../system-design/SKILL.md)** — runs *after* when the investigation settled a greenfield topology direction.
+- **[`debug`](../debug/SKILL.md)** — *lateral*. Also emits a `docs/research/<topic>.md` note, but for an unexplained bug rather than a forward-looking decision.
+## Done when
+- A research note exists at `docs/research/<topic>.md`, opening with `type: research` OKF frontmatter, containing Context (cited), 2–3 Options, a single Recommendation, Checkpoint questions, and Out-of-scope.
+- Every claim in Context cites `path:line` — nothing is worked from impression.
+- Exactly one option is recommended, with the accepted trade-off named (not a re-listing of the options).
+- The user has the checkpoint questions needed to choose, and no implementation has started.
 ## Handoff
 Once the user picks an option:

package/skills/pr-review/SKILL.md CHANGED Viewed

@@ -85,13 +85,13 @@ For non-surface-changing diffs: walk `prod-ready` Section 3 (defense-in-depth) b
 #### 3e. Doc-drift audit
-Walk the six checks in [`skills/formats/DOC-DRIFT-AUDIT.md`](../formats/DOC-DRIFT-AUDIT.md) against the diff — **reviewer lens**. This is the second line of defense for `prod-ready` Section 7: the author may have missed it; you catch what's left. Any check that resolves to "no" without `n/a + reason` is a finding, at the severity that reference defines — **Blocker** for load-bearing drift (a missing ADR for a hard-to-reverse decision, a `CONTEXT.md` entry for a term other PRs will use, AC drift hiding behavior, a direct ADR contradiction), **Suggestion** when the diff is self-explanatory in isolation.
+Walk the seven checks in [`skills/formats/DOC-DRIFT-AUDIT.md`](../formats/DOC-DRIFT-AUDIT.md) against the diff — **reviewer lens**. This is the second line of defense for `prod-ready` Section 7: the author may have missed it; you catch what's left. Any check that resolves to "no" without `n/a + reason` is a finding, at the severity that reference defines — **Blocker** for load-bearing drift (a missing ADR for a hard-to-reverse decision, a `CONTEXT.md` entry for a term other PRs will use, AC drift hiding behavior, a direct ADR contradiction), **Suggestion** when the diff is self-explanatory in isolation.
 #### 3f. Hygiene (line level)
 Apply the [`code-hygiene`](../formats/CODE-HYGIENE.md) lens here, not as a primary phase:
-- **Comment noise**: new WHAT-comments, docstrings on exports whose contract is obvious from the signature, in-function section headers (`// validate`, `// build response`), stale "used by X" references, citation grammar that doesn't match the repo's comment style doc ([`skills/formats/STYLE-comments.md`](../formats/STYLE-comments.md)). Flag as **nits by default**; promote to a suggestion only when cumulative comment noise obscures the diff (signal the author skipped `simplify`).
+- **Comment noise**: new WHAT-comments, docstrings on exports whose contract is obvious from the signature, in-function section headers (`// validate`, `// build response`), stale "used by X" references, citation grammar that doesn't match the repo's comment style doc ([`skills/formats/STYLE-COMMENTS.md`](../formats/STYLE-COMMENTS.md)). Flag as **nits by default**; promote to a suggestion only when cumulative comment noise obscures the diff (signal the author skipped `simplify`).
 - Names that mislead (boolean returning non-bool, `getX` that mutates, `Manager`/`Helper` suffixes hiding what the thing is).
 - Cleverness that earns its cost? Or could be boring?
 - YAGNI — "in case we need it" parameters / interfaces / classes? Strip.
@@ -164,7 +164,7 @@ A `tdd-rounds` parent verifying a Builder's round runs a focused subset:
 2. Run the test command independently — don't trust pasted output.
 3. Read the diff, classify findings.
 4. Tick AC checkboxes in the feature doc with the test names.
-5. Append the round summary to `docs/STATE.md`.
+5. Append the round summary to the feature's `docs/features/<name>.state.md`.
 The classification (blocker / suggestion / nit) lives in the parent's notes; only blockers gate the next round.

package/skills/prod-ready/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: prod-ready
-description: Pre-merge production-readiness checklist — operational, infrastructure, and consistency checks that tests alone don't surface. Use after `tdd` reaches green; before opening a PR or merging to main; after significant infra changes (new DB, new deployment target, new auth flow); or when the user mentions "shipping", "ready to merge", "before deploy", "production readiness", or "prod-ready". Pairs with the `tdd` skill — tdd proves the feature works; this catches what tests can't see (server timeouts, DB pragmas, error-response consistency, secrets at rest).
+description: Pre-merge production-readiness checklist — operational, infrastructure, and consistency checks that tests alone don't surface. Use after `tdd` reaches green; before opening a PR or merging to main; after significant infra changes (new DB, new deployment target, new auth flow); or when the user mentions "shipping", "ready to merge", "before deploy", "production readiness", or "prod-ready". Skip for pure docs / comment / test-only diffs, dependency bumps with no runtime change, or one-line bug fixes that don't touch infra, auth, or error paths. Pairs with `tdd` (proves the feature works; this catches what tests can't see — server timeouts, DB pragmas, error-response consistency, secrets at rest), `security-review` (the heavier threat-model pass when the surface changes), `pr-review` (the reviewer's mirror of this gate), and `verify-real-deps` (after, for releases that touch third-party APIs).
 ---
 # Prod-Readiness Checklist
@@ -15,6 +15,12 @@ A short pre-merge gate. Tests prove the **feature** works. This skill catches wh
 - Before opening a PR or merging to main.
 - After significant infrastructure changes (new DB, new deployment target, new auth flow).
+## When to skip
+- Pure docs / comment / test-only changes.
+- Dependency bumps that don't change runtime behavior.
+- One-line business-logic bug fixes that don't touch infra, auth, or error paths.
 ## Checklist
 Walk each section. An item is OK to fail **only if** the feature doc's Notes / Non-Goals explicitly accepts the gap.
@@ -52,7 +58,7 @@ Walk each section. An item is OK to fail **only if** the feature doc's Notes / N
 ### 7. Documentation (the doc-map)
-Implementation lands → docs drift. The natural moment to catch drift is now, not "next sprint". Walk the six checks in [`skills/formats/DOC-DRIFT-AUDIT.md`](../formats/DOC-DRIFT-AUDIT.md) — **author lens**: tick each `✓`, `✗ + remediation`, or `n/a + reason`, and fix the drift inline before the PR (don't kick it to a follow-up). Files don't need to pre-exist — create `docs/adr/`, `CONTEXT.md`, design notes lazily when the first relevant change appears.
+Implementation lands → docs drift. The natural moment to catch drift is now, not "next sprint". Walk the seven checks in [`skills/formats/DOC-DRIFT-AUDIT.md`](../formats/DOC-DRIFT-AUDIT.md) — **author lens**: tick each `✓`, `✗ + remediation`, or `n/a + reason`, and fix the drift inline before the PR (don't kick it to a follow-up). Files don't need to pre-exist — create `docs/adr/`, `CONTEXT.md`, design notes lazily when the first relevant change appears.
 - [ ] ADR for any new decision with viable alternatives (and no active ADR is contradicted).
 - [ ] `CONTEXT.md` updated for any new/changed domain term, with `_Avoid_:` aliases where confusion is likely.
@@ -60,15 +66,10 @@ Implementation lands → docs drift. The natural moment to catch drift is now, n
 - [ ] Feature doc reflects what was actually built (no silently-dropped or silently-added behavior).
 - [ ] `CHANGELOG.md` `[Unreleased]` entry for any user-visible change.
 - [ ] Every new/changed `docs/` file opens with OKF `type` frontmatter.
+- [ ] `docs/index.md` lists every produced doc — new ones added, deleted ones removed.
 Per-check definitions, skip lists, and severity live in [`DOC-DRIFT-AUDIT.md`](../formats/DOC-DRIFT-AUDIT.md) — this is the same audit `pr-review` §3e runs from the reviewer side.
-## When to skip
-- Pure docs / comment / test-only changes.
-- Dependency bumps that don't change runtime behavior.
-- One-line business-logic bug fixes that don't touch infra, auth, or error paths.
 ## Pairing with other skills
 - **[`tdd`](../tdd/SKILL.md)** — runs *before*. `prod-ready` runs after all ACs are green.

package/skills/security-review/SKILL.md CHANGED Viewed

@@ -116,7 +116,17 @@ Most reviews produce a **`## Security review` section appended to the feature do
 - <risk> — accepted because <reason>; tracked at <link or follow-up>
 ```
-For high-stakes changes (new auth flow, new external surface, regulated data), promote to `docs/security/<feature>.md` with a fuller threat-model section. Same shape, just longer.
+For high-stakes changes (new auth flow, new external surface, regulated data), promote to a dedicated `docs/security/<feature>.md` with a fuller threat-model section — same shape, just longer. As a standalone produced doc it opens with OKF frontmatter (`type: security`, per [`../formats/OKF.md`](../formats/OKF.md) §2); the `## Security review` section appended to a feature doc needs none, since the feature doc already carries its own:
+```yaml
+---
+type: security
+title: <Feature> security review
+description: Threat model and verified controls for <feature>'s surface change.
+tags: [security, threat-model]
+timestamp: YYYY-MM-DD
+---
+```
 ## Anti-patterns
@@ -139,5 +149,5 @@ For high-stakes changes (new auth flow, new external surface, regulated data), p
 - The security surface (entry points + trust zones + data flows) is named, not assumed.
 - Each plausible threat has a verified control or an explicit deferral with rationale.
 - High-impact threats have two independent layers of control where feasible.
-- The artifact (feature-doc section or `docs/security/<feature>.md`) is in the repo.
+- The artifact is in the repo — a `## Security review` feature-doc section, or a standalone `docs/security/<feature>.md` opening with `type: security` OKF frontmatter.
 - The PR description references the review so the reviewer can audit it.

package/skills/simplify/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: simplify
-description: Single end-of-round sweep that tightens what `tdd` just left green — review every changed file for reuse, quality, efficiency, and test relevance. Use after `tdd` reaches green and before opening a PR (or before a Builder closes a round in `tdd-rounds`). Triggered by phrases like "simplify pass", "tighten this", "clean up before commit", "end-of-round sweep", or appearing as a step in a Builder brief. Skip for trivial diffs (typo, dep bump, doc-only). Pairs with `tdd` (runs immediately after green), the `code-hygiene` lens (`formats/CODE-HYGIENE.md`, applied during the sweep), and `pr-review` (a self-check after this).
+description: Single end-of-round sweep that tightens what `tdd` just left green — review every changed file for reuse, quality, efficiency, test relevance, and telemetry. Use after `tdd` reaches green and before opening a PR (or before a Builder closes a round in `tdd-rounds`). Triggered by phrases like "simplify pass", "tighten this", "clean up before commit", "end-of-round sweep", or appearing as a step in a Builder brief. Skip for trivial diffs (typo, dep bump, doc-only). Pairs with `tdd` (runs immediately after green), the `code-hygiene` lens (`formats/CODE-HYGIENE.md`, applied during the sweep), and `pr-review` (a self-check after this).
 ---
 # Simplify
@@ -47,7 +47,7 @@ Walk every changed file. Apply each lens in order. Fix what you find inline.
 - Names that read clearly out of context — would a stranger guess what `result`, `data`, `value` referred to? If not, rename.
 - Error messages that name the failing input — `"could not parse: <value>"` beats `"parse error"`.
 - Abstractions that haven't earned their keep — a base class with one subclass, an interface with one implementation. Inline.
-- Comments — **default during the sweep is DELETE.** Apply [`CODE-HYGIENE.md`](../formats/CODE-HYGIENE.md) Principle 6 and the bar in [`STYLE-comments.md`](../formats/STYLE-comments.md): delete WHAT-comments, obvious-from-signature docstrings, "used by X" caller references, commented-out code, banners, and in-function section headers (`// validate`, `// build response`); keep only why-comments the next reader would otherwise reattempt. Normalize any kept citation (`ADR-007 §7`, `R6 AC-3`, `v0.3 R1b2`) to [`STYLE-comments.md`](../formats/STYLE-comments.md) §3.
+- Comments — **default during the sweep is DELETE.** Apply [`CODE-HYGIENE.md`](../formats/CODE-HYGIENE.md) Principle 6 and the bar in [`STYLE-COMMENTS.md`](../formats/STYLE-COMMENTS.md): delete WHAT-comments, obvious-from-signature docstrings, "used by X" caller references, commented-out code, banners, and in-function section headers (`// validate`, `// build response`); keep only why-comments the next reader would otherwise reattempt. Normalize any kept citation (`ADR-007 §7`, `R6 AC-3`, `v0.3 R1b2`) to [`STYLE-COMMENTS.md`](../formats/STYLE-COMMENTS.md) §3.
 ### 3. Efficiency — dead code, redundant work, premature defensive checks
@@ -67,7 +67,7 @@ For each test added in the slice:
 ### 5. Telemetry — the observability pass
-Apply the lens from [`skills/design/OBSERVABILITY.md`](../design/OBSERVABILITY.md):
+Apply the lens from [`skills/formats/OBSERVABILITY.md`](../formats/OBSERVABILITY.md):
 - [ ] Does every error message name the failing input?
 - [ ] Are there any "catch-all" blocks that swallow errors?
@@ -97,7 +97,7 @@ In single-feature flow (no rounds), the sweep can land as a separate commit befo
 ## Done when
-- Every changed file walked once with the four lenses.
+- Every changed file walked once with the five lenses.
 - Each finding either fixed inline (small) or surfaced as a follow-up (large).
 - Tests still green after the sweep — re-run them.
 - A separate `simplify` commit exists (in `tdd-rounds`) or the sweep is captured in the final commit message (in single-feature flow).

package/skills/system-design/SKILL.md CHANGED Viewed

@@ -78,7 +78,19 @@ Each seam is a potential test boundary AND a potential failure point. Naming the
 ### 5. Draw the system map
-Output an ASCII diagram + a module table + a seam list. Save to `docs/architecture.md` (create lazily on first use).
+Output an ASCII diagram + a module table + a seam list, saved to `docs/architecture.md` (create lazily on first use). It's a produced doc, so it opens with OKF frontmatter (`type: architecture`, per [`../formats/OKF.md`](../formats/OKF.md) §2):
+```yaml
+---
+type: architecture
+title: <System> architecture
+description: Module topology, dependency direction, and seams for <system>.
+tags: [architecture, topology]
+timestamp: YYYY-MM-DD
+---
+```
+Below the frontmatter, the body — ASCII map, module table, seam list:
 ```
                  ┌──────────────┐
@@ -125,14 +137,6 @@ Before declaring the design done, check each item:
 Failures here are not "warnings" — they're "stop and rework". The system map is load-bearing for everything downstream.
-## Done when
-- `docs/architecture.md` exists with: module table, dependency direction, seam list, ASCII map.
-- Each module name comes from `CONTEXT.md` vocabulary (not framework conventions).
-- The dependency graph is acyclic and explicitly reviewed.
-- Every cross-module boundary has a named seam + adapter location.
-- For any decision you're least sure about, you've raised it as an open question to the user before locking in.
 ## Anti-patterns
 - **Layering instead of slicing.** Naming modules `controllers`, `services`, `repositories` is layering — a category split, not a responsibility split. Prefer feature slicing (`ordering/`, `billing/`).
@@ -140,6 +144,21 @@ Failures here are not "warnings" — they're "stop and rework". The system map i
 - **Premature service extraction.** Splitting modules into separate processes / deployments without a real reason (scale, team, fault isolation) adds operational cost without clarity benefit. Prefer in-process modules first; extract later if needed.
 - **Cycles "for now".** A cyclic dependency is a sign the modules aren't really separate. Don't accept it; merge them or insert a third module to break the cycle.
+## Pairing with other skills
+- **`investigate`** runs *before*. Investigates the problem and chooses the direction. `system-design` then operationalizes the chosen direction into a topology.
+- **`design`** runs *after*, per module. `system-design` says what modules should exist; `design` says what shape each module has.
+- **`grill-plan`** runs alongside or after, to stress-test high-stakes topology decisions.
+- **`improve-codebase-architecture`** is the *opposite* skill: given an existing codebase, find what to deepen. `system-design` is greenfield; that one is brownfield.
+## Done when
+- `docs/architecture.md` exists, opening with `type: architecture` OKF frontmatter, and contains: module table, dependency direction, seam list, ASCII map.
+- Each module name comes from `CONTEXT.md` vocabulary (not framework conventions).
+- The dependency graph is acyclic and explicitly reviewed.
+- Every cross-module boundary has a named seam + adapter location.
+- For any decision you're least sure about, you've raised it as an open question to the user before locking in.
 ## Handoff
 Once the system map is stable:
@@ -149,10 +168,3 @@ Once the system map is stable:
 - Then proceed to `feature-doc` for the first feature using the topology.
 If the system map needs to evolve later (new module, changed boundaries, removed seam), come back to this skill — don't drift the map silently.
-## Pairing with other skills
-- **`investigate`** runs *before*. Investigates the problem and chooses the direction. `system-design` then operationalizes the chosen direction into a topology.
-- **`design`** runs *after*, per module. `system-design` says what modules should exist; `design` says what shape each module has.
-- **`grill-plan`** runs alongside or after, to stress-test high-stakes topology decisions.
-- **`improve-codebase-architecture`** is the *opposite* skill: given an existing codebase, find what to deepen. `system-design` is greenfield; that one is brownfield.

package/skills/tdd/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: tdd
-description: Test-driven development with the red-green-refactor loop. Use when implementing a feature, fixing a bug, changing core logic, or when the user mentions "TDD", "test-first", "red-green-refactor", or "integration tests". Skip for trivial UI glue, config changes, or pure docs edits.
+description: Test-driven development with the red-green-refactor loop. Use when implementing a feature, fixing a bug, changing core logic, or when the user mentions "TDD", "test-first", "red-green-refactor", or "integration tests". Skip for trivial UI glue, config changes, or pure docs edits. Pairs with `feature-doc` (its ACs become the test list), `debug` (run first when the root cause isn't yet known), and `simplify` (the end-of-round sweep after green).
 ---
 # Test-Driven Development
@@ -31,9 +31,9 @@ Tests verify **behavior through public interfaces**, not implementation details.
 See [TESTS.md](TESTS.md) for concrete good/bad examples and a pre-commit checklist.
-## Anti-pattern: Horizontal Slicing
+## Slice vertically, one test at a time
-**Do not write all tests first, then all implementation.** This is the most common TDD failure mode for AI agents — you write five tests in one shot, then five matching functions. The tests describe *imagined* behavior, become insensitive to real bugs, and lock in the wrong shape.
+The principle that drives the workflow below. **Do not write all tests first, then all implementation.** This is the most common TDD failure mode for AI agents — you write five tests in one shot, then five matching functions. The tests describe *imagined* behavior, become insensitive to real bugs, and lock in the wrong shape.
 ```
 WRONG (horizontal):
@@ -72,7 +72,7 @@ One test → minimum code to pass → next test. Each cycle informs the next.
 ### 4. Simplify pass — end-of-round, after green
-Before declaring the round done, run the [`simplify`](../simplify/SKILL.md) skill. It walks every changed file with four lenses — reuse, quality, efficiency, test relevance — and is a single sweep, not a license to refactor everything. If you find structural issues that need a dedicated round, surface them as `Open questions for parent` instead.
+Before declaring the round done, run the [`simplify`](../simplify/SKILL.md) skill. It walks every changed file with five lenses — reuse, quality, efficiency, test relevance, telemetry — and is a single sweep, not a license to refactor everything. If you find structural issues that need a dedicated round, surface them as `Open questions for parent` instead.
 ## Rules
@@ -90,6 +90,7 @@ Before declaring the round done, run the [`simplify`](../simplify/SKILL.md) skil
 ## Anti-patterns
+- Horizontal slicing — all tests first, then all implementation. See *Slice vertically, one test at a time* above; it's the failure mode this skill most guards against.
 - Writing tests after the code "to get coverage" — defeats the design feedback loop.
 - Mocking everything — tests pass but the system is broken. Mock at boundaries only (external APIs, time, randomness).
 - One giant test asserting many things — failures become hard to diagnose.

package/skills/tdd-rounds/COMMITS.md CHANGED Viewed

@@ -38,10 +38,10 @@ End-of-round refactoring (extracting helpers, removing dead branches, tightening
 ### 5. Doc-only commits stay separate from code
-`docs/STATE.md`, `docs/known-issues.md`, ADR patches — never bundle into a code commit. Mixing makes both diffs harder to read. Doc-only commits get a `docs:` prefix (no `R<N>:`):
+`docs/features/<name>.state.md`, `docs/known-issues.md`, ADR patches — never bundle into a code commit. Mixing makes both diffs harder to read. Doc-only commits get a `docs:` prefix (no `R<N>:`):
 ```
-docs: append actual R14 entry to STATE.md
+docs: append actual R14 entry to <name>.state.md
 docs: add public README
 ```
@@ -55,7 +55,7 @@ If you can't justify it, don't lump it.
 ### 7. Commit messages are honest
-The commit message states what the commit actually does, not what you wished it did. The lesson from this skill's source project: an R14 commit message claimed "STATE.md updated" but the Builder had forgotten that file — a follow-up commit was needed to restore truth. **Cost of an honest message: one extra line of typing. Cost of a lying message: a follow-up commit, a smudged history, and a lost minute the next time someone wonders why the round entry is missing.**
+The commit message states what the commit actually does, not what you wished it did. The lesson from this skill's source project: an R14 commit message claimed "state file updated" but the Builder had forgotten that file — a follow-up commit was needed to restore truth. **Cost of an honest message: one extra line of typing. Cost of a lying message: a follow-up commit, a smudged history, and a lost minute the next time someone wonders why the round entry is missing.**
 If a step you described in the brief didn't actually land, say so in `Deviations`. If a commit's diff doesn't match its message, fix the message before committing — or split into two commits if the message's intent really was two things.
@@ -109,7 +109,7 @@ Use whatever model line matches the actual driver. Match exactly the convention
 ## Anti-patterns
 - **Mono-commit at end of round.** "R<N>: round done." Defeats the per-slice review story.
-- **Bundling docs with code.** STATE.md updates ride along with the code change. Diffs become harder to read.
+- **Bundling docs with code.** State-file updates ride along with the code change. Diffs become harder to read.
 - **Imperative-mood squashing.** Four sequential "wip" commits eventually amended into one. The journey isn't the commit; the destination is. Don't force-amend; commit cleanly the first time.
 - **Empty bodies on non-trivial commits.** "fix bug" with no body. The body is where the *why* lives.
 - **Lying about scope.** A commit message saying "also updates X" when the diff has no X. Worse than no message.

package/skills/tdd-rounds/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: tdd-rounds
-description: Multi-round TDD orchestration. Use when delivering a feature larger than one TDD slice — typically 5-15 acceptance criteria across multiple packages — by dispatching Builder sub-agents per round, with the parent maintaining state and verifying. Triggered when the user mentions "drive the sub-agent team", "multi-round TDD", "orchestrate rounds", "Builder agents", or when a plan from `feature-doc` lists more ACs than one agent should reasonably tackle in a single invocation. Pairs with `tdd` (Builders invoke that skill per round) and `prod-ready` (final round before tag).
+description: Multi-round TDD orchestration. Use when delivering a feature larger than one TDD slice — typically 5-15 acceptance criteria across multiple packages — by dispatching Builder sub-agents per round, with the parent maintaining state and verifying. Triggered when the user mentions "drive the sub-agent team", "multi-round TDD", "orchestrate rounds", "Builder agents", or when a plan from `feature-doc` lists more ACs than one agent should reasonably tackle in a single invocation. Skip for a single-AC bug fix or single-package feature — invoke `tdd` directly, with no round overhead. Pairs with `tdd` (Builders invoke that skill per round) and `prod-ready` (final round before tag).
 ---
 # Multi-Round TDD Orchestration
@@ -23,8 +23,8 @@ Distills the pattern of a parent agent driving Builder sub-agents through a feat
 - **Dispatch on a feature branch, never `main`.** Verify the current branch before dispatching the first Builder; if it's `main` / `master`, create `feat/<short-name>` first. The whole multi-round delivery lives on one feature branch (or a stack of feature branches), merged to `main` only at the end.
 - **Plan once.** The plan file lists rounds, each round's ACs, the dependency order, and the skills cadence per round (which rounds need `design`; when to invoke `improve-codebase-architecture` mid-project; when to invoke `prod-ready`).
 - **Brief per round.** A self-contained brief — the Builder shouldn't need conversation history.
-  - **Briefing Sub-agent**: For complex rounds, invoke a sub-agent to autonomously generate the brief by analyzing `docs/STATE.md`, the `feature-doc`, and the results of the previous round. See `templates/builder-brief.md` for the schema.
-- **Verify after each round.** Run the test command independently (don't trust the Builder's pasted output), read the diff, tick AC checkboxes in the feature doc with the test name, append a round summary to `docs/STATE.md`.
+  - **Briefing Sub-agent**: For complex rounds, invoke a sub-agent to autonomously generate the brief by analyzing the feature's `docs/features/<name>.state.md`, the `feature-doc`, and the results of the previous round. See `templates/builder-brief.md` for the schema.
+- **Verify after each round.** Run the test command independently (don't trust the Builder's pasted output), read the diff, tick AC checkboxes in the feature doc with the test name, append a round summary to `docs/features/<name>.state.md`.
 - **Never write code yourself.** If you find yourself doing it directly under time pressure, that's a signal the round was misscoped — split it.
 ## The Builder's contract
@@ -32,7 +32,7 @@ Distills the pattern of a parent agent driving Builder sub-agents through a feat
 When dispatched for a round, the Builder:
 0. **Pre-flight.** Verifies the current branch is **not** `main` / `master`. If it is, refuses immediately and returns a blocking report — does not proceed to step 1.
-1. Reads the brief, the plan file, `docs/STATE.md`, and any ADRs / feature docs the brief cites.
+1. Reads the brief, the plan file, the feature's `docs/features/<name>.state.md`, and any ADRs / feature docs the brief cites.
 2. **Executes the listed skills sequentially in this single invocation.** When a brief says "Skills (in order): design, tdd, simplify" — that means run all three in this run, not return to the parent between them. This rule is non-obvious; the brief template makes it explicit.
 3. Commits per AC slice (or per behavior slice for refactor rounds), prefix `R<N>:`. **Read [`COMMITS.md`](COMMITS.md) before the first commit** — it captures the seven rules (`R<N>:` prefix, `#<X>` for bug fixes, per-AC slicing, separate simplify-pass commit, separate doc commits, single-commit-with-justification, honest messages) and the message-body shape. The parent reads commits as the review surface; a clean sequence is reviewable one diff at a time, a mono-commit isn't.
 4. Returns the structured report from `templates/builder-report.md`.
@@ -44,18 +44,28 @@ When dispatched for a round, the Builder:
 |---|---|---|
 | [`design`](../design/SKILL.md) | Builder | Rounds introducing a new package or non-trivial interface. Skip on rounds that only extend existing modules. |
 | [`tdd`](../tdd/SKILL.md) | Builder | **Every** code-writing round. Mandatory. |
-| [`simplify`](../simplify/SKILL.md) | Builder | End of every round, after green. Single sweep — reuse / quality / efficiency / test relevance. Lands as its own commit ([COMMITS.md rule 4](COMMITS.md)). |
+| [`simplify`](../simplify/SKILL.md) | Builder | End of every round, after green. Single sweep — reuse / quality / efficiency / test relevance / telemetry. Lands as its own commit ([COMMITS.md rule 4](COMMITS.md)). |
 | [`improve-codebase-architecture`](../improve-codebase-architecture/SKILL.md) | **Parent** | Once mid-project, after the load-bearing seams exist. Surfaced opportunities become dedicated refactor rounds (no behavior change; ACs are "all existing tests still green"). |
 | [`prod-ready`](../prod-ready/SKILL.md) | Builder, final round only | Pre-tag operational checklist. Output the filled checklist verbatim in the Builder's report. |
 | [`verify-real-deps`](../verify-real-deps/SKILL.md) | **Parent** | After `prod-ready` clean, before tagging. Catches what unit/integration tests can't see — wire-shape mismatches against real third-party APIs. |
 ## State tracking
-`docs/STATE.md` is the parent-maintained running summary, append-only, ~one entry per round. The Builder reads it as pre-flight context for the round; the parent appends after each round closes. Single source of truth for "what we have so far" between Builder invocations.
+Each feature has one state file, `docs/features/<name>.state.md` — a sibling to its contract (`<name>.md`) and design note (`<name>.design.md`). It is the parent-maintained running summary, append-only, ~one `## Round N` section per round. The Builder reads it as pre-flight context for the round; the parent appends after each round closes. Single source of truth for "what we have so far" between Builder invocations. There is no global `docs/STATE.md`; `docs/index.md` is the feature manifest (per [ADR-0001](../../docs/adr/0001-distributed-state.md)).
-Format per round:
+It is a produced doc, so it opens with OKF frontmatter (`type: state`, per [`../formats/OKF.md`](../formats/OKF.md) §2), then one section per round:
 ```
+---
+type: state
+title: <Feature Name> — state
+description: Round-by-round state for the <name> feature.
+tags: [<name>, state]
+timestamp: YYYY-MM-DD
+---
+# <Feature Name> — state
 ## Round N — <title> (DONE YYYY-MM-DD)
 Delivered: AC-NN, AC-NN
 New packages: <list>
@@ -87,6 +97,22 @@ Known follow-ups: <one line>
 - [`COMMITS.md`](COMMITS.md) — commit cadence and message style (per-AC slicing, `R<N>:` prefix, when single-commit is OK, honesty rule). Builders read this before the first commit.
+## Pairing with other skills
+- **[`feature-doc`](../feature-doc/SKILL.md)** — runs *before*. Its AC list is what splits into rounds.
+- **[`tdd`](../tdd/SKILL.md)** — *invoked* by every code-writing round (mandatory).
+- **[`design`](../design/SKILL.md)**, **[`simplify`](../simplify/SKILL.md)**, **[`prod-ready`](../prod-ready/SKILL.md)**, **[`verify-real-deps`](../verify-real-deps/SKILL.md)** — *invoked* per the Skills-cadence table above.
+- **[`improve-codebase-architecture`](../improve-codebase-architecture/SKILL.md)** — run *once by the parent* mid-project, after the load-bearing seams exist.
+- **[`grill-plan`](../grill-plan/SKILL.md)** — *escalation* when an ADR wobbles mid-round.
+## Done when
+- Every AC in the feature doc is ticked with the name of the test that proves it.
+- `docs/features/<name>.state.md` has one append-only `## Round N` section per round.
+- The test suite was green at the end of **every** round (parent-verified independently), not only at the end.
+- The final round ran `prod-ready`; `verify-real-deps` ran before the tag; `docs/known-issues.md` has zero `Open` entries (or explicit deferred-by-design ones).
+- The whole delivery lived on a feature branch and merged to `main` only at the end.
 ## Handoff
 When the final round completes:

package/skills/tdd-rounds/templates/builder-brief.md CHANGED Viewed

@@ -1,6 +1,6 @@
 # ROUND N — <title>
-You are a Builder sub-agent in a multi-round TDD project. Read the orchestration plan and `docs/STATE.md` first. **Execute design + tdd + simplify in THIS invocation. Do not stop after design.**
+You are a Builder sub-agent in a multi-round TDD project. Read the orchestration plan and the feature's `docs/features/<name>.state.md` first. **Execute design + tdd + simplify in THIS invocation. Do not stop after design.**
 ## Scope
@@ -34,18 +34,18 @@ You are a Builder sub-agent in a multi-round TDD project. Read the orchestration
 ## Files you must NOT touch
 - closed packages: `<list>`
-- `cli/`, `creds/`, `.claude/`, `docs/STATE.md` (parent owns)
+- `cli/`, `creds/`, `.claude/`, `docs/features/<name>.state.md` (parent owns)
 - ...
 ## Skills (in order — execute ALL in this invocation)
 1. **`design`** — REQUIRED if introducing a new package or non-trivial interface. Decisions to make: <list>.
 2. **`tdd`** — REQUIRED. Vertical slicing per AC; commit prefix `R<N>:`. Read [`tdd-rounds/COMMITS.md`](../COMMITS.md) before the first commit — it captures the per-AC slicing rule, the simplify-pass-gets-its-own-commit rule, the doc-commits-stay-separate rule, when single-commit is justified, and the message-body shape.
-3. **[`simplify`](../../simplify/SKILL.md)** — REQUIRED end-of-round. Walk every changed file with four lenses: reuse, quality, efficiency, test relevance. Land as its own commit (`R<N>: simplify — <summary>`).
+3. **[`simplify`](../../simplify/SKILL.md)** — REQUIRED end-of-round. Walk every changed file with five lenses: reuse, quality, efficiency, test relevance, telemetry. Land as its own commit (`R<N>: simplify — <summary>`).
 ## Pre-flight reading (in order)
-1. `docs/STATE.md` — what previous rounds delivered.
+1. `docs/features/<name>.state.md` — what previous rounds delivered.
 2. `docs/features/<feature>.md` — the AC contract.
 3. The ADRs cited above.
 4. <specific files the Builder needs to read before starting>

package/skills/verify-real-deps/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: verify-real-deps
-description: Pre-tag smoke test against real third-party APIs. Use after `prod-ready` is clean, before tagging vN.0 — the gate that catches wire-shape mismatches that fakes accept but real upstreams reject. Triggered when the user mentions "smoke test", "real API", "live verify", "before tag", or "end-to-end against actual <vendor>". Pairs with `prod-ready` (which catches ops/infra issues tests miss) and `tdd-rounds` (the orchestration that feeds into this gate).
+description: Pre-tag smoke test against real third-party APIs. Use after `prod-ready` is clean, before tagging vN.0 — the gate that catches wire-shape mismatches that fakes accept but real upstreams reject. Triggered when the user mentions "smoke test", "real API", "live verify", "before tag", or "end-to-end against actual <vendor>". Skip when there's no third-party API integration, or the "real" upstream is a staging service you fully control. Pairs with `prod-ready` (which catches ops/infra issues tests miss) and `tdd-rounds` (the orchestration that feeds into this gate).
 ---
 # Verify Against Real Dependencies
@@ -107,10 +107,24 @@ git push --tags
 - [`templates/known-issues.md`](templates/known-issues.md) — the bug-ledger format.
+## Pairing with other skills
+- **[`prod-ready`](../prod-ready/SKILL.md)** — runs *before*. This gate fires only once `prod-ready` is clean; it catches the wire-shape class `prod-ready` can't see.
+- **[`tdd-rounds`](../tdd-rounds/SKILL.md)** — the *orchestration* that feeds this gate; each ledger entry becomes a fix-round brief the parent dispatches.
+- **[`tdd`](../tdd/SKILL.md)** — *invoked* inside each fix-round: reproduce the surprise as a failing test before fixing.
+## Done when
+- The representative end-to-end flow ran against the **real** upstream with real credentials.
+- Every surprise is captured in `docs/known-issues.md` with all six fields.
+- The bug ledger has zero `Open` entries (or only explicit deferred-by-design ones with a tracking line).
+- For each fixed bug, the fake / mock was tightened to reject the wrong shape too.
+- A verification entry was appended to `docs/features/<name>.state.md`, and the release is tagged.
 ## Handoff
 When the bug ledger is clean and the same end-to-end flow runs without surprise:
-- Append a verification entry to `docs/STATE.md`: `## End-to-end smoke test (DONE YYYY-MM-DD)` with what was run and what was found.
+- Append a verification entry to the feature's `docs/features/<name>.state.md`: `## End-to-end smoke test (DONE YYYY-MM-DD)` with what was run and what was found.
 - Tag the release.
 - The bug ledger stays in `docs/known-issues.md` as a permanent post-mortem record. Future contributors read it to understand "what surprises lurk in this codebase that the test suite doesn't show."

package/skills/verify-real-deps/templates/known-issues.md CHANGED Viewed

@@ -1,56 +1,61 @@
 ---
 type: known-issues
-title: "Known issues — post-vN.0 smoke test findings"
-description: Bugs surfaced by the first end-to-end run against the real upstream.
-tags: [smoke-test, known-issues]
+title: "Known issues"
+description: <one sentence — what this ledger tracks>
+tags: [known-issues]
 timestamp: YYYY-MM-DD
 ---
-<!-- Frontmatter is OKF per skills/formats/OKF.md (`type: known-issues`).
-     `timestamp` tracks the last edit; the Discovered line below pins the first run. -->
+<!-- Frontmatter is OKF per skills/formats/OKF.md (`type: known-issues`). One per repo: docs/known-issues.md. -->
-# Known issues — post-vN.0 smoke test findings
+# Known issues
-**Discovered:** YYYY-MM-DD (first end-to-end run against real <upstream>).
-**Status:** <N> open / <M> closed.
+This file has **two modes** — use whichever fits (a repo may use both, under separate headings). Delete the mode you don't need.
-These bugs slipped past unit/integration tests because the test harness accepted whatever wire shape the code sent. Real upstream enforces stricter contracts and surfaced them. Each entry doubles as the canonical brief for its fix-round.
+- **A — Smoke-test bug ledger** (`verify-real-deps`): bugs the first real-upstream run surfaced that tests missed. Each entry doubles as the brief for its fix-round.
+- **B — Accepted-but-unimplemented follow-ups** (`tdd-rounds`, deferrals): work that's agreed but not done yet — e.g. an ADR accepted but not implemented. Each entry names the artifact that *claims* it's done, the real state, and the next step.
 ---
-## #N — <one-line title>
+## A — Smoke-test bug ledger
+**Discovered:** YYYY-MM-DD (first end-to-end run against real <upstream>).
+**Status:** <N> open / <M> closed.
+### #N — <one-line title>
 **Severity:** High | Medium | Low — <one-line reason based on user impact>.
-**Status:** Open | Closed in R<N> (commit <hash>). Tests:
-`TestNameOne`,
-`TestNameTwo`.
+**Status:** Open | Closed in R<N> (commit <hash>). Tests: `TestOne`, `TestTwo`.
 **Reproduction:**
 1. <numbered step a stranger could follow>
-2. <step>
-3. <expected vs observed>
+2. <expected vs observed>
-**Root cause:** <one paragraph — why the test passed but the real API didn't. Cite the test harness's permissive behavior, the real API's documented or observed contract, and the gap between them.>
+**Root cause:** <one paragraph — why the test passed but the real API didn't: the harness's permissive behavior, the real contract, the gap between them.>
 **Files:**
-- `<package>/<file>.go` (the function that mishandled the shape)
-- `<test-harness>/<fake>.go` (the fake that accepted the wrong shape — fix this too to prevent regression)
+- `<package>/<file>` — the function that mishandled the shape.
+- `<harness>/<fake>` — the fake that accepted the wrong shape (fix this too, or the test layer regresses).
-**Fix sketch:**
-<one paragraph specific enough to scope a Builder brief. Mention the precise wire shape change, the function signature change if any, and the new test that pins the real-API contract.>
+**Fix sketch:** <one paragraph specific enough to scope a Builder brief.>
 ---
-(repeat per issue)
+## B — Accepted-but-unimplemented follow-ups
+### <short title>
+- **Claimed done by:** <artifact — e.g. ADR-NNNN, a feature doc>.
+- **Actual state:** <what's really true — e.g. "decided, not yet implemented">.
+- **Next step:** <the concrete action, and what unblocks it>.
+_Write `_No open follow-ups._` when the list is empty._
 ---
-## Fix plan
+## Fix plan (mode A)
-**R<N> round (DONE YYYY-MM-DD):** issues #X-#Y closed in <K> dedicated commits + a simplify pass.
-**R<N+1> round (PENDING):** issue #Z.
+**R<N> (DONE YYYY-MM-DD):** issues #X–#Y closed in <K> commits + a simplify pass.
+**R<N+1> (PENDING):** issue #Z.
-After all open issues are closed:
-- Append an end-to-end verification entry to `docs/STATE.md`.
-- Tag vN.0.
-- Leave this file in place as a permanent post-mortem record.
+After all open issues close: append an end-to-end verification entry to `docs/features/<name>.state.md`, tag vN.0, and leave this file as a permanent post-mortem record.

package/skills/zoom-out/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: zoom-out
-description: User-invoked utility — pulls the agent up an abstraction layer when the user is lost in unfamiliar code. Produces a map of relevant modules, callers, and seams in `docs/CONTEXT.md` vocabulary. Use when the user says "I'm lost", "zoom out", "give me higher-level context", "I don't know this area", "what depends on what here", or invokes the slash command. Does not change which workflow the user is in — interrupts to orient, then hands back. Skip when the user already has the map and just needs to read code.
+description: User-invoked utility — pulls the agent up an abstraction layer when the user is lost in unfamiliar code. Produces a map of relevant modules, callers, and seams in `docs/CONTEXT.md` vocabulary. Use when the user says "I'm lost", "zoom out", "give me higher-level context", "I don't know this area", "what depends on what here", or invokes the slash command. Does not change which workflow the user is in — interrupts to orient, then hands back. Skip when the user already has the map and just needs to read code. Runs before `debug` or `improve-codebase-architecture` when the area is unfamiliar.
 disable-model-invocation: true
 ---

/package/skills/{design → formats}/OBSERVABILITY.md RENAMED Viewed

File without changes

/package/skills/{design → formats}/PERSONAS.md RENAMED Viewed

File without changes