npm - @alexandrealvaro/agentic - Versions diffs - 0.15.2-beta.1 → 0.16.0-beta.1 - Mend

@alexandrealvaro/agentic 0.15.2-beta.1 → 0.16.0-beta.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (52) hide show

package/README.md +40 -44
package/WORKFLOW.md +83 -36
package/package.json +1 -1
package/src/commands/init.js +3 -0
package/src/lib/profiles.js +20 -4
package/src/skills/claude-code/ad-architecture/SKILL.md +1 -8
package/src/skills/claude-code/ad-archive/SKILL.md +117 -0
package/src/skills/claude-code/ad-audit/SKILL.md +13 -3
package/src/skills/claude-code/ad-bootstrap/SKILL.md +44 -7
package/src/skills/claude-code/ad-commit/SKILL.md +7 -7
package/src/skills/claude-code/ad-deepen/SKILL.md +5 -5
package/src/skills/claude-code/ad-diagnose/SKILL.md +2 -2
package/src/skills/claude-code/ad-domain/SKILL.md +4 -4
package/src/skills/claude-code/ad-grill/SKILL.md +3 -3
package/src/skills/claude-code/ad-ground/SKILL.md +11 -7
package/src/skills/claude-code/ad-guidelines/SKILL.md +200 -0
package/src/skills/claude-code/ad-merge/SKILL.md +6 -6
package/src/skills/claude-code/ad-next/SKILL.md +21 -14
package/src/skills/claude-code/ad-philosophy/SKILL.md +6 -3
package/src/skills/claude-code/ad-pr/SKILL.md +6 -6
package/src/skills/claude-code/ad-prd/SKILL.md +115 -0
package/src/skills/claude-code/ad-spec/SKILL.md +16 -15
package/src/skills/claude-code/ad-spike/SKILL.md +4 -4
package/src/skills/claude-code/ad-tdd/SKILL.md +117 -0
package/src/skills/claude-code/ad-tdg/SKILL.md +1 -1
package/src/skills/codex/ad-architecture/SKILL.md +1 -8
package/src/skills/codex/ad-archive/SKILL.md +98 -0
package/src/skills/codex/ad-archive/agents/openai.yaml +5 -0
package/src/skills/codex/ad-audit/SKILL.md +9 -2
package/src/skills/codex/ad-bootstrap/SKILL.md +44 -7
package/src/skills/codex/ad-commit/SKILL.md +6 -6
package/src/skills/codex/ad-deepen/SKILL.md +5 -5
package/src/skills/codex/ad-diagnose/SKILL.md +2 -2
package/src/skills/codex/ad-domain/SKILL.md +3 -3
package/src/skills/codex/ad-grill/SKILL.md +2 -2
package/src/skills/codex/ad-ground/SKILL.md +10 -6
package/src/skills/codex/ad-guidelines/SKILL.md +114 -0
package/src/skills/codex/ad-guidelines/agents/openai.yaml +5 -0
package/src/skills/codex/ad-merge/SKILL.md +4 -4
package/src/skills/codex/ad-next/SKILL.md +18 -13
package/src/skills/codex/ad-philosophy/SKILL.md +6 -3
package/src/skills/codex/ad-pr/SKILL.md +5 -5
package/src/skills/codex/ad-prd/SKILL.md +81 -0
package/src/skills/codex/ad-prd/agents/openai.yaml +5 -0
package/src/skills/codex/ad-spec/SKILL.md +13 -13
package/src/skills/codex/ad-spike/SKILL.md +3 -3
package/src/skills/codex/ad-tdd/SKILL.md +86 -0
package/src/skills/codex/ad-tdd/agents/openai.yaml +5 -0
package/templates/agents-project.md +1 -6
package/templates/architecture.md +0 -7
package/templates/guidelines.md +388 -0
package/templates/prd.md +73 -0

package/README.md CHANGED Viewed

@@ -4,7 +4,7 @@ A starter kit for engineering production code with LLMs. Lean templates and init
 **The framing.** An LLM is the super-soldier serum; the engineer is Steve Rogers. The serum amplifies what the engineer already brings — solid bases, investigation, care for quality, architecture, clean code, observability, maintainability. The kit encodes those bases as skills, ADRs, and gates so the amplification compounds in the right direction. See [WORKFLOW.md](WORKFLOW.md) for the principles.
-The CLI installs nineteen universal skills at the default `team` profile (`ad-bootstrap`, `ad-philosophy`, `ad-architecture`, `ad-adr`, `ad-spec`, `ad-task`, `ad-audit`, `ad-review`, `ad-ground`, `ad-next`, `ad-spike`, `ad-tdg`, `ad-domain`, `ad-grill`, `ad-deepen`, `ad-diagnose`, `ad-commit`, `ad-pr`, `ad-merge`) plus four conditional ones (`ad-design` for frontend, `ad-subagent` for Claude Code, `ad-skill` opt-in, `ad-hooks` opt-in / recommended at `mature`) into the agent's native location. Lower profiles install fewer (`poc` = 9 universals, `solo` = 16, `team` / `mature` = 19; `ad-deepen` is excluded from `poc` / `solo` per [ADR-0020](doc/adr/0020-deep-modules-vocabulary.md) §4; `ad-commit` / `ad-pr` / `ad-merge` are excluded from `poc` per [ADR-0023](doc/adr/0023-agentic-commit-skill.md) / [ADR-0024](doc/adr/0024-agentic-pr-skill.md) / [ADR-0025](doc/adr/0025-agentic-merge-skill.md)). Each skill produces its artifact or runs its operation via the agent's native conversational UI; `agentic update` keeps installed skills in sync with upstream kit changes via a state-aware three-way diff. Report rough edges via [GitHub Issues](https://github.com/alexandremendoncaalvaro/agentic-development/issues); current releases live under [GitHub Releases](https://github.com/alexandremendoncaalvaro/agentic-development/releases).
+The CLI installs a universal skill set at the default `team` profile plus four conditional skills (`ad-design` for frontend, `ad-subagent` for Claude Code, `ad-skill` opt-in, `ad-hooks` opt-in / recommended at `mature`) into the agent's native location. The full skill table is below; profile counts and exclusions are summarized under [Project maturity profiles](#project-maturity-profiles). Each skill produces its artifact or runs its operation via the agent's native conversational UI; `agentic update` keeps installed skills in sync with upstream kit changes via a state-aware three-way diff. Report rough edges via [GitHub Issues](https://github.com/alexandremendoncaalvaro/agentic-development/issues); releases live under [GitHub Releases](https://github.com/alexandremendoncaalvaro/agentic-development/releases).
 ## Prerequisites
@@ -33,22 +33,25 @@ Two categories ([ADR-0007](doc/adr/0007-workflow-operational-skills.md)) and two
 | `ad-bootstrap` | spec-driven | universal | Scans the repo, writes `AGENTS.md` ≤150 lines | `/ad-bootstrap` |
 | `ad-architecture` | spec-driven | universal | Scans the code, writes `ARCHITECTURE.md` | `/ad-architecture` |
 | `ad-adr` | spec-driven | universal | Drafts `doc/adr/NNNN-<slug>.md` from the conversation | `/ad-adr` |
-| `ad-spec` | spec-driven | universal | Drafts `doc/specs/NNNN-<slug>.md` — feature-level spec (User Scenarios, Requirements, Success Criteria) layer 3 of the five-layer stack | `/ad-spec` |
+| `ad-prd` | spec-driven | universal in `solo` / `team` / `mature` | Drafts or updates `doc/product/PRD.md` (or per-product `<slug>.md`) — Layer 3, product-level scope (target user, problem, success metrics, multi-feature roadmap) feature specs inherit from. Distinct from `ad-spec` (feature-level). Lazy lifecycle. | `/ad-prd` |
+| `ad-guidelines` | spec-driven | universal in `solo` / `team` / `mature` | Drafts or updates `GUIDELINES.md` — the project's full engineering reference (Clean Architecture, SOLID, Object Calisthenics loose/moderate/strict per Bay 2008, code standards, complexity discipline, testing strategy, security). Layer 1 Constitution trinity member alongside `WORKFLOW.md` and `AGENTS.md`. Scan-first; pre-suggested defaults from canon. Lazy lifecycle. | `/ad-guidelines` |
+| `ad-spec` | spec-driven | universal | Drafts `doc/specs/NNNN-<slug>.md` — feature-level spec (User Scenarios, Requirements, Success Criteria) Layer 4 of the six-layer stack; references parent PRD for product-scope inheritance | `/ad-spec` |
 | `ad-task` | spec-driven | universal | Drafts `doc/tasks/NNNN-<slug>.md` (checkbox + Notes format; carries `Spec ref` to link the implementing spec) | `/ad-task` |
 | `ad-audit` | spec-driven | universal | Read-only drift report (AGENTS.md / ARCHITECTURE.md / ADRs) | `/ad-audit` |
 | `ad-philosophy` | workflow-operational | universal | Universal agent guardrails — auto-loads on non-trivial work | implicit |
 | `ad-review` | workflow-operational | universal | Fresh-context code review per WORKFLOW §10; structured findings, no "approve" | `/ad-review <range>` |
-| `ad-ground` | workflow-operational | universal | Four-source pre-implementation research (docs / OSS / in-repo / git history) + happy-path synthesis + deviation gate per WORKFLOW §4 + §5 | `/ad-ground` |
-| `ad-next` | workflow-operational | universal | State-aware navigation aid (`flutter doctor` pattern) — surveys the five-layer artifact stack and recommends prioritized next actions; complements `ad-audit` (drift) | `/ad-next` |
+| `ad-ground` | workflow-operational | universal | Four-source pre-implementation research (docs / impl-refs / in-repo / git history) + happy-path synthesis + deviation gate per WORKFLOW §4 + §5 | `/ad-ground` |
+| `ad-next` | workflow-operational | universal | State-aware navigation aid (`flutter doctor` pattern) — surveys the six-layer artifact stack and recommends prioritized next actions; complements `ad-audit` (drift) | `/ad-next` |
 | `ad-spike` | workflow-operational | universal | Staged spike with golden fixtures per WORKFLOW §14, for cases where the *technique* is uncertain across multiple plausible approaches; produces `spikes/NNNN-<slug>/` with discovery + fixture + pipeline-with-gates + two-layer evaluation | `/ad-spike` |
 | `ad-tdg` | workflow-operational | universal | Outcome-based prompting per WORKFLOW §9 — ground truth pair + Test Dependency Map + three approaches + single-criterion selection, for cases where the technique is known but the implementation strategy is uncertain | `/ad-tdg` |
+| `ad-tdd` | workflow-operational | universal | Test-Driven Development per WORKFLOW §16 — red-green-refactor as deterministic LLM guardrail. Five phases — confirm regime, plan vertically, tracer bullet, incremental loop, refactor while green. Distinct from `ad-tdg`; routes to it for strategy selection inside the GREEN phase | `/ad-tdd` |
 | `ad-domain` | spec-driven | universal | Lazy lifecycle owner of `CONTEXT.md` (Layer 2 — ubiquitous language per Evans 2003); single-context or `CONTEXT-MAP.md` multi-context | `/ad-domain` |
 | `ad-grill` | workflow-operational | universal | Interview-before-research grilling — one question at a time with recommendation, codebase-first, sharpens vocabulary against `CONTEXT.md`, captures terms via `ad-domain` and decisions via `ad-adr` (three-criteria rule); upstream of `ad-ground` | `/ad-grill` |
 | `ad-deepen` | workflow-operational | universal in `team` + `mature` only | Surface deepening opportunities using WORKFLOW §8 vocabulary (Module / Interface / Depth / Seam / Adapter / Leverage / Locality); three phases — explore, present numbered candidates with deletion-test framing, grill the chosen one; pairs with `ad-audit` | `/ad-deepen` |
 | `ad-diagnose` | workflow-operational | universal | Disciplined diagnosis loop for hard bugs and performance regressions per WORKFLOW §15; five phases — build a feedback loop (the skill itself), reproduce, hypothesise (3-5 ranked falsifiable), instrument (one variable at a time), fix + regression-test | `/ad-diagnose` |
-| `ad-commit` | workflow-operational | universal in `solo` / `team` / `mature` | Atomic Conventional Commits with DCO `Signed-off-by` sign-off per ADR-0023; four phases — scope intake, stage-split when concerns mix, draft message, sign + write. Identity from `git config`. No `Co-Authored-By`. Helper, not blocker | `/ad-commit` |
-| `ad-pr` | workflow-operational | universal in `solo` / `team` / `mature` | Open a GitHub PR with a uniform body shape (Summary / Test plan / Links) per ADR-0024; four phases — preflight (`gh` auth + branch pushed), scope assembly, draft body, open + report URL. Title format = Conventional Commits | `/ad-pr` |
-| `ad-merge` | workflow-operational | universal in `solo` / `team` / `mature` | Evaluate (CI / fresh-context review / linked task / unresolved comments / mergeability) and merge a GitHub PR per ADR-0025; CI green = hard gate (yields to explicit user override); others = warnings. Merge mode auto-detected from `gh repo view`; `--delete-branch` by default | `/ad-merge` |
+| `ad-commit` | workflow-operational | universal in `solo` / `team` / `mature` | Atomic Conventional Commits with DCO `Signed-off-by` sign-off; four phases — scope intake, stage-split when concerns mix, draft message, sign + write. Identity from `git config`. No `Co-Authored-By`. Helper, not blocker | `/ad-commit` |
+| `ad-pr` | workflow-operational | universal in `solo` / `team` / `mature` | Open a GitHub PR with a uniform body shape (Summary / Test plan / Links); four phases — preflight (`gh` auth + branch pushed), scope assembly, draft body, open + report URL. Title format = Conventional Commits | `/ad-pr` |
+| `ad-merge` | workflow-operational | universal in `solo` / `team` / `mature` | Evaluate (CI / fresh-context review / linked task / unresolved comments / mergeability) and merge a GitHub PR; CI green = hard gate (yields to explicit user override); others = warnings. Merge mode auto-detected from `gh repo view`; `--delete-branch` by default | `/ad-merge` |
 | `ad-design` | spec-driven | auto if frontend detected | Bootstrap `DESIGN.md` from existing tokens (Figma, tailwind.config, tokens.json, CSS custom props) | `/ad-design` |
 | `ad-subagent` | spec-driven | auto if installing for Claude Code | Drafts `.claude/agents/<name>.md` (Claude Code only — Codex has no subagent primitive) | `/ad-subagent` |
 | `ad-skill` | spec-driven | opt-in only | Drafts a new Claude Code or Codex skill at the appropriate path | `/ad-skill` |
@@ -58,27 +61,16 @@ A short TUI shows the detected mode, agent, and feature signals (frontend / `.cl
 If your project already has an `AGENTS.md` (or `CLAUDE.md`), the installer appends a managed `Skills installed by agentic` section bracketed by `<!-- agentic-managed-skills:start -->` / `:end -->` markers. User content outside those markers is byte-preserved; re-runs update only the managed block.
-### v0.15 bundle
-The four skills accepted by ADR-0019 / 0020 / 0021 / 0022 (the mattpocock-absorption Phase-2 set) shipped together in v0.15.0-beta.1. Closes [task-0020](doc/tasks/0020-mattpocock-absorptions.md) Phase 2 in a single release rather than the originally-planned per-minor stack (v0.15 → v0.18). Rationale: 3 of 4 skills are direct mirrors of mature mattpocock prior art; bundling kept the WORKFLOW §15 / §8 / Layer-2 deltas coherent in one ship.
-| Skill | ADR | Operationalizes |
-| --- | --- | --- |
-| `ad-domain` | [ADR-0019](doc/adr/0019-domain-language-layer.md) | Layer 2 (ubiquitous language) — lazy `CONTEXT.md` lifecycle |
-| `ad-grill` | [ADR-0022](doc/adr/0022-agentic-grill-skill.md) | Interview-before-research, upstream of `ad-ground` |
-| `ad-deepen` | [ADR-0020](doc/adr/0020-deep-modules-vocabulary.md) | WORKFLOW §8 vocabulary applied to refactor proposals (team + mature only) |
-| `ad-diagnose` | [ADR-0021](doc/adr/0021-diagnose-discipline.md) | WORKFLOW §15 five-phase debugging |
 ## Project maturity profiles
 The kit ships four profiles that select which skills auto-install. Same WORKFLOW principles bind every profile; only the artifact set scales.
 | Profile | Universal install set | Conditional posture | Recommended for |
 | --- | --- | --- | --- |
-| `poc` | 9 — philosophy, ground, audit, next, spike, tdg, domain, grill, diagnose | all blocked | spike, hackathon, exploration |
-| `solo` | 16 — + bootstrap, spec, task, review, commit, pr, merge | architecture / adr / hooks opt-in; design auto if frontend; subagent auto for Claude Code | solo developer shipping a real product |
-| `team` (default) | 19 — + architecture, adr, deepen | hooks opt-in; design / subagent / skill follow autoIf | team product, shared discipline |
-| `mature` | 19 — same as team | hooks **recommended**; deepening surfaced via `ad-deepen` | regulated / public-facing production |
+| `poc` | 10 — philosophy, ground, audit, next, spike, tdg, tdd, domain, grill, diagnose | all blocked; `ad-prd` + `ad-guidelines` blocked | spike, hackathon, exploration |
+| `solo` | 19 — + bootstrap, prd, guidelines, spec, task, review, commit, pr, merge | architecture / adr / hooks opt-in; design auto if frontend; subagent auto for Claude Code | solo developer shipping a real product |
+| `team` (default) | 22 — + architecture, adr, deepen | hooks opt-in; design / subagent / skill follow autoIf | team product, shared discipline |
+| `mature` | 22 — same as team | hooks **recommended**; deepening surfaced via `ad-deepen` | regulated / public-facing production |
 Select at init time:
@@ -96,8 +88,6 @@ npx @alexandrealvaro/agentic@beta profile list        # list all profiles
 `profile set` runs the equivalent of `update` after writing the new profile, so the install set matches. The state-aware three-way diff prompts before overwriting user-edited files.
-Existing v0.7 installs with no profile field migrate to `team` automatically — same install set as before, no user action needed.
 ## Updating an existing project
 To pull upstream kit changes into a project that already has agentic skills installed:
@@ -130,8 +120,6 @@ Useful flags:
 * `--agent claude-code | codex | both` — restrict the update to one agent.
 * `--yes` — non-interactive, accepts defaults (skip on conflict, keep orphans). Combine with `--force` if you want overwrites in CI.
-If the project was installed with a kit version older than v0.3 (no state file present), the first `update` falls back to today's byte-compare behavior, then writes the state file so subsequent runs use the three-way diff.
 The `ad-review` skill writes the assembled WORKFLOW §10 handoff to `.agentic/reviews/<ISO-timestamp>-<scope>.md` before delegating to the fresh-context reviewer (Claude Code) or before instructing you to `/clear` and paste (Codex). These files are ephemeral audit artifacts — add `.agentic/reviews/` to your `.gitignore`.
 For persistent install:
@@ -143,21 +131,23 @@ agentic init
 ## Recommended daily sequence
-The kit ships nineteen universal skills in the full `team` / `mature` install set; `ad-deepen` is excluded from `poc` and `solo` per [ADR-0020](doc/adr/0020-deep-modules-vocabulary.md) §4, and `ad-commit` / `ad-pr` / `ad-merge` are excluded from `poc` per [ADR-0023](doc/adr/0023-agentic-commit-skill.md) / [ADR-0024](doc/adr/0024-agentic-pr-skill.md) / [ADR-0025](doc/adr/0025-agentic-merge-skill.md) (concrete counts: `poc` = 9 universals, `solo` = 16, `team` / `mature` = 19). Plus four conditional skills (`ad-design`, `ad-subagent`, `ad-skill`, `ad-hooks`). The sequence below is a happy path through them for the three flows that cover most daily work. Skip steps that don't apply; the kit never enforces order.
+The sequence below is a happy path for the three flows that cover most daily work. Skip steps that don't apply; the kit never enforces order. Profile-specific install sets and conditional skills are summarized under [Project maturity profiles](#project-maturity-profiles).
 **Greenfield project, first non-trivial feature:**
 1. `agentic init` — install skills.
-2. `/ad-bootstrap` — produce `AGENTS.md` (operational guide).
-3. `/ad-architecture` — produce `ARCHITECTURE.md` once load-bearing patterns emerge.
-4. `/ad-grill` — interview-before-research when the ask is fuzzy; resolves vocabulary into `CONTEXT.md` via `/ad-domain` as terms surface.
-5. `/ad-spec` — feature-level spec at `doc/specs/NNNN-<slug>.md` (User Scenarios, Requirements, Success Criteria).
-6. `/ad-adr` — only when the feature forces a binding architectural decision worth recording for posterity (three-criteria rule: hard to reverse, surprising without context, real trade-off).
-7. `/ad-task` — work-unit decomposition; reference the spec via `Spec ref`.
-8. `/ad-ground` — four-source research before code (`ad-philosophy` auto-loads in parallel).
-9. Implement.
-10. `/ad-review main..HEAD` — fresh-context §10 review before merge.
-11. `/ad-audit` — periodic drift check across operational docs, specs, and the `CONTEXT.md` glossary.
+2. `/ad-bootstrap` — produce `AGENTS.md` (distilled session-load rules).
+3. `/ad-guidelines` — produce `GUIDELINES.md` (full engineering reference: Clean Architecture, SOLID, Object Calisthenics tier, testing strategy, security). Excluded from `poc`. After writing, refresh `AGENTS.md` so its engineering sections point to `GUIDELINES.md`.
+4. `/ad-architecture` — produce `ARCHITECTURE.md` once load-bearing patterns emerge.
+5. `/ad-prd` — scope the product (target user, problem, success metrics, multi-feature roadmap) at `doc/product/PRD.md`. Excluded from `poc` profile.
+6. `/ad-grill` — interview-before-research when the ask is fuzzy; resolves vocabulary into `CONTEXT.md` via `/ad-domain` as terms surface.
+7. `/ad-spec` — feature-level spec at `doc/specs/NNNN-<slug>.md`; inherits target user, success metrics, and constraints from the PRD.
+8. `/ad-adr` — only when the feature forces a binding architectural decision worth recording for posterity (three-criteria rule: hard to reverse, surprising without context, real trade-off).
+9. `/ad-task` — work-unit decomposition; reference the spec via `Spec ref`.
+10. `/ad-ground` — four-source research before code (`ad-philosophy` auto-loads in parallel).
+11. Implement — `/ad-tdd` when the behavior is test-expressible up front (red-green-refactor); `/ad-tdg` when the technique is known but the implementation strategy is uncertain.
+12. `/ad-review main..HEAD` — fresh-context §10 review before merge.
+13. `/ad-audit` — periodic drift check across operational docs, PRD, guidelines, specs, and the `CONTEXT.md` glossary.
 **Brownfield project, quick fix:**
@@ -170,17 +160,19 @@ The kit ships nineteen universal skills in the full `team` / `mature` install se
 1. `/ad-ground` — runs the four-source research pass and surfaces the happy path with citations.
 2. Decide whether the answer becomes a spec (`/ad-spec`) or a one-off task (`/ad-task`).
-3. Continue from step 6 of the greenfield flow.
+3. Continue from step 8 of the greenfield flow.
-The kit's discipline scales with the project's maturity. A solo PoC may legitimately skip `/ad-spec` and `/ad-adr` (the WORKFLOW §1 prune principle applies — don't add an artifact that wouldn't change agent behavior). A team product running on this kit is expected to use the full sequence and additionally invoke `/ad-hooks` once to scaffold the deterministic gates per WORKFLOW §11 (pre-commit lint / format / secret-scan; pre-push build / unit / integration). Project maturity profiles that automate the recommendation by stack are deferred — see the next planned release.
+The kit's discipline scales with the project's maturity. A solo PoC may legitimately skip `/ad-spec` and `/ad-adr` (the WORKFLOW §1 prune principle applies — don't add an artifact that wouldn't change agent behavior). A team product running on this kit is expected to use the full sequence and additionally invoke `/ad-hooks` once to scaffold the deterministic gates per WORKFLOW §11 (pre-commit lint / format / secret-scan; pre-push build / unit / integration).
-**Lost mid-flow?** Invoke `/ad-next` at any time to survey the project's state across the five-layer artifact stack (Constitution → Domain → Spec → Plan/Decisions → Code) and get prioritized next-action recommendations. Read-only; complements `/ad-audit` (drift detection — different question).
+**Lost mid-flow?** Invoke `/ad-next` at any time to survey the project's state across the six-layer artifact stack (Constitution → Domain → Product → Spec → Plan/Decisions → Code) and get prioritized next-action recommendations. Read-only; complements `/ad-audit` (drift detection — different question).
 **Technique uncertain across multiple plausible approaches?** Invoke `/ad-spike` (per WORKFLOW §14) when the spec is clear but the *how* is unknown — library choice, multi-stage transformation, novel domain. The skill scaffolds a staged spike with golden fixtures + per-stage debug artifacts + two-layer evaluation under `spikes/NNNN-<slug>/`. The directory is throwaway by design; conclude with `/ad-adr` and delete.
 **Technique known but implementation strategy uncertain?** Invoke `/ad-tdg` (per WORKFLOW §9) when multiple algorithms could produce the expected output with different trade-offs along readability / performance / testability. The skill forces a ground-truth pair, lists the tests covering the file (Test Dependency Map), generates three implementation candidates, and commits to one by a single named criterion — refusing the "optimize for all three at once" failure mode. No file written; the verified implementation is the artifact, with the candidate set + criterion landing in the commit message body.
-**Question is fuzzy?** Invoke `/ad-grill` (per [ADR-0022](doc/adr/0022-agentic-grill-skill.md)) before research. One question at a time with a recommended answer; codebase-first when the answer is in code; sharpens vocabulary against `CONTEXT.md`. Routes to `/ad-ground` (research-ready), `/ad-tdg` (implement-ready), `/ad-spike` (technique-uncertain), or `/ad-diagnose` (it turned out to be a bug) when the question is sharp.
+**Behavior is test-expressible up front?** Invoke `/ad-tdd` (per WORKFLOW §16) when the change has a clear behavior to express as a test — a new API capability, a bug-driven regression test, a refactor with a known contract. Red-green-refactor as deterministic LLM guardrail: one test → red → minimum code → green → repeat. Horizontal-slicing anti-pattern (bulk-write tests, then bulk-write code) is rejected. Distinct from `/ad-tdg`; route to `/ad-tdg` inside the GREEN phase when the strategy for that test cycle is uncertain.
+**Question is fuzzy?** Invoke `/ad-grill` before research. One question at a time with a recommended answer; codebase-first when the answer is in code; sharpens vocabulary against `CONTEXT.md`. Routes to `/ad-ground` (research-ready), `/ad-tdg` (implement-ready), `/ad-spike` (technique-uncertain), or `/ad-diagnose` (it turned out to be a bug) when the question is sharp.
 **Bug or performance regression?** Invoke `/ad-diagnose` (per WORKFLOW §15). Five phases — build a feedback loop (the skill itself), reproduce, hypothesise (3-5 ranked falsifiable), instrument (one variable at a time), fix + regression-test. The loop is the skill; everything else is mechanical.
@@ -219,9 +211,13 @@ Prompts reference templates by relative path. Two ways to give your agent access
 **Reviewing your own diff.** Run `/ad-review <range>` (e.g. `/ad-review main..HEAD` or `/ad-review PR#42`). The skill assembles the diff, the relevant spec slice (`AGENTS.md`, applicable ADRs, the task's Acceptance Criteria), and delegates to a fresh-context reviewer subagent — no inherited bias from the session that wrote the code (WORKFLOW §10). Returns structured findings grouped Blocker / Concern / Note. Codex variant uses `/clear` + paste handoff since Codex has no subagent primitive.
-**Researching before implementation.** Run `/ad-ground` (or let it auto-trigger on non-trivial work). The skill runs a four-source research pass — official docs, validated open-source examples, in-repo patterns, and git history — synthesizes a happy path with citations from each source, and gates any deviation behind an irrefutable justification before code is written (WORKFLOW §4 + §5). Output is the input to whatever produces the implementation plan; the skill does not write code.
+**Researching before implementation.** Run `/ad-ground` (or let it auto-trigger on non-trivial work). The skill runs a four-source research pass — official docs, validated implementation references (open-source repos, Stack Overflow / forum answers, blog posts, gists), in-repo patterns, and git history — synthesizes a happy path with citations from each source, and gates any deviation behind an irrefutable justification before code is written (WORKFLOW §4 + §5). Output is the input to whatever produces the implementation plan; the skill does not write code.
+**Scoping a product.** Run `/ad-prd` when you are scoping a product (not a single feature) — target user, problem, success metrics across multiple features, roadmap, cross-feature constraints. The skill scaffolds `doc/product/PRD.md` (single-product) or `doc/product/<slug>.md` plus `doc/product/PRODUCT-MAP.md` (multi-product). Layer 3 of the six-layer stack; feature specs (`/ad-spec`) reference back to it for product-scope inheritance. Excluded from `poc` profile.
+**Defining engineering standards.** Run `/ad-guidelines` to write `GUIDELINES.md` — the project's full engineering reference: Clean Architecture binding, SOLID, Object Calisthenics tier (loose / moderate / strict per Bay 2008), naming conventions, error handling, complexity discipline, testing strategy, security policy. Scan-first — reads the language toolchain, existing test/lint/format config, and existing `AGENTS.md` engineering sections; pre-fills detected fields; asks only the genuine gaps and preference questions. Layer 1 Constitution trinity member alongside `WORKFLOW.md` (kit-shipped philosophy) and `AGENTS.md` (distilled session-load rules). Excluded from `poc` profile.
-**Specifying a feature.** Run `/ad-spec` (or `/ad-spec` with a feature name). The skill scaffolds `doc/specs/NNNN-<slug>.md` with industry-aligned mandatory sections (User Scenarios, Functional / Non-functional Requirements, Success Criteria, Edge Cases, Out of Scope, Open Questions, Related). Specs are layer 3 of the five-layer artifact stack — Constitution → Domain → Spec → Plan/Decisions → Code. One spec per feature; multiple tasks (`/ad-task`) implement one spec; the task template carries a `Spec ref` field linking back to the spec.
+**Specifying a feature.** Run `/ad-spec` (or `/ad-spec` with a feature name). The skill scaffolds `doc/specs/NNNN-<slug>.md` with industry-aligned mandatory sections (User Scenarios, Functional / Non-functional Requirements, Success Criteria, Edge Cases, Out of Scope, Open Questions, Related). Specs are Layer 4 of the six-layer artifact stack — Constitution → Domain → Product → Spec → Plan/Decisions → Code. One spec per feature; multiple tasks (`/ad-task`) implement one spec; the task template carries a `Spec ref` field linking back to the spec; the spec references its parent PRD for product-scope inheritance.
 **Project already built with agents.** Treat missing artifacts as brownfield (run the relevant skill) and existing artifacts as audit (`/ad-audit`).
@@ -266,7 +262,7 @@ node bin/agentic.js init
 Branch layout:
 - `main` — manual workflow source of truth (no CLI code; the npm package gets promoted here when mature).
-- `cli` — CLI development (you're here). Beta releases are published from this branch.
+- `cli` — CLI development. Beta releases are published from this branch.
 ## License

package/WORKFLOW.md CHANGED Viewed

@@ -28,27 +28,29 @@ What to keep in mind:
 14. **Autonomy requires observability.** If the agent makes decisions, log the trajectory: tool calls, intermediate outputs, failures.
 15. **Staged spikes when the technique is uncertain.** When the *how* is unknown — a library choice, a CV technique, a multi-stage transformation — break the problem into staged spikes against golden fixtures with per-stage debug artifacts.
 16. **Diagnose with discipline.** For hard bugs and performance regressions: build a fast, deterministic feedback loop *before* hypothesising. Reproduce, then generate three to five ranked falsifiable hypotheses, then change one variable at a time. The feedback loop is the skill; everything else is mechanical.
-17. **One task per session.** Context decays as it fills. Reset and reload only what the next task needs rather than extending a long conversation across many features. Smaller, deliberately-loaded contexts beat large, accreted ones.
-18. **Slice vertically, not horizontally.** Decompose a spec into thin end-to-end paths through every layer (schema, API, UI, tests) rather than one layer at a time. Each slice ships demonstrable behavior; horizontal layer-stacks ship nothing on their own.
-19. **Discipline scales with project maturity.** Same principles bind every project; the artifact set scales. A spike runs posture + research + audit; a regulated product adds spec / ADR / hooks / evals. Add ceremony only where it changes agent behavior; configure at init and reconfigure as the project matures.
+17. **TDD as deterministic guardrail.** When behavior is test-expressible up front: tracer-bullet test → minimum code → green → refactor. One test at a time; tests verify behavior through public interfaces, not implementation. Pairs with TDG inside the GREEN phase of any cycle where multiple implementation strategies are plausible.
+18. **One task per session.** Reset rather than extend; reload only what the next task needs.
+19. **Slice vertically, not horizontally.** Decompose a spec into thin end-to-end paths through every layer (schema, API, UI, tests) rather than one layer at a time. Each slice ships demonstrable behavior; horizontal layer-stacks ship nothing on their own.
+20. **Discipline scales with project maturity.** Same principles bind every project; the artifact set scales. A spike runs posture + research + audit; a regulated product adds spec / ADR / hooks / evals. Add ceremony only where it changes agent behavior; configure at init and reconfigure as the project matures.
 > Working with agents means trading typing for technical direction. The value is in giving the right context, setting boundaries, validating the result, and keeping "almost right" out of production.
 ## 1. Spec-Driven Design
-Define the rules before the agent writes a line. The temptation is to dump everything into `AGENTS.md` and hope it works — but bloat causes the model to ignore the file. Keep one topic per Markdown file: lean and focused.
+Define the rules before the agent writes a line. The temptation is to dump everything into `AGENTS.md` and hope it works — but bloat causes the model to ignore the file. One topic per file.
 There are two complementary frames for the artifacts the kit produces. The first is **purpose** — what each artifact is *for*. The second is **loading mechanism** — when each artifact reaches the agent's context.
-### Five-layer artifact stack (purpose)
+### Six-layer artifact stack (purpose)
-1. **Constitution** — `AGENTS.md` (operational guide) and `WORKFLOW.md` (engineering philosophy). Tells the agent how the project works and how the team thinks. Read every session.
-2. **Domain** — `CONTEXT.md` at the repo root (or `CONTEXT-MAP.md` plus per-context `CONTEXT.md` files for multi-context repos). The project's ubiquitous language: canonical nouns, the aliases to avoid, the relationships between them, and the ambiguities that have already been resolved. Direct application of Domain-Driven Design (Evans, 2003) — when an agent and a human share the project's vocabulary, the agent uses fewer tokens to say more, and the code, tests, and conversation all converge on the same names. Created lazily — first term resolved triggers the file. ADR-0019 records the layer.
-3. **Spec** — `doc/specs/NNNN-<slug>.md`. Feature-level requirements: who the feature is for, what it must do, the measurable success criteria, the explicit non-goals. One spec per feature; multiple tasks implement one spec; ADRs may be driven by spec constraints. Industry-aligned with [GitHub Spec Kit](https://github.com/github/spec-kit).
-4. **Plan / Decisions** — `ARCHITECTURE.md` (system patterns and boundaries), `doc/adr/NNNN-*.md` (binding architectural decisions in Michael Nygard's pattern), `doc/tasks/NNNN-*.md` (per-work-unit plan with checkbox acceptance criteria). The *how* of building what the spec asked for.
-5. **Code** — the implementation. Code is the primary documentation of behavior; comments justify non-obvious choices.
+1. **Constitution** — `WORKFLOW.md` (universal engineering philosophy, kit-shipped), `AGENTS.md` (project-specific compressed rules read every session), and `GUIDELINES.md` (project-specific full engineering reference: Clean Architecture binding, SOLID application, Object Calisthenics tier, code standards, complexity discipline, API rules, performance standards, build system, static analysis, quality gates, testing strategy, git workflow, documentation, security). The trinity answers "how this project is built" at three compression levels — principles, distilled rules, full reference. `AGENTS.md` and `GUIDELINES.md` are project-owned and lazy; `WORKFLOW.md` is kit-shipped.
+2. **Domain** — `CONTEXT.md` at the repo root (or `CONTEXT-MAP.md` plus per-context `CONTEXT.md` files for multi-context repos). The project's ubiquitous language: canonical nouns, the aliases to avoid, the relationships between them, and the ambiguities that have already been resolved. Direct application of Domain-Driven Design (Evans, 2003) — when an agent and a human share the project's vocabulary, the agent uses fewer tokens to say more, and the code, tests, and conversation all converge on the same names. Created lazily — first term resolved triggers the file.
+3. **Product** — `doc/product/PRD.md` (single-product) or `doc/product/<slug>.md` plus `doc/product/PRODUCT-MAP.md` (multi-product). Product-level scope: target user, problem, goals, non-goals, success metrics, multi-feature roadmap, cross-feature constraints. One PRD per product; feature specs (Layer 4) inherit target user, success metrics, and constraints from the PRD. Created lazily when a product is being scoped.
+4. **Spec** — `doc/specs/NNNN-<slug>.md`. Feature-level requirements: who the feature is for, what it must do, the measurable success criteria, the explicit non-goals. One spec per feature; multiple tasks implement one spec; specs reference their parent PRD for product-scope inheritance. Industry-aligned with [GitHub Spec Kit](https://github.com/github/spec-kit).
+5. **Plan / Decisions** — `ARCHITECTURE.md` (system patterns and boundaries), `doc/adr/NNNN-*.md` (binding architectural decisions in Michael Nygard's pattern), `doc/tasks/NNNN-*.md` (per-work-unit plan with checkbox acceptance criteria). The *how* of building what the spec asked for. Layer 5 spans three document roles — `ARCHITECTURE.md` is definition, ADRs are decision-record, tasks are tracking — write each per its role, not as a single class.
+6. **Code** — the implementation. Code is the primary documentation of behavior; comments justify non-obvious choices.
-The five layers scale with project maturity (TL;DR #19 — Discipline scales). A spike or PoC profile may legitimately ship only Layers 1, 2, and 5 — adding Layers 3 and 4 to a 200-line experiment is ceremony that does not change agent behavior. (Domain — Layer 2 — earns its keep even at PoC because vocabulary drift starts on day one.) A team or regulated product runs all five. The kit's profiles (`poc`, `solo`, `team`, `mature`) configure which layers auto-install per project and are changeable as the project matures; the principles in this document bind every profile, only the artifact set differs.
+The six layers scale with project maturity (TL;DR #20 — Discipline scales). A spike or PoC profile may legitimately ship only Layers 1, 2, and 6 — adding Layers 3, 4, and 5 to a 200-line experiment is ceremony that does not change agent behavior. (Domain — Layer 2 — earns its keep even at PoC because vocabulary drift starts on day one.) A team or regulated product runs all six. The kit's profiles (`poc`, `solo`, `team`, `mature`) configure which layers auto-install per project and are changeable as the project matures; the principles in this document bind every profile, only the artifact set differs.
 ### Three context types (loading mechanism)
@@ -72,18 +74,22 @@ Comments are exceptions. They justify *why* a non-obvious choice was made — ne
 ### Documentation Discipline
-The agent's authoritative copy of the eight-rule documentation discipline lives in the `ad-philosophy` skill (`Documentation Discipline` section). The rules are summarized below for reference; the skill carries the full text agents read at session time. ADR-0008 records the canonical decision and the reconciliations against ADR-0004 (file-based task tracking) and ADR-0005 (universal agent behavior as a skill).
+The rules below are canonical.
 1. **Definitions and decisions only.** No speculation, history, or unfounded plans.
-2. **No dates, version stamps, `DRAFT` markers, or changelogs in narrative documents.** Decision-record artifacts under `doc/adr/`, `doc/tasks/`, `doc/specs/` are exempt — their lifecycle fields are the auditability primitive.
+2. **No dates, version stamps, `DRAFT` markers, or changelogs in narrative documents.** Decision-record artifacts under `doc/adr/`, `doc/tasks/`, `doc/specs/`, `doc/product/` are exempt — their lifecycle fields are the auditability primitive.
 3. **No emoji anywhere.**
 4. **Business context first.**
 5. **One scope per document. No duplication.**
 6. **Code is the primary documentation of behavior.**
 7. **No commented-out code; no orphan `TODO` / `FIXME` in source.** Every deferred item references a GitHub Issue or a `doc/tasks/NNNN-*.md` task.
 8. **Tests are living documentation of behavior.**
+9. **Single responsibility per document, named by layer.** Each document plays exactly one role — **definition** (pillar Layers 1, 2, 3 plus `ARCHITECTURE.md`; read-mostly after defined; no per-item tracking UI), **decision-record** (ADRs, Specs; single `Status:` field; mostly immutable after acceptance), or **tracking** (Tasks; full checkbox / append-only-Notes UI is their job). A document does not take on adjacent layers' responsibilities.
+10. **Each layer owns its directory index. No duplication across docs.** `doc/adr/` is the canonical ADR index; `doc/tasks/`, `doc/specs/`, `doc/product/` likewise own their layers. Other documents do not list / digest / re-state these indices.
+11. **Cross-references must be load-bearing.** If you can delete the reference and the surrounding statement still stands, the reference was decoration — drop it.
+12. **Universal-vs-kit-state separation.** `WORKFLOW.md` ships to downstream projects and carries universal principles only — it does not cite kit-specific ADR numbers. The kit's adoption of each principle is recorded in `doc/adr/` (kit-internal). Literature citations remain (they are universal load-bearing references).
-The skill body explains the rationale per rule, lists the failure modes the rules counter (bloated `AGENTS.md`, README pages drifting into changelogs, decision artifacts diluted by speculation), and walks through the reconciliations. Generator skills (`ad-bootstrap`, `ad-architecture`, `ad-spec`, `ad-task`, `ad-adr`, `ad-design`) reject violations of these rules at write time; `ad-audit` flags drift across narrative docs and decision-record artifacts on demand.
+The twelve rules above are the authoritative Documentation Discipline contract. They counter recurring failure modes: session-load files bloated past relevance, README pages drifting into changelogs, decision artifacts diluted by speculation, definition documents accumulating per-item tracking UI, and pillar documents duplicating adjacent layers' indices.
 ## 3. Format by Evidence
@@ -96,21 +102,19 @@ Structure reduces ambiguity, but format isn't magic. Pick the right one for the
 Use XML when the prompt mixes instructions, retrieved context, examples, user input, and expected output — the separation pays off when there's noise to fight. Skip it for simple prompts; if Markdown headings or plain text are clear enough, use them.
-No format is universally best. **An observation from my practice, not benchmarked:** I've seen consistent gains when shifting prompts to XML — most noticeably with autonomous agents, where the prompt has to land alone without conversational refinement. Direct interactive use (Claude Code, Codex) tolerates loose Markdown; unattended agents don't. Claude in particular seems to respond well to XML, which I attribute to its training, but I haven't benchmarked it. Treat this as a starting hypothesis worth testing on your own target model and task before standardizing.
+No format is universally best. XML separation pays off most for autonomous agents, where the prompt has to land alone without conversational refinement; interactive use (Claude Code, Codex) tolerates loose Markdown. Claude appears to respond well to XML, plausibly an artifact of training. Treat this as a working hypothesis worth testing on your own target model and task before standardizing.
-**Host-aware structured prompts.** Hosts that expose structured-prompt primitives — Claude Code's `AskUserQuestion` (multi-choice cards) and Plan Mode (plan-approval cards) — reduce ambiguity at confirmation gates more reliably than inline text. Prefer the structured primitive when the host supports it; fall back to numbered text otherwise. Codex has no equivalent today; its skills stay on numbered text. Skills carrying confirmation gates or multi-choice interview steps prescribe this preference (ADR-0014).
+**Host-aware structured prompts.** Hosts that expose structured-prompt primitives — Claude Code's `AskUserQuestion` (multi-choice cards) and Plan Mode (plan-approval cards) — reduce ambiguity at confirmation gates more reliably than inline text. Prefer the structured primitive when the host supports it; fall back to numbered text otherwise. Codex has no equivalent today; its skills stay on numbered text.
 ## 4–5. Research Before Implementation
-Combines Find the Happy Path (canonical / idiomatic baseline) and Ground in Real Patterns (anchoring in project-specific examples). The kit treats both as one indivisible flow via `ad-ground`; two prose sections would frame one operation as two separate practices.
-Two sub-practices, joined into one indivisible pass.
+Two sub-practices, joined into one indivisible pass: find the canonical baseline (Happy Path) and anchor it in project-specific examples (Ground in Real Patterns).
 **Find the happy path.** Before implementing, ask: *"What is the canonical, idiomatic way to implement [X] in [stack]? Cite official docs. List common deviations and why people take them."* Mid-implementation: *"Are we still on the happy path? If we deviated, was it deliberate?"* Sometimes you can't follow the happy path — that's fine. Always know where it is and why you left it.
 **Ground in real patterns.** Don't dump the codebase into context. Anchor the model in a specific, project-relevant example: *"Find an existing example of [similar feature]; use that exact structure."* Cite specific files, not "the codebase." Use just-in-time retrieval — pass paths or IDs and let the agent fetch via tools.
-The kit ships `ad-ground` as the workflow-operational implementation of both. It runs a four-source research pass — official docs, validated open-source examples, in-repo patterns, git history — joined by AND not OR, synthesizes the happy path with citations from each source, and gates any deviation behind an irrefutable justification before code is written. Splitting the two sub-practices into separate skills would force two invocations with overlapping research outputs and fragment the synthesis context (ADR-0010).
+A research pass joining four sources — official docs, validated implementation references (public repos, Q&A forums, blog posts, gists), in-repo patterns, git history — by AND not OR, synthesizing the happy path with citations from each source, and gating any deviation behind an irrefutable justification before code is written, is the operational shape.
 ## 6. Explore → Plan → Implement → Commit
@@ -142,7 +146,7 @@ The agent will follow what's specified and invent what isn't. Prefer specifying.
 ### Architectural vocabulary
-Architectural drift accelerates with the agent's typing speed; the counter is shared vocabulary that names the shapes that matter. The kit adopts the canonical terms from John Ousterhout's *A Philosophy of Software Design* (2018) and Michael Feathers's *Working Effectively with Legacy Code* (2004), and uses them in `ARCHITECTURE.md`, ADRs, and architecture-touching skills (ADR-0020).
+Architectural drift accelerates with the agent's typing speed; the counter is shared vocabulary that names the shapes that matter. The canonical terms come from John Ousterhout's *A Philosophy of Software Design* (2018) and Michael Feathers's *Working Effectively with Legacy Code* (2004).
 - **Module** — anything with an interface and an implementation; deliberately scale-agnostic (function, class, package, vertical slice).
 - **Interface** — everything a caller must know to use the module correctly: types, invariants, ordering constraints, error modes, configuration, performance characteristics. *Not* just the type signature.
@@ -159,7 +163,7 @@ Three principles fall out of those terms:
 - **The interface is the test surface.** Callers and tests cross the same seam. If you want to test *past* the interface, the module is probably the wrong shape.
 - **One adapter is a hypothetical seam; two adapters make it real.** Don't introduce a seam unless something actually varies across it.
-Skills that touch architecture (`ad-architecture`, `ad-adr`, the planned `ad-deepen`) use these terms verbatim, so suggestions and reviews land in a single language.
+Architectural skills should use these terms verbatim, so suggestions and reviews land in a single language.
 ## 9. Outcome-Based Prompting (TDG)
@@ -168,7 +172,7 @@ Give the finish line first, not the path:
 1. **Ground truth.** Raw input plus exact expected output.
 2. **Command the implementation.** The algorithm that connects the two.
 3. **Iterate by criterion.** Ask for three approaches; pick by *one* explicit criterion (readability, performance, *or* testability — not all three at once).
-4. **Test Dependency Map, not procedural TDD.** Don't tell the agent "do TDD" — tell it *which* tests cover the file. *"Before modifying X.ts, list which tests cover it. Run. Modify. Run. If none cover it, write one first."*
+4. **Test Dependency Map as pre-flight.** Before any change, tell the agent *which* tests cover the file. *"Before modifying X.ts, list which tests cover it. Run. Modify. Run. If none cover it, write one first."* TDM is orthogonal to the implementation regime: it pairs with TDG (this section) when the implementation strategy is the uncertain axis, and with TDD (§16) when the behavior is test-expressible up front. TDM rejects cargo-cult test-first ceremony, not test-first development itself.
 ## 10. Reviewer With Fresh Context
@@ -192,7 +196,7 @@ In Claude Code, this means a subagent (the `Task` tool, or a custom `.claude/age
 Modern agents handle most routine implementation. The work has shifted to catching what they got wrong.
-Two 2025 industry surveys point at the same wall. JetBrains' DevEcosystem 2025 reports that only **44%** of developers have AI fully or partially integrated into their workflow. Stack Overflow's 2025 Developer Survey adds: **66%** of developers cite "AI solutions that are almost right, but not quite" as their top frustration, and **45%** say debugging AI-generated code is more time-consuming.
+Industry data underlines the wall. Recent JetBrains and Stack Overflow developer surveys show a majority frustration with "AI solutions that are almost right, but not quite," and a near-majority report that debugging AI-generated code costs more time than writing it from scratch. See Sources for the surveys.
 The takeaway: §10 (Reviewer) and §11 (Quality Gates) are not optional. Skipping them is where bug density grows.
@@ -226,11 +230,9 @@ The flow has four parts:
 **When to use it:** the unknown is *how* — a library choice, a CV technique, a multi-stage transformation. Skip it when the *how* is routine.
-This is a combination of established practices, not new terminology: spike (XP), golden datasets, stage-segmented error analysis, trajectory evaluation, and visual debugging in CV pipelines.
 ## 15. Diagnose With Discipline
-For hard bugs and performance regressions, the failure mode is jumping to hypotheses before there is a way to check them. The discipline below is the counter; it owes its shape to standard debugging practice (Kernighan & Pike, *The Practice of Programming*, 1999) and the kit ships `ad-diagnose` as the operational implementation (ADR-0021).
+For hard bugs and performance regressions, the failure mode is jumping to hypotheses before there is a way to check them. The discipline below is the counter, grounded in standard debugging practice (Kernighan & Pike, *The Practice of Programming*).
 ### Phase 1 — Build a feedback loop
@@ -275,23 +277,61 @@ Each probe maps to a specific prediction from Phase 3. Change one variable at a
 Apply the fix. Re-run the Phase-1 loop and confirm the captured symptom is gone. Promote the loop's check into a permanent test that lives next to the code so the same failure mode cannot return silently.
----
+## 16. Test-Driven Development (TDD)
-These are starting points. Prune what doesn't fit your codebase.
+TDG (§9) gives the agent the finish line and asks it to find the path. TDD asks the agent to express one behavior as a test, drive the minimum implementation that makes the test pass, then deepen. The two are distinct LLM disciplines; the Test Dependency Map (§9, item 4) is a pre-flight that pairs with either.
+TDD is the cleanest **deterministic guardrail** when the change has a clear behavior to express up front: a failing test is unambiguous, so the "almost right but not quite" failure mode (§12) cannot ship silently. **Good tests read like a specification** — *"user can checkout with a valid cart"* tells you the capability exists. **Bad tests couple to implementation** — mock internal collaborators, assert on private state, or query the database directly when the public interface is what the caller uses. A test that breaks on a rename but not on a behavior change was testing implementation, not behavior.
+### Phase 1 — Plan vertically
+Before writing a test or code:
+1. Read `CONTEXT.md` so test names and interface vocabulary land in the project's ubiquitous language.
+2. Confirm the public interface — what does the caller need to know? Types, ordering, error modes. Interface design *is* testability design.
+3. Identify deepening opportunities (Layer 2 vocabulary per §8): small interface, deep implementation.
+4. List the behaviors to test, not the implementation steps. Pick the **first** behavior — the one that proves end-to-end the path works. The rest defer until the tracer bullet lands.
+5. Establish the green baseline. For existing code, list the tests already covering the surface (TDM, §9.4) and run them. For new code, write the first test fresh.
+6. Get the user's approval on the plan in one sentence before any test or code is written.
+### Phase 2 — Tracer bullet
+Write ONE test that confirms ONE behavior through the public interface. `RED → GREEN → end-to-end path proven`. The fail-reason matters: a test failing because the function is undefined is not the same as a test failing because the assertion is wrong; only the latter proves the test verifies behavior rather than the existence of a symbol.
+Do not write a second test until this one is green.
+### Phase 3 — Incremental loop
-## How this guide was built
+For each remaining behavior: `RED → minimum code → GREEN`. Three rules:
-This is not theory I read and copied. Most of the practices here come from years of shipping production code, with and without LLMs.
+- **One test at a time.** Bulk-writing all tests first then all implementation is *horizontal slicing* — the named anti-pattern. Bulk-written tests verify *imagined* behavior, not actual behavior; the suite becomes insensitive to real changes and you outrun your headlights, committing to test structure before understanding the implementation.
+- **Only enough code to pass the current test.** Anticipating the next test bloats the implementation and couples it to assumptions not yet verified.
+- **Tests verify behavior through public interfaces.** No private-method tests, no internal-collaborator mocks, no direct database/file-system assertions when the public interface is what the caller uses. A test that breaks on a rename but not on a behavior change was testing implementation, not behavior.
-**Patterns I was already using when I drafted this guide.** Several of them I used before knowing they had established names; once the industry converged on a label, I adopted it to make the conversation easier: **Spec-Driven Design** (§1), **Docs-vs-Code separation** (§2), **pattern matching by real examples** (§5), **explicit Action Commands** (§7), **Architectural Boundaries** (§8), **Outcome-Based Prompting / TDG** (§9), the **senior-reviewer technique** (§10), and **deterministic Quality Gates** (§11).
+### Phase 4 — Refactor
-**The XML observation in §3** is also drawn from practice, not benchmarks. It's a hypothesis worth testing on your own setup, not a settled finding.
+Once all planned tests pass: extract duplication, deepen modules (move complexity behind smaller interfaces), apply SOLID where natural. Run tests after each refactor step. **Never refactor while RED** — get to green first, then refactor with the green baseline as the safety net.
-**Practices that came in through iteration on this guide.** They weren't in my original draft, but each matches a problem I'd already encountered or a habit I'd only formalized loosely: **Find the Happy Path** (§4), **Explore → Plan → Implement → Commit** (§6), **The Bottleneck Is Discrimination, Not Generation** (§12 — the 2025 industry statistics ground the principle, they didn't generate it), and **Evals for Anything Autonomous** (§13).
+The refactor phase is where deepening (§8) happens. Treat tests as a fixed contract; the implementation is free to change shape as long as the contract holds.
-**§14 (Staged Spikes With Golden Fixtures) is my own working technique.** I haven't seen it documented end-to-end as a single named pattern, but each component (spike, golden dataset, stage-segmented error analysis, trajectory evaluation, visual CV debugging) has its own lineage in the literature listed under Sources. The combination — discovery → fixture → staged pipeline with debug artifacts → two-layer evaluation — is how I attack problems where the *technique* itself is uncertain.
+### TDD vs TDG — when to use which
-**Practices added through cross-pollination.** A second pass over the guide compared this kit's coverage against [Matt Pocock's `mattpocock/skills`](https://github.com/mattpocock/skills) — a separate body of agent-engineering practice grounded in the same canonical literature (DDD, *Pragmatic Programmer*, Ousterhout, Feathers, Beck). The comparison surfaced principles that earned their place in this document on independent merits: the **Domain layer** (§1 Layer 2, ADR-0019) from DDD's ubiquitous-language pattern; the **architectural vocabulary** (§8, ADR-0020) from Ousterhout's depth and Feathers's seams; **Diagnose with discipline** (§15, ADR-0021) from standard debugging practice; **vertical slicing** and **HITL/AFK tagging** (§6) from PragProg's tracer-bullet metaphor; **AI mechanical / human judgment** (§12 second paragraph). Where Pocock's framing sharpened our own — vocabulary or principle that we already gestured at without naming — the borrowed phrasing is acknowledged inline; everything else stays kit-original.
+- **TDD** — behavior is known and test-expressible up front. The test is the contract; implementation follows.
+- **TDG** — behavior may not yet be testable as a single pair, but the outcome is known. Three candidate implementations + one criterion picks the path.
+When both apply (test-expressible AND multiple implementation strategies are plausible), use TDD as the outer loop and TDG inside the GREEN phase to select the strategy for that cycle.
+---
+These are starting points. Prune what doesn't fit your codebase.
+## Provenance
+This guide is operational practice, not theory. Most principles come from years of shipping production code, with and without LLMs; some were in use before the industry converged on labels for them, and once a label landed the kit adopted it to make the conversation easier.
+§14 (Staged Spikes With Golden Fixtures) is the author's own working technique — each component (spike, golden dataset, stage-segmented error analysis, trajectory evaluation, visual CV debugging) has its own lineage in the literature under Sources; the combination — discovery → fixture → staged pipeline with debug artifacts → two-layer evaluation — is original to this kit.
+A cross-pollination pass against [Matt Pocock's `mattpocock/skills`](https://github.com/mattpocock/skills) — a separate body of agent-engineering practice grounded in the same canonical literature (DDD, *Pragmatic Programmer*, Ousterhout, Feathers, Beck) — surfaced principles that earned their place on independent merits: the **Domain layer** (§1 Layer 2), the **architectural vocabulary** (§8), **Diagnose with discipline** (§15), **vertical slicing** and **HITL/AFK tagging** (§6), and **AI mechanical / human judgment** (§12). Where Pocock's framing sharpened our own, the borrowed phrasing is acknowledged inline; everything else stays kit-original.
 External claims (specific percentages, named frameworks) are cited under Sources. Everything else is operational guidance from practice or synthesis across that material — a working model, refined over time, not academic claim.
@@ -323,3 +363,10 @@ External claims (specific percentages, named frameworks) are cited under Sources
 **§15 — Diagnose With Discipline**
 - *The Practice of Programming* (Kernighan & Pike, 1999) — chapters on debugging and testing.
 - Falsifiability framing — Karl Popper, *The Logic of Scientific Discovery* (1959).
+**§16 — Test-Driven Development (TDD)**
+- *Test-Driven Development: By Example* (Kent Beck, 2002) — canonical red-green-refactor framing.
+- *Working Effectively with Legacy Code* (Feathers, 2004) — seams as test surfaces.
+- *Unit Testing Principles, Practices, and Patterns* (Khorikov, 2020) — behavior-vs-implementation test classification.
+- [`mattpocock/skills` engineering/tdd](https://github.com/mattpocock/skills/blob/main/skills/engineering/tdd/SKILL.md) — vertical-tracer-bullet framing adopted with attribution.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@alexandrealvaro/agentic",
-  "version": "0.15.2-beta.1",
+  "version": "0.16.0-beta.1",
   "description": "Bootstrap and audit AGENTS.md, ARCHITECTURE.md, ADRs, skills, and subagents for engineering production code with LLMs",
   "type": "module",
   "bin": {

package/src/commands/init.js CHANGED Viewed

@@ -370,6 +370,7 @@ export async function initCommand(opts) {
       '/ad-review (WORKFLOW §10)',
       '/ad-ground (WORKFLOW §4 + §5)',
       '/ad-next (state survey + recommendations)',
+      '/ad-archive (sweep done tasks / shipped specs / superseded ADRs into git history)',
       '/ad-spike (WORKFLOW §14 — staged spike with golden fixtures)',
       '/ad-tdg (WORKFLOW §9 — outcome-based prompting + TDM)',
       '/ad-domain (CONTEXT.md — Layer 2 ubiquitous language)',
@@ -397,6 +398,8 @@ export async function initCommand(opts) {
           'ad-spec': '/ad-spec (doc/specs/)',
           'ad-task': '/ad-task',
           'ad-audit': '/ad-audit',
+          'ad-archive':
+            '/ad-archive (sweep done tasks / shipped specs / superseded ADRs into git history)',
           'ad-review': '/ad-review (WORKFLOW §10)',
           'ad-ground': '/ad-ground (WORKFLOW §4 + §5)',
           'ad-deepen': '/ad-deepen (WORKFLOW §8 — deepening opportunities)',

package/src/lib/profiles.js CHANGED Viewed

@@ -25,19 +25,23 @@ export const PROFILES = {
       'ad-ground',
       'ad-audit',
       'ad-next',
+      'ad-archive',
       'ad-spike',
       'ad-tdg',
       'ad-domain',
       'ad-grill',
       'ad-diagnose',
+      'ad-tdd',
     ],
     conditional: {
       'ad-design': 'blocked',
       'ad-subagent': 'blocked',
       'ad-skill': 'blocked',
       'ad-hooks': 'blocked',
+      'ad-prd': 'blocked',
+      'ad-guidelines': 'blocked',
     },
-    note: 'PoC / spike / experiment. Nine universals — posture (philosophy), research (ground), drift (audit), navigation (next), spike + tdg + grill + domain + diagnose process scaffolds. No mandatory artifact-producing skills (no bootstrap / spec / task / adr / architecture). Adds discipline you can grow into; never pre-imposes ceremony.',
+    note: 'PoC / spike / experiment. Ten universals — posture (philosophy), research (ground), drift (audit), navigation (next), spike + tdg + tdd + grill + domain + diagnose process scaffolds. No mandatory artifact-producing skills (no bootstrap / spec / task / adr / architecture / prd / guidelines). Adds discipline you can grow into; never pre-imposes ceremony.',
   },
   solo: {
     universal: [
@@ -45,9 +49,13 @@ export const PROFILES = {
       'ad-ground',
       'ad-audit',
       'ad-next',
+      'ad-archive',
       'ad-spike',
       'ad-tdg',
+      'ad-tdd',
       'ad-bootstrap',
+      'ad-prd',
+      'ad-guidelines',
       'ad-spec',
       'ad-task',
       'ad-review',
@@ -66,7 +74,7 @@ export const PROFILES = {
       'ad-skill': false,
       'ad-hooks': false,
     },
-    note: 'Solo developer shipping a real product. Specs and tasks are universal; ADRs and architecture are opt-in for binding decisions only.',
+    note: 'Solo developer shipping a real product. PRD, engineering guidelines, specs and tasks are universal; ADRs and architecture are opt-in for binding decisions only.',
   },
   team: {
     universal: [
@@ -74,14 +82,18 @@ export const PROFILES = {
       'ad-philosophy',
       'ad-architecture',
       'ad-adr',
+      'ad-prd',
+      'ad-guidelines',
       'ad-spec',
       'ad-task',
       'ad-audit',
       'ad-review',
       'ad-ground',
       'ad-next',
+      'ad-archive',
       'ad-spike',
       'ad-tdg',
+      'ad-tdd',
       'ad-domain',
       'ad-grill',
       'ad-deepen',
@@ -96,7 +108,7 @@ export const PROFILES = {
       'ad-skill': false,
       'ad-hooks': false,
     },
-    note: 'Team product. Full universal stack including deepening (architectural refactor surfacing) and the v0.15 grill/domain/diagnose trio. Conditional skills auto-detect by signal. This was the v0.7 default and is the migration target for existing installs.',
+    note: 'Team product. Full universal stack including PRD, engineering guidelines, TDD, deepening, and the grill/domain/diagnose trio. Conditional skills auto-detect by signal.',
   },
   mature: {
     universal: [
@@ -104,14 +116,18 @@ export const PROFILES = {
       'ad-philosophy',
       'ad-architecture',
       'ad-adr',
+      'ad-prd',
+      'ad-guidelines',
       'ad-spec',
       'ad-task',
       'ad-audit',
       'ad-review',
       'ad-ground',
       'ad-next',
+      'ad-archive',
       'ad-spike',
       'ad-tdg',
+      'ad-tdd',
       'ad-domain',
       'ad-grill',
       'ad-deepen',
@@ -126,7 +142,7 @@ export const PROFILES = {
       'ad-skill': false,
       'ad-hooks': true,
     },
-    note: 'Mature / regulated product. Recommends ad-hooks alongside the team stack for deterministic gates per WORKFLOW §11. Includes deepening + the v0.15 grill/domain/diagnose trio.',
+    note: 'Mature / regulated product. Recommends ad-hooks alongside the team stack for deterministic gates per WORKFLOW §11. Full universal stack including PRD, engineering guidelines, TDD, deepening, and the grill/domain/diagnose trio.',
   },
 };