npm - discipline-md - Versions diffs - 0.1.0 - Mend

discipline-md 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (50) hide show

package/LICENSE +21 -0
package/README.md +80 -0
package/bin/discipline.js +587 -0
package/package.json +40 -0
package/templates/.claude/settings.json +58 -0
package/templates/AGENTS.md +463 -0
package/templates/AGENT_TRACKER.md +138 -0
package/templates/API_REFERENCE.md +131 -0
package/templates/ARCHITECTURE.md +89 -0
package/templates/ASSETS.md +90 -0
package/templates/AUTONOMOUS_QUEUE.md +119 -0
package/templates/BUILD_PLAN.md +89 -0
package/templates/CHANGELOG.md +90 -0
package/templates/CLAUDE.md +89 -0
package/templates/CREDITS.md +109 -0
package/templates/DATA_MODEL.md +88 -0
package/templates/DECISIONS.md +120 -0
package/templates/DEPLOYMENT.md +342 -0
package/templates/HANDOFF.md +289 -0
package/templates/IMPROVEMENT_LOOP.md +103 -0
package/templates/INVESTIGATION.md +154 -0
package/templates/LICENSE +68 -0
package/templates/NOTICE +55 -0
package/templates/OPEN_DECISIONS.md +61 -0
package/templates/PLAYBOOK_FEEDBACK.md +87 -0
package/templates/PROJECT_CONTEXT.md +91 -0
package/templates/README.md +60 -0
package/templates/ROADMAP.md +96 -0
package/templates/SECURITY_AUDIT.md +235 -0
package/templates/SETUP.md +162 -0
package/templates/SPEC.md +105 -0
package/templates/SPEC_WORKFLOW.md +173 -0
package/templates/TODO.md +118 -0
package/templates/USAGE.md +153 -0
package/templates/VERIFICATION_GATE.md +68 -0
package/templates/agents/CROSS_REPO_SYNC.md +124 -0
package/templates/agents/DEBUGGER.md +112 -0
package/templates/agents/PLANNER.md +111 -0
package/templates/agents/README.md +64 -0
package/templates/agents/RECON.md +99 -0
package/templates/agents/SECURITY_REVIEWER.md +123 -0
package/templates/agents/SPEC_ARCHITECT.md +133 -0
package/templates/agents/STAKEHOLDER.md +197 -0
package/templates/agents/_TEMPLATE.md +116 -0
package/templates/agents/optional/ARCHITECT.md +109 -0
package/templates/agents/optional/BACKEND_IMPACT.md +108 -0
package/templates/agents/optional/DOC_AUDIT.md +108 -0
package/templates/agents/optional/FRONTEND_IMPACT.md +109 -0
package/templates/agents/optional/QUEUE_CURATOR.md +114 -0
package/templates/agents/optional/TEST_STRATEGIST.md +107 -0

package/templates/AGENT_TRACKER.md ADDED Viewed

@@ -0,0 +1,138 @@
+<!--
+  TEMPLATE: AGENT_TRACKER.md (Tier B3 — empirical agent-outcome log)
+  WHEN TO USE:
+    - In any project that delegates work to multiple model tiers (Haiku / Sonnet / Opus
+      or future equivalents) and wants tier-routing decisions backed by data, not vibes.
+    - Pairs with an AGENTS.md / PLAYBOOK_FEEDBACK.md tier-routing doc — this file is
+      the EVIDENCE; AGENTS.md is the POLICY derived from the evidence.
+  HOW TO USE:
+    1. Copy this file into the project's `docs/` directory as `AGENT_TRACKER.md`.
+    2. Replace placeholder example rows under "Entries" with real attempts as they happen.
+    3. Log immediately while context is hot — a half-day later the failure mode is
+       already vague.
+    4. Apply the Two-Failure Rule and the Three-Clean-Runs Graduation as written below.
+    5. Do NOT clean up old entries — the historical record IS the value.
+-->
+# Agent Task Tracker
+## Purpose
+Empirical log of subagent task outcomes by **(model, task-type)**. The data
+drives tier-routing decisions in `docs/AGENTS.md` (or the project's equivalent
+policy doc) instead of being made on intuition. This file is the *evidence*;
+the policy doc is the *conclusion drawn from the evidence*.
+The tracker exists because the right tier for a task is empirical, not
+predicted: a reasoning-heavy task that "should" take Opus may turn out to be
+fine on Sonnet for THIS codebase, or a "simple" recon task may keep failing
+on Haiku for THIS file layout. Without a record, every routing decision is a
+fresh argument.
+## What gets logged
+- **Every clear failure** — hallucinated paths, incorrect output format,
+  ignored constraints, fix-broke-tests, false-positive reports, runaway scope.
+  Log immediately while context is hot.
+- **Every experimental-scope task** — outcome regardless of success/failure.
+  Trials need positive evidence too, not just failure data.
+- **Frontier-tier (Opus) decision overrides** — when an architect /
+  security-reviewer / planner makes a binding recommendation and the user
+  explicitly picks something else. These accumulate qualitatively to flag
+  patterns ("the architect keeps recommending X when the user wants Y").
+- **Notable partials** — subagent shipped most of the spec but missed
+  something subtle that took coordinator follow-up.
+## What does NOT get logged
+- Routine clean runs by validated-tier models on validated-scope tasks.
+  Voluminous, no decision value. Absence of failures IS the data.
+- Coordinator (host) actions and decisions — those live in commit messages
+  and PR diffs, not here. This file tracks SUBAGENT outcomes.
+- Subjective preferences ("I'd have phrased it differently"). Only objective
+  failure modes — code that broke, output that contradicted constraints,
+  paths that didn't exist.
+## Schema
+```
+- YYYY-MM-DD — model — task-type — outcome — context
+```
+- **date**: ISO date (`YYYY-MM-DD`) of when the attempt happened.
+- **model**: exact model name — `<model-tier>-<version>` (e.g. `haiku-4.5`,
+  `sonnet-4.6`, `opus-4.7`). Name versions exactly so future routing can
+  distinguish a current-version success from a prior-version failure.
+- **task-type**: short tag drawn from the project's named-subagent-role names
+  where possible: `recon` / `planner` / `architect` / `security-reviewer` /
+  `backend-impact` / `frontend-impact` / `test-strategist` / `cross-repo-sync`
+  / `doc-audit` / `queue-curator` / `implementation` / (or ad-hoc tags like
+  `regex-sweep`, `single-test-add`, `helper-extract`).
+- **outcome**: one of:
+  - `ok` — task completed correctly, no coordinator follow-up needed.
+  - `failed: <one-line mode>` — concrete failure (hallucinated path, ignored
+    constraint, false report, etc.).
+  - `partial: <one-line issue>` — mostly worked but coordinator had to fix or
+    extend something.
+  - `overridden: <one-line — what user picked instead>` — Frontier-tier
+    recommendation reversed by user.
+- **context**: one sentence. What the work was, what session/PR if available,
+  anything load-bearing for future readers.
+## Decision rules
+Both rules below are computed by **counting entry rows** for the `(model,
+task-type)` pair — never from memory or general impressions of how a model
+has been doing.
+### Two-Failure Rule
+Two failures on a `(model, task-type)` pair within the trial period contracts
+that combination back to the next-tier-up. Add a one-line entry in the policy
+doc's "Applied (recent)" section noting the scope contraction.
+The rule is "two within the trial period," not "two ever" — a model upgrade
+or a substantial codebase shift resets the period.
+### Three-Clean-Runs Graduation
+Three consecutive clean runs (no `failed`, no `partial`) on an experimental
+`(model, task-type)` pair graduates that combination to the "validated scope"
+list in the policy doc. Same — record the graduation in the project's
+PLAYBOOK_FEEDBACK.md or equivalent.
+### Repeated Frontier Overrides
+Three or more `overridden` entries against the same model + same kind of
+decision within a short window flags the work for paired-design treatment
+instead of a one-shot subagent call. The signal: the decision has more
+context than fits in a focused subagent prompt.
+## Current Status
+A point-in-time snapshot of which `(model, task-type)` pairs sit in which
+tier. Update whenever the Two-Failure Rule or Three-Clean-Runs Graduation
+triggers. Keep it short — one row per pair, most-changed-recent first.
+| Model           | Task type        | Tier         | Notes                                     |
+|-----------------|------------------|--------------|-------------------------------------------|
+| <model-vX.Y>    | <task-type>      | validated    | <e.g. graduated YYYY-MM-DD after 3 clean> |
+| <model-vX.Y>    | <task-type>      | experimental | <e.g. trial began YYYY-MM-DD>             |
+| <model-vX.Y>    | <task-type>      | deprecated   | <e.g. contracted YYYY-MM-DD after 2 fails>|
+## Entries
+(most-recent first; append at top after the heading)
+### YYYY-MM-DD
+- <YYYY-MM-DD> — <model-vX.Y> — <task-type> — `ok` — <one-sentence context: what work, what session/PR>
+- <YYYY-MM-DD> — <model-vX.Y> — <task-type> — `failed: <concrete failure mode>` — <one-sentence context>
+- <YYYY-MM-DD> — <model-vX.Y> — <task-type> — `partial: <what was missed>` — <one-sentence context>
+- <YYYY-MM-DD> — <model-vX.Y> — <task-type> — `overridden: <what user picked instead>` — <one-sentence context>
+### Older entries
+(none — tracker started <YYYY-MM-DD>)

package/templates/API_REFERENCE.md ADDED Viewed

@@ -0,0 +1,131 @@
+# API Reference
+Update this document when endpoints, request/response shapes, auth behavior, rate limits, or external integrations change.
+## Conventions
+- **Base URL / path prefix:** `https://api.example.com` (or `/api`)
+- **Versioning:** versioned via path (`/v1/...`), header (`X-API-Version`), or unversioned. Note the strategy.
+- **Content type:** `application/json` by default; note exceptions (streaming, multipart, non-JSON) per endpoint.
+- **Authentication:** e.g. `Authorization: Bearer <token>`, session cookie, API key header. State the format once here; per-endpoint sections only flag deviations.
+Adjust headings for non-REST surfaces (GraphQL → "Schema", "Operations", "Subscriptions"; tRPC → "Routers", "Procedures").
+## Rate Limiting
+If the API enforces rate limits, document the matrix once here so per-endpoint sections only flag deviations.
+| Route group | Limit | Window | 429 response shape |
+|---|---|---|---|
+| Auth (login, register, reset) | 10 | 15 min | `{ "error": "rate_limited", "retry_after": <seconds> }` |
+| Public reads | 60 | 1 min | (same) |
+Delete this section if the API has no rate limiting.
+## Public Endpoints
+(Endpoints that do not require authentication.)
+### `METHOD /path`
+Purpose:
+Auth required: no
+Request:
+```json
+{}
+```
+Response on success:
+```json
+{}
+```
+Errors:
+| Status | Code | Meaning |
+|---|---|---|
+| 400 | `invalid_input` | request body failed validation |
+| 404 | `not_found` | resource does not exist |
+Validation:
+- `field`: rule (e.g. min length 3, regex, allowed values).
+Privacy / never-returned fields:
+- Fields the response intentionally omits even though they exist on the underlying entity.
+## Authenticated Endpoints
+(Endpoints that require auth. State the auth method once at the top of this group if it differs from the global Conventions.)
+### `METHOD /path`
+Purpose:
+Auth required: yes
+Request:
+```json
+{}
+```
+Response on success:
+```json
+{}
+```
+Response on conflict / locked / partial state (if applicable, show distinct shapes):
+```json
+{}
+```
+Errors:
+| Status | Code | Meaning |
+|---|---|---|
+| 401 | `unauthorized` | missing or invalid token |
+| 403 | `forbidden` | authenticated but not allowed |
+| 409 | `conflict` | state conflict (e.g. already claimed) |
+Validation:
+- `field`: rule.
+Privacy / never-returned fields:
+- (list, or "none — see Conventions")
+Rate limit (if endpoint-specific): see Rate Limiting table above, or override here.
+## Streaming / Non-JSON Endpoints (optional)
+For endpoints returning bytes, server-sent events, multipart, or other non-JSON shapes. Document content-type, framing, and termination behavior per endpoint.
+### `METHOD /path`
+Content-Type:
+Stream framing / terminator:
+Auth required:
+Failure behavior (mid-stream errors):
+## External Integrations
+For each third-party API, queue, mail provider, etc.:
+- Service:
+- Purpose:
+- Env vars:
+- Retry / timeout behavior:
+- Data leaving the system (what fields are sent out):
+- Failure behavior (what the user sees if the integration is down):

package/templates/ARCHITECTURE.md ADDED Viewed

@@ -0,0 +1,89 @@
+# Architecture
+Update this document when file layout, runtime flow, frontend/backend structure, major dependencies, or testing strategy changes.
+## Overview
+Describe the system at a high level. One paragraph that another developer can read cold and orient from.
+## Runtime Architecture
+If the project has a single runtime, one fenced block is enough. If there are multiple (production vs. local dev, host vs. embedded, server vs. CLI), draw each separately so readers don't conflate them.
+Production runtime:
+```text
+client -> server -> database/external services
+```
+Alternate runtime (local dev, offline mode, embedded, etc. — delete if N/A):
+```text
+client -> local server -> in-memory or file-backed state
+```
+## Folder / Module Map
+- `path/`: Purpose.
+- `path/file.ext`: Purpose.
+If the project has clear architectural layers (frontend / backend, host / plugin, core / adapters), group entries by layer with sub-headings rather than listing flat. Flat lists work for small projects; layered groupings scale better.
+## Key Flows
+List the 3–7 user- or system-visible flows that matter — not every flow, just the ones a contributor needs to know about. Examples: a registration/login flow, a primary user action, a critical background job, a deploy flow.
+### Flow Name
+1. Step one.
+2. Step two.
+3. Step three.
+### Second Flow Name
+1. Step one.
+2. Step two.
+## Dependencies
+- Dependency: why it is used.
+Distinguish runtime dependencies from build-time / dev-time dependencies if the distinction matters for the project (e.g. things shipped to clients vs. things that only run on CI).
+## Security Boundaries
+Identify the trust boundaries in the system. Generic prompts:
+- **Secrets that must stay server-side** — credentials, API keys, signing keys, anything in `.env`.
+- **Data that must never reach clients or logs** — password hashes, reset tokens, internal user IDs, raw third-party payloads.
+- **Trust boundaries between components** — what each service or layer is allowed to assume about its inputs vs. what it must validate.
+Delete this section if the project genuinely has no security surface (a pure-function library, a static art piece with no input). Otherwise, fill it in honestly — the absence of this section is a tell that boundaries weren't thought through.
+## Testing Strategy
+Unit test command:
+```bash
+# command here
+```
+What unit tests cover:
+- Pure helper logic.
+- Important edge cases.
+- High-risk behavior.
+### Coverage Gaps
+What is intentionally not covered yet, and why:
+- Gap or deferred test area — short reason (manual smoke test suffices for now / blocked on infra / low priority).
+## Tradeoffs
+- Tradeoff or constraint that future maintainers should understand.
+## Future Architecture Notes
+- Possible improvement that is intentionally deferred.

package/templates/ASSETS.md ADDED Viewed

@@ -0,0 +1,90 @@
+# Assets
+Use this document to track media assets used by the project — images, audio, video, icons, fonts, downloadable files, fonts, etc. Update it whenever code references new assets or asset paths/formats change.
+**Skip or delete this file if the project ships no static assets** (a pure-logic library, a service with no media). The framework is fine without it.
+## Asset Locations
+Distinguish between optimized/built assets (what ships) and source/master assets (what authoring tools produce):
+```text
+path/to/assets/         -- optimized, served to clients
+path/to/assets-source/  -- masters; may live outside the repo if files are large
+```
+If masters live outside the repo (Git LFS, object storage, designer's drive), say so explicitly so contributors know where to find them.
+## Hosting Considerations
+- Repo size: large media inflates clones; check repo size before committing big files.
+- Alternatives: object storage, CDN, Git LFS, or a separate assets repo.
+- Build step: if assets are processed (resized, compressed, sprite-sheeted) by a build step, document the command in `docs/DEPLOYMENT.md` or `docs/ARCHITECTURE.md`.
+## Naming Rules
+- Use lowercase filenames.
+- Use hyphens instead of spaces (`release-name-cover.jpg`, not `Release Name Cover.jpg`).
+- Keep filenames URL-safe (no spaces, apostrophes, or special characters that need escaping in HTML / URLs).
+- Prefer descriptive names that survive renames elsewhere in the codebase.
+- Avoid committing huge raw source files unless required.
+## Current Asset Slots
+Group entries by asset category when there are many — e.g. `## Images`, `## Audio`, `## Fonts`. A single flat list works for small projects.
+### Asset Name
+Asset type: image / audio / video / font / icon / downloadable
+Canonical path (optimized, what ships):
+```text
+path/to/asset.ext
+```
+Source / master copy (delete if N/A):
+```text
+path/to/source/asset.psd
+```
+Used by:
+- `path/to/file.ext`
+Purpose:
+- Describe what the asset is used for.
+Recommended preparation (depends on asset type):
+- **Images**: dimensions, format, target file size, color profile, alt text.
+- **Audio**: bitrate, sample rate, channels (mono / stereo), encoding, peak/loudness target.
+- **Video**: resolution, codec, container, target bitrate.
+- **Fonts**: weights/styles included, formats (`.woff2`, `.ttf`), license, glyph subsetting.
+- **Icons**: format (SVG / PNG sprite), size sets, monochrome vs. multicolor.
+Fallback behavior:
+- Describe what happens if the asset is missing or fails to load.
+## Access-Controlled / Server-Mediated Assets (optional)
+For projects where some assets are not served as static files but are fetched through an authenticated endpoint (paywalled media, signed URLs, gated downloads):
+### Asset Name
+Endpoint: `GET /api/...`
+Auth model: who is allowed to fetch this and how it's verified.
+What the client sees: the response shape, content-type, framing.
+What stays server-side: the raw file path, signing keys, access logs.
+Skip this section if all assets are public static files.
+## Source Files
+Describe whether source/master files should be committed or kept outside the repo. Fold into per-asset "Source / master copy" entries above when convenient; keep this section for repo-wide policy if the rule is the same across all assets.

package/templates/AUTONOMOUS_QUEUE.md ADDED Viewed

@@ -0,0 +1,119 @@
+# Autonomous Queue
+Priority-ordered list of tasks pre-approved for unsupervised AI work. The autonomous host pulls from the top. Anything not in this queue requires user input even if tagged `[autonomy: safe]` in `docs/TODO.md`.
+This file is curated, not auto-generated. Add an entry when a TODO item is genuinely safe to land without user review beyond the standard PR. Remove an entry when it ships (and capture it in `docs/CHANGELOG.md`).
+## Autonomy Tag Legend (inlined for self-containment)
+This queue is intentionally self-contained: the same tag scheme is also documented in `docs/AGENTS.md`, but inlining it here means a fresh reader (human or agent) can act on the queue without hopping files. If the two ever drift, AGENTS.md wins — re-sync this section.
+- `[size: XS|S|M|L|XL]` — XS = under ~15 min. S = under an hour. M = a focused session. L = multi-session / design-required. XL = epic.
+- `[tier: haiku|sonnet|opus]` — model tier that should drive implementation. (Or model-agnostic `recon|workhorse|frontier`.)
+- `[risk: low|med|high]` — blast radius if the change goes wrong.
+- `[scope: isolated|cross-cutting|infra]` — single subsystem / multi-subsystem / build+deploy+secrets.
+- `[autonomy: safe|review|needs-human-collab]` — `safe` = autonomous host may pick up and open a normal PR. `review` = host may implement but MUST open a DRAFT PR for human merge. `needs-human-collab` = host must NOT start; requires explicit user approval and a paired session.
+Default for unmarked entries: `[size: S][tier: sonnet][risk: low][scope: isolated][autonomy: safe]`. Only call out tags that differ.
+## Tier Selection Hints
+Each queue entry carries a **Suggested agent tier** field that tells the autonomous host (or the user reading the queue) how to dispatch the work. Controlled vocabulary:
+- `Haiku` — recon, simple file lookups, status reads, mechanical find-replace, single-line tweaks, doc-consistency sweeps.
+- `Sonnet` — most implementation work: feature additions, refactors within a single subsystem, test writing, doc updates, CHANGELOG/DECISIONS entries.
+- `Opus` — ambiguous design decisions, security-critical work, architectural choices, cross-repo refactors, non-obvious debugging, anything destined for `docs/DECISIONS.md`.
+- `subagent: <role>` — specifically dispatched to a named subagent rather than running on the host. Standard roles: `recon`, `planner`, `architect`, `debugger`, `doc-audit`, `test-strategist`, `security-reviewer`, `cross-repo-sync`, `backend-impact`, `frontend-impact`, `queue-curator`. See `docs/AGENTS.md` for each role's contract.
+**Mapping table — task type → suggested tier:**
+| Task type | Suggested tier |
+|---|---|
+| File / symbol lookup, "where is X used" | `subagent: recon` (Haiku) |
+| Doc-consistency audit after rename | `subagent: doc-audit` (Haiku) |
+| Single-line tweak, default value bump, copy edit | `Haiku` |
+| Single-tool feature, test additions, doc update | `Sonnet` |
+| Refactor within one subsystem | `Sonnet` |
+| Cross-repo edit (palette swap, link insertion) | `subagent: cross-repo-sync` (Sonnet, worktree) |
+| Pre-implementation plan for `[size: M\|L]` work | `subagent: planner` (Opus) |
+| Architecture decision for `docs/DECISIONS.md` | `subagent: architect` (Opus) |
+| Auth / token / rate-limit / access-control review | `subagent: security-reviewer` (Opus) |
+| Non-obvious failure investigation | `subagent: debugger` (Opus) |
+| Ambiguous design choice, paired-design candidate | `Opus` (or escalate to direct Frontier session) |
+When in doubt, prefer a subagent (context isolation + structured output) over running on the host directly.
+## Queue Format
+Each entry is a one-line pointer into `docs/TODO.md` (or `docs/ROADMAP.md`) plus its tags, suggested tier, and a short note on why it is safe.
+```
+- [ ] <short title> — <where in TODO/ROADMAP> — [tags] — tier: <Suggested agent tier> — note
+```
+**Example shape** (placeholders, not real entries):
+```
+- [ ] Add <feature flag> for <scope> — `docs/TODO.md` §<section> — `[size: S][tier: sonnet][risk: low][scope: isolated][autonomy: safe]` — tier: Sonnet — UI-only, behind a default-off flag; no schema or migration impact.
+- [ ] Audit stale references after <rename> — `docs/TODO.md` §Hygiene — `[size: XS][risk: low][scope: cross-cutting]` — tier: subagent: doc-audit — read-only sweep across `docs/` and any code with hard-coded paths.
+- [ ] Plan <medium feature> before implementation — `docs/ROADMAP.md` §<milestone> — `[size: M][risk: med][scope: isolated][autonomy: review]` — tier: subagent: planner — Frontier-tier plan first; implementation tier drops to Sonnet once the plan lands.
+- [ ] Review <auth-adjacent change> before merge — `docs/TODO.md` §<section> — `[size: S][risk: high][scope: isolated][autonomy: review]` — tier: subagent: security-reviewer — DRAFT PR; security-reviewer must sign off before human merge.
+```
+## Active Queue
+Order = priority. Top of list runs first. Each entry points at a `docs/TODO.md` bullet that carries its own tag block; the line below is a human-readable summary so the queue reads at a glance.
+<!-- Populate this section by hand. Empty queue = autonomous host has nothing to pick up; surface to the user and wait. -->
+- [ ] (none yet — populate when ready to enable autonomous runs)
+When the host finishes one of these, it should remove the entry from this list, update `docs/TODO.md` (delete the shipped entry per the cleanup gate), add a `docs/CHANGELOG.md` line, and open a PR. Do not pull the next item until review of the previous PR is complete unless the user explicitly opts into batch mode.
+## Recurring: Product-Outcome Retro
+A standing, opt-in entry that closes the loop on the *product itself* — not just the workflow docs. Most of this queue tracks work to ship; this entry checks whether shipped work moved the numbers the pitch promised.
+On a set cadence (`<weekly | monthly | per-milestone>`), the host runs a retro pass:
+1. **Pull the targets.** Read the success metrics and kill criteria from the funded spec (`docs/PROJECT_CONTEXT.md` / the pitch). These were defined at funding time and are rarely revisited.
+2. **Compare to reality.** Check them against what actually shipped and what the product is actually doing — usage, revenue, support load, whatever the pitch named as the winning metric.
+3. **Route the verdict:**
+   - **On track** — note it; no action.
+   - **Underperforming but fixable** — file the specific fix as a `docs/TODO.md` entry.
+   - **A kill criterion is met** — stop adding scope and surface a `STAKEHOLDER` re-pitch/kill decision in `docs/OPEN_DECISIONS.md`. Do not autonomously keep building a product the evidence says should wind down.
+This pass is `[autonomy: review]` at most: it produces a verdict and queued items, never a silent pivot or a kill. The kill/continue call is the stakeholder's. Pairs with the Low-Effort Stakeholder mode's walk-away test in `docs/agents/STAKEHOLDER.md`.
+Example entry:
+- [ ] Product-outcome retro (`<month>`) — `docs/PROJECT_CONTEXT.md` §Success Metrics — `[size: S][tier: opus][risk: low][scope: isolated][autonomy: review]` — tier: Opus — check shipped work vs. success metrics + kill criteria; route to TODO / OPEN_DECISIONS.
+## Out-of-Queue (Explicit Holdouts)
+Items that look picky-uppable but are deliberately NOT in the queue right now. Each entry MUST note the rationale so a future session does not silently promote them. Even items tagged `[autonomy: safe]` belong here when there is a project-specific reason to gate them behind a human.
+**Format:**
+```
+- **<short title>** — `<doc reference>` — `[tags]` — <why held out>
+```
+**Common holdout categories** (drawn from real-project shape):
+- **Strategy / tuning parameters** — anywhere choice-of-parameter carries edge or expresses product judgment (gap thresholds, MA windows, ranking weights, rate-limit numbers). Operator should weigh in before the values bake into shipped behavior.
+- **Live-trading / production toggles** — any `LIVE=1` / `PROD=1` / "talk to real money or real users" code path. Production gate; never autonomously promoted regardless of tag.
+- **Security-sensitive defaults** — default password policy, default token lifetimes, default CORS origins, default rate limits, default auth scopes. Looks safe-to-tweak; isn't.
+- **Universe / seed data choices** — picking which N items populate a curated list (sectors, presets, demo content). Judgment work the operator owns.
+- **Architectural reversals** — anything that revises a prior `docs/DECISIONS.md` entry. Needs paired session.
+- **Large-bundle vendor integrations** — bundle weight, license check, identity question (e.g. swapping a major dependency, adding a renderer).
+- **Production deployment work** — DNS, hosting, secrets, schema migrations against live data, password-hash reseeding. Requires real credentials and human gates.
+- **Investigations / research** — output is a write-up, not code; needs operator judgment on the "fits / not a fit" verdict before downstream work proceeds.
+- **Operator-side actions** — anything the bot literally cannot do (account setup, real API key creation, smoke-testing against a live broker / vendor).
+- **Deploy-affecting changes when the security-audit gate would fail** — any task whose merge would put new code on a public surface without a current `SECURITY_AUDIT_*.md` (≤90 days old OR covering all changes since the last audit, whichever is stricter) is held out of autonomy. Either run `security-reviewer` first and let the audit land, or wait for a human to schedule an audit pass. See `docs/DEPLOYMENT.md` Pre-Deploy Gate for the gate condition.
+**Example entries** (shape only — replace with real holdouts):
+- **<Strategy parameter set>** — `docs/TODO.md` §<section> — `[autonomy: needs-human-collab]` — parameter values carry edge; operator weighs in before backtest results.
+- **<Live integration toggle>** — codebase-wide — production gate; never autonomously promoted.
+- **<Default security setting>** — `docs/TODO.md` §<section> — `[autonomy: needs-human-collab]` — looks like a config tweak, but defaults ship to all users.

package/templates/BUILD_PLAN.md ADDED Viewed

@@ -0,0 +1,89 @@
+# Build Plan: <product name>
+<!--
+  Stage B output of the Spec & Design phase (see SPEC_WORKFLOW.md). Turns the
+  LOCKED functional spec (SPEC.md, already SPEC_APPROVED) into the HOW: stack
+  chosen through explicit tradeoffs, architecture, and queue-ready stories.
+  This file is LOCKED by the verbatim token BUILD_PLAN_APPROVED. The Build
+  phase (queue execution) does not begin until it lands.
+-->
+**Status:** Draft — awaiting `BUILD_PLAN_APPROVED`
+**Spec:** [`SPEC.md`](SPEC.md) — locked at `SPEC_APPROVED`
+**Architecture:** [`ARCHITECTURE.md`](ARCHITECTURE.md)
+**Last updated:** <YYYY-MM-DD>
+## Stack Decisions
+<!--
+  One subsection per engineering fork the human weighed. Surface each as an
+  enumerated decision with options + pros/cons; record the chosen option and
+  the one-line reason. Depth follows the SPEC.md interaction tier.
+-->
+### <Decision, e.g. "Datastore">
+| Option | Pros | Cons |
+|---|---|---|
+| <option A> | <pros> | <cons> |
+| <option B> | <pros> | <cons> |
+**Chosen:** <option> — <one-line reason>.
+## Stories
+<!--
+  The spec decomposed into implementation tasks, ready for AUTONOMOUS_QUEUE.md.
+  Each story:
+    - [id: S-NN]            stable id
+    - satisfies: R#, R#     the SPEC.md requirement(s) it implements
+    - [dep: none]           parallel-safe — OR — [dep: S-01, S-02] serial
+    - standard tags         [size:][risk:][autonomy:] — see AGENTS.md
+    - acceptance            the [AUTO] check or [HUMAN] checklist line
+  The [dep:] markers ARE the build schedule: everything [dep: none] runs in
+  parallel; a [dep: S-xx] story waits for its predecessors.
+-->
+### S-01 — <story title>
+- **satisfies:** R1, R2
+- `[dep: none]` `[size: S]` `[risk: low]` `[autonomy: safe]`
+- **acceptance:** <the runnable check, or the human checklist line>
+### S-02 — <story title>
+- **satisfies:** R3
+- `[dep: S-01]` `[size: M]` `[risk: med]` `[autonomy: review]`
+- **acceptance:** <the runnable check, or the human checklist line>
+## Verifier Suite
+<!--
+  Generated from the locked spec's Acceptance Tests. One runnable check per
+  [AUTO] requirement (references the requirement number); a single runner that
+  reports pass/fail per requirement and exits non-zero on any failure.
+  Fallback when the stack has no natural test runner: best-effort runner the
+  stack DOES support (build succeeds, schema validates, links resolve, headless
+  smoke) + route the rest to the Human Checklist. Log what got pushed to human.
+-->
+- **Runner:** <command that runs all checks and exits non-zero on failure, e.g. `pnpm test`>
+- **R1** `[AUTO]` → <check + where it lives, e.g. `tests/r1.spec.ts`>
+- **R2** `[AUTO]` → <check>
+- **Pushed to Human Checklist (no automated check available):** <list, with why>
+## Human Checklist
+<Carried from SPEC.md plus anything the verifier couldn't automate. This is the
+reviewer's entire post-build surface — the [AUTO] half is trusted once green.>
+- [ ] **R3** — <the specific thing to confirm>
+## Open Questions
+<Anything still needing a human call before BUILD_PLAN_APPROVED.>
+- <open question>