npm - discipline-md - Versions diffs - 0.1.0 - Mend

discipline-md 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (50) hide show

package/LICENSE +21 -0
package/README.md +80 -0
package/bin/discipline.js +587 -0
package/package.json +40 -0
package/templates/.claude/settings.json +58 -0
package/templates/AGENTS.md +463 -0
package/templates/AGENT_TRACKER.md +138 -0
package/templates/API_REFERENCE.md +131 -0
package/templates/ARCHITECTURE.md +89 -0
package/templates/ASSETS.md +90 -0
package/templates/AUTONOMOUS_QUEUE.md +119 -0
package/templates/BUILD_PLAN.md +89 -0
package/templates/CHANGELOG.md +90 -0
package/templates/CLAUDE.md +89 -0
package/templates/CREDITS.md +109 -0
package/templates/DATA_MODEL.md +88 -0
package/templates/DECISIONS.md +120 -0
package/templates/DEPLOYMENT.md +342 -0
package/templates/HANDOFF.md +289 -0
package/templates/IMPROVEMENT_LOOP.md +103 -0
package/templates/INVESTIGATION.md +154 -0
package/templates/LICENSE +68 -0
package/templates/NOTICE +55 -0
package/templates/OPEN_DECISIONS.md +61 -0
package/templates/PLAYBOOK_FEEDBACK.md +87 -0
package/templates/PROJECT_CONTEXT.md +91 -0
package/templates/README.md +60 -0
package/templates/ROADMAP.md +96 -0
package/templates/SECURITY_AUDIT.md +235 -0
package/templates/SETUP.md +162 -0
package/templates/SPEC.md +105 -0
package/templates/SPEC_WORKFLOW.md +173 -0
package/templates/TODO.md +118 -0
package/templates/USAGE.md +153 -0
package/templates/VERIFICATION_GATE.md +68 -0
package/templates/agents/CROSS_REPO_SYNC.md +124 -0
package/templates/agents/DEBUGGER.md +112 -0
package/templates/agents/PLANNER.md +111 -0
package/templates/agents/README.md +64 -0
package/templates/agents/RECON.md +99 -0
package/templates/agents/SECURITY_REVIEWER.md +123 -0
package/templates/agents/SPEC_ARCHITECT.md +133 -0
package/templates/agents/STAKEHOLDER.md +197 -0
package/templates/agents/_TEMPLATE.md +116 -0
package/templates/agents/optional/ARCHITECT.md +109 -0
package/templates/agents/optional/BACKEND_IMPACT.md +108 -0
package/templates/agents/optional/DOC_AUDIT.md +108 -0
package/templates/agents/optional/FRONTEND_IMPACT.md +109 -0
package/templates/agents/optional/QUEUE_CURATOR.md +114 -0
package/templates/agents/optional/TEST_STRATEGIST.md +107 -0

package/templates/SPEC_WORKFLOW.md ADDED Viewed

@@ -0,0 +1,173 @@
+# Spec & Design Phase
+**The most expensive thing an agent builds is the wrong thing, built correctly.**
+This is the phase that sits between *deciding to build* and *building*. It exists because a funded idea (or an accepted ticket, or a one-line request) is a **business case, not a buildable contract**. Hand that directly to an implementer — especially an unattended one — and it resolves every unstated decision in the most generic, usually wrong, direction. The Spec & Design phase front-loads the human judgment so the build is mechanical.
+> **Front-load the judgment.** A wrong spec executed flawlessly is still wrong, and a verifier-driven build is the one thing downstream that *cannot* catch it. So the design decisions concentrate here, where a human is in the loop — not at build time, where they aren't.
+This doc is harness-agnostic and **model-agnostic**: it is written to be run by a local LLM, ChatGPT/Codex, or Claude, under any of the runtime patterns in [`AGENTS.md`](AGENTS.md). Where a step benefits from a stronger model, that is a tier note (Frontier / Workhorse / Recon), never a vendor name.
+---
+## Where it sits
+```
+Idea / request / funded pitch
+        │
+        ▼
+   ┌─────────────────────────── Spec & Design ───────────────────────────┐
+   │                                                                      │
+   │  Stage A — Functional Spec (plain English)  ──►  SPEC_APPROVED       │
+   │  Stage B — Technical Plan (stack + stories) ──►  BUILD_PLAN_APPROVED │
+   │                                                                      │
+   └──────────────────────────────────────────────────────────────────────┘
+        │
+        ▼
+   Build (autonomous queue, parallel except serial deps)
+```
+Two stages, two gates, in order. **Neither gate is skippable and neither is inferred** — each opens only on its verbatim approval token (see Gates). The phase has an explicit predecessor gate too: in a Founder/venture flow it runs *after* `FUNDING_APPROVED`; in a plain ticket flow it runs after the work is accepted. It never runs speculatively against an unapproved idea.
+---
+## The first prompt: pick an interaction tier
+The very first thing the Spec Architect does is ask the human **how much they want to be involved**. This is a dial, set once at the top of Stage A, recorded in the spec header, and overridable mid-flight ("go deeper here", "stop asking, infer the rest").
+| Tier | The Architect… | Use when |
+|---|---|---|
+| **Express** | infers almost everything; asks only true blockers (decisions it genuinely cannot make and cannot safely default). | the build is small, conventional, or you trust the defaults and want speed. |
+| **Guided** | asks a structured tradeoff question at each *major* decision (page grouping, key flows, the load-bearing stack picks); infers the routine. | a normal build where you want a say on the forks that matter. |
+| **Thorough** *(default)* | Guided, plus screen-by-screen and data-model confirmation before locking. | most real products — especially anything an unattended local model will execute, where more pinned down up front means fewer wrong guesses at build time. |
+| **Exhaustive** | confirms essentially every choice; nothing material is inferred silently. | high-stakes, novel, or compliance-sensitive work; or when you simply want to co-design every detail. |
+**Default is Thorough.** The default is deliberately *not* Express: under-specification is cheap to fix in this phase (one more question) and expensive to fix at build time (a wrong artifact, possibly merged unattended). When unsure, ask one more question.
+Record the chosen tier in the spec: `Interaction tier: Thorough`.
+---
+## Stage A — Functional Spec (plain English)
+**Posture: read-only design.** The Architect reads the codebase and context but writes only the spec — no implementation, no dependencies, no scaffolding. Express this however the harness allows: Claude Code *plan mode*, ChatGPT/Codex *plan*, or, on a bare local setup, simply the convention "this turn produces a spec, not code." The mechanism varies; the **read-only contract does not**.
+**Output: `SPEC.md` — entirely plain English about the app and how it behaves.** No tech choices live here. A non-technical stakeholder must be able to read it and recognize their product. It covers:
+- **Every page / screen / view** — what it is, what it's for, how the user reaches it.
+- **Every control** — buttons, inputs, menus, tools, gestures — and what each one *does*.
+- **Every user flow** — the paths through the product, the states a user moves between, what success looks like at each step.
+- **The negative space** (where most spec failures hide — absences, not wrong statements):
+  - boundary behavior — empty, max, zero, overflow, malformed input;
+  - what is explicitly **out of scope** (the single highest-leverage section — it's what stops scope creep at build time);
+  - what happens when two requirements **conflict**, and which one wins;
+  - **assumptions** the spec is making (an executor resolves every unstated assumption, usually wrongly — so state them and have the human confirm).
+### The spec is the verifier
+Every requirement must be **checkable**. If you can't state how a requirement would be verified, it is not a requirement yet — rewrite it until it's testable, or tag it human-judgment-only. Prefer `<thing> MUST <observable, checkable condition>` over `<thing> should be clean / fast / good`.
+Tag every requirement:
+- **`[AUTO]`** — a machine can check it (compiles, test passes, schema matches, output round-trips, numeric tolerance met, link resolves, element present).
+- **`[HUMAN]`** — needs human judgment; no automatic verifier exists ("the copy reads well", "the legal interpretation is sound").
+More compute makes `[AUTO]` requirements bulletproof and makes `[HUMAN]` requirements only more *confident*, never more *correct*. The `[HUMAN]` list is precisely the human reviewer's worklist after the build — everything else is trusted because it passed the suite.
+### Red-team before locking
+Before proposing `SPEC.md` for approval, the Architect attacks its own draft and resolves or surfaces each finding — never papers over a gap with a plausible default:
+1. **Ambiguities** — what can be read two ways? Quote it; give both readings.
+2. **Gaps** — what input / state / edge case is unhandled?
+3. **Conflicts** — which pairs of requirements can't both hold?
+4. **Hidden assumptions** — what does the spec rely on but not state?
+5. **Unverifiable `[AUTO]` claims** — tagged checkable but actually aren't; this is exactly where confident-wrong output hides.
+For high-stakes specs this red-team pass is worth **one Frontier-tier call** — the cheapest, highest-ROI step before any expensive build. (See Tier Routing.) Findings that need a human decision go to the spec's **Open Questions**; resolved ones fold into the requirements.
+**Gate 1:** the human replies with the verbatim token **`SPEC_APPROVED`**. Nothing else opens it. Stage B does not begin until it lands.
+---
+## Stage B — Technical Plan ("plan mode")
+Now, and only now, the design turns technical. Stage B converts the *what* (locked `SPEC.md`) into the *how*: an architecture, a stack chosen through explicit tradeoffs, and a backlog of stories ready to run unattended.
+**Interactive tradeoff decisions.** This is where the human weighs the engineering forks — framework, datastore, hosting, key libraries, auth, the build-vs-buy calls. The Architect surfaces each as an **enumerated decision with options and their pros/cons**, recommends one, and lets the human pick. Express this with whatever structured-choice affordance the harness has; on a bare runtime, present it as a numbered list the human answers. The depth of this back-and-forth follows the interaction tier from Stage A.
+**Outputs:**
+1. **`ARCHITECTURE.md`** — the full intended build architecture **with a diagram** (Mermaid or equivalent that renders in-repo): component/module breakdown, data model, major data/control flows, external dependencies and integration points, deployment target, and the trust/operational boundaries (what holds user state, where the security surface is).
+2. **Queue-ready stories** — the spec decomposed into implementation tasks, each one:
+   - traced to the `SPEC.md` requirement number(s) it satisfies (`satisfies: R3, R7`);
+   - carrying a stable id (`[id: S-04]`) and a dependency marker — **`[dep: none]`** (parallel-safe) or **`[dep: S-01, S-02]`** (serial; cannot start until those land);
+   - tagged with the standard autonomy/size/risk tags (see [`AGENTS.md`](AGENTS.md));
+   - carrying its own acceptance check (the `[AUTO]` test, or the `[HUMAN]` checklist line).
+   These land in `AUTONOMOUS_QUEUE.md` (+ `TODO.md`) **ready to run, in parallel except where a `[dep:]` marker forces serial order.** The dependency graph *is* the build schedule.
+3. **The verifier suite** — generated from the locked spec (next section).
+**Gate 2:** the human replies with the verbatim token **`BUILD_PLAN_APPROVED`**. That opens the Build phase. Until then, no implementation, no installs, no queue execution.
+---
+## The verifier suite
+Generated from the locked spec, the verifier suite is what makes an **unattended** build safe — it is the ground truth a weak or unsupervised executor cannot fake. (This is the whole reason the spec discipline above insists every requirement be checkable: a vague requirement produces a vague check, which passes confidently-wrong output.)
+For each `[AUTO]` requirement, emit a runnable check — unit test, schema validator, numeric-tolerance assertion, round-trip assertion, headless smoke, or shell check — that passes **iff** the requirement holds, and that references the requirement number. Plus a single **runner** that executes all checks, reports pass/fail per requirement, and exits non-zero on any failure. The build loop (sampling / best-of-N / fix-and-recheck on a local executor; or a single supervised pass) iterates until the runner is green.
+For each `[HUMAN]` requirement, emit a **checklist line** naming the specific thing to confirm. That checklist plus anything the runner flagged uncertain is the human's *entire* post-build review surface — the `[AUTO]` half is trusted because it passed.
+**Fallback when the stack has no natural test runner** (static content site, audio plugin, design-only asset): generate the **best-effort runner** the stack *does* support — build succeeds, schema validates, links resolve, headless smoke loads — and route everything unautomatable to the `[HUMAN]` checklist. Always emit a runner; never block on "no tests possible." Log what got pushed to the human checklist so the unattended-execution gap is visible, not silent.
+---
+## Tier routing (model-agnostic)
+The phase is designed to run cheap and verify hard. Route each step by tier (Frontier / Workhorse / Recon — see [`AGENTS.md`](AGENTS.md) for the cross-ecosystem mapping), not by vendor:
+| Step | Tier | Why |
+|---|---|---|
+| Stage A — draft the functional spec | **Workhorse** (local-smart) | structured synthesis; cheap to iterate with the human. |
+| Spec red-team pass | **Frontier** (one call) | adversarial gap-finding is the weak-model floor — weaker models soft-pedal the adversary role. Highest ROI in the whole phase. |
+| Stage B — architecture & tradeoffs | **Workhorse**, escalate to **Frontier** for genuinely ambiguous architecture (a `DECISIONS.md`-worthy call). | most stack picks are routine; the hard fork is worth a Frontier subagent. |
+| Generate the verifier suite | **Workhorse** | mechanical translation of `[AUTO]` requirements into checks. |
+| Build executor (later phase) | **Recon / Workhorse** (local-fast), with sampling | runs against the verifier as ground truth; the suite, not the model, guarantees correctness. |
+| Final `[HUMAN]`-review assist | **Frontier**, sparingly | judgment-heavy, low-volume. |
+**Weak-model floor (hard rule):** the spec red-team, the verifier-coverage check, and any edit to a gate token or the queue's membership are **never run below Frontier tier unattended**. A Workhorse/Recon agent that hits one stops and surfaces it. (Mirrors the floor in `AGENTS.md`.)
+---
+## Gates (the two tokens)
+Both tokens are matched **verbatim**, case-insensitively, ignoring surrounding whitespace and punctuation. Nothing else opens a gate — not paraphrase, not enthusiasm, not "looks good, ship it." If the human seems to approve but the token is absent, ask for the exact token; never infer it.
+| Token | Locks | Opens |
+|---|---|---|
+| `SPEC_APPROVED` | the plain-English functional spec (Stage A) | Stage B |
+| `BUILD_PLAN_APPROVED` | the technical plan + queued stories + verifier suite (Stage B) | Build |
+In a Founder/venture flow these chain after the funding gate: `FUNDING_APPROVED` → `SPEC_APPROVED` → `BUILD_PLAN_APPROVED` → build. A project that wants different token spellings records that as a durable decision in `DECISIONS.md` before using them.
+---
+## Phase signals
+Emit a visible signal at each stage boundary so a human watching an autonomous run knows where it is:
+- `Specifying...` — drafting / iterating the Stage A functional spec.
+- `Planning...` — Stage B architecture, tradeoffs, stories, verifier suite.
+(These sit alongside whatever signals the surrounding workflow already uses, e.g. a Founder's `Scouting... / Pitching... / Building...`.)
+---
+## Why this shape
+- **Plain-English spec first, tech second** keeps the human's judgment on *product* (which they own) separate from *engineering* (which they may delegate), and keeps `SPEC.md` reviewable by a non-engineer.
+- **Two gates, not one** because approving *what to build* and approving *how to build it* are different decisions, often made at different confidence levels.
+- **Stories carry their dependency graph** so the Build phase parallelizes correctly by construction — the plan, not the executor, decides what's serial.
+- **The verifier suite is generated from the locked spec**, not written after the code, so the tests encode the *intended* behavior, not the *implemented* behavior. A test written from the code passes by definition; a test written from the spec can fail — which is the entire point.

package/templates/TODO.md ADDED Viewed

@@ -0,0 +1,118 @@
+# TODO
+<!-- HOT PATH FILE. Static sections at top (commands, reference). Volatile task list at bottom. -->
+<!-- Keep the top stable so session-to-session prompt caching works. -->
+> Two-gate rule: an `[autonomy: safe]` tag on an entry here is necessary but NOT sufficient for unattended execution. The item must ALSO be listed in `docs/AUTONOMOUS_QUEUE.md`. If it isn't in the queue, it waits for user input — no exceptions.
+## Commands
+Start local dev:
+```bash
+# command
+```
+Run tests:
+```bash
+# command
+```
+Build (if applicable):
+```bash
+# command
+```
+## Deployment Notes
+Stable reference for deployment-time gotchas. Detail goes in `docs/DEPLOYMENT.md`; this section is for the "must remember" bullets agents need at hand.
+- Note 1.
+- Note 2.
+## Test Checklist
+Pre-commit / pre-merge sanity sweep:
+- [ ] Run unit tests.
+- [ ] Run build.
+- [ ] Smoke test key user flows.
+- [ ] Audit docs before final response or commit: inspect `git diff -- docs`, update relevant docs, search for stale names/paths after renames or behavior changes.
+- [ ] Keep core docs updated when behavior changes — `docs/HANDOFF.md`, `docs/CHANGELOG.md`, `docs/ROADMAP.md`, `docs/ARCHITECTURE.md`, `docs/API_REFERENCE.md`, `docs/DATA_MODEL.md` (only the ones that actually changed; don't touch unaffected docs).
+- [ ] **Cleanup gate (mandatory)**: delete shipped TODO entries once they hit `docs/CHANGELOG.md` (TODO is a queue, not a log — no `[x] DONE` carryover); mark completed `docs/ROADMAP.md` items ✅ or move under `## Completed`; trim `docs/AUTONOMOUS_QUEUE.md` as queued items ship. See `docs/AGENTS.md` for the full mandatory cleanup gate.
+## Tag Conventions
+Every entry in `## Active / Next` (and any milestone subsection below) should carry a tag block. Tags drive autonomy decisions, model-tier selection, and queue triage.
+**Format** (suffix on the bullet):
+```
+[size: XS|S|M|L|XL][risk: low|med|high][scope: isolated|cross-cutting|infra]
+```
+Optional additional tags layer on top: `[tier: haiku|sonnet|opus]`, `[autonomy: safe|review|needs-human-collab]`. See `## Autonomy Tag Legend` below.
+**Meanings:**
+- `size` — `XS` = single-line tweak (under ~15 min). `S` = under an hour. `M` = a focused session. `L` = multi-session or design-required. `XL` = epic / multi-week, expect to break down.
+- `risk` — blast radius if the change goes wrong. `low` = isolated UI tweak / dead-code removal. `med` = data-touching code, exported APIs, build config. `high` = auth, payments, schema migrations against live data, anything user-facing in production.
+- `scope` — `isolated` = single repo, single subsystem. `cross-cutting` = touches multiple subsystems / packages within one repo. `infra` = build, CI, deploy, secrets, hosting wiring.
+**Default for unmarked entries:** `[size: S][risk: low][scope: isolated]` plus `[tier: sonnet][autonomy: safe]` from the Autonomy Tag Legend. Omit tags that match the default; only call out where an entry differs.
+**Examples** (shape only — fill with real entries):
+```
+- [ ] Reset / clear catalog action with confirmation. `[size: XS][risk: low][scope: isolated]`
+- [ ] Library list virtualization for 1k+ rows. `[size: S][risk: low][scope: isolated]`
+- [ ] Duplicate detection (filename/path heuristics). `[size: M][risk: med][scope: isolated]`
+- [ ] JSON catalog import (export already shipped). `[size: S][risk: med][scope: isolated]`
+- [ ] End-to-end test: index folder → catalog → link → export packet. `[size: M][risk: med][scope: cross-cutting]`
+- [ ] Migrate <feature> from monolith to <package>. `[size: L][risk: high][scope: cross-cutting]`
+- [ ] Recruit external validators. `[size: M][risk: med][scope: isolated][autonomy: human-only]`
+```
+## Autonomy Tag Legend
+Tag scheme defined in `docs/AGENTS.md`. Default for an unmarked entry is `[size: S][tier: sonnet][risk: low][scope: isolated][autonomy: safe]`; only call out tags that differ.
+- `[size: XS|S|M|L|XL]` — see Tag Conventions above.
+- `[tier: haiku|sonnet|opus]` — model tier that should drive implementation. (Or use the model-agnostic `recon|workhorse|frontier` if the project doesn't run on Claude.)
+- `[risk: low|med|high]` — blast radius if the change goes wrong.
+- `[scope: isolated|cross-cutting|infra]` — see Tag Conventions above.
+- `[autonomy: safe|review|needs-human-collab]` — `safe` = autonomous host may pick up; `review` = autonomous host may implement but must open a draft PR; `needs-human-collab` = MUST NOT start without explicit user approval.
+## Backup / Mirror Hygiene
+Standing reference checklist — these are RECURRING items, not one-off tasks. Do not delete on completion; instead, note the most recent verification date inline. See `docs/HANDOFF.md` "Remotes & Backup" section (or equivalent) for the project-specific context.
+- **Annual SSH key rotation.** `~/.ssh/<key>` is the auth credential for all remotes (GitHub + any mirrors: GitLab, NAS, etc.). Rotate yearly. After rotation: re-register with GitHub via `gh auth refresh`, GitLab via web console or `glab ssh-key add`, and any self-hosted mirror's `authorized_keys` for each user account. Last rotated: `<YYYY-MM-DD>`.
+- **Pre-commit gitleaks hook.** Install gitleaks as a pre-commit hook so accidental secrets get caught before any push (especially before fan-out to multiple mirrors). Respect `.gitleaksignore` where present. Verified installed: `<YYYY-MM-DD>`.
+- **GitHub push protection enabled.** In repo settings on github.com, "Secret scanning push protection" must be ON so even a typo cannot accidentally push a real credential. Verified enabled: `<YYYY-MM-DD>`.
+- **Secret scanning enabled.** GitHub repository-level secret scanning (passive scanner over history) must be ON in addition to push protection. Verified enabled: `<YYYY-MM-DD>`.
+- **Fan-out verification.** Run `git push origin --dry-run` from each repo to confirm pushurls still resolve (catches hostname changes, key revocation, mirror outage) before the next real push needs them. Last verified: `<YYYY-MM-DD>`.
+---
+## Current Context
+*A few bullets covering where the session is picking up. Examples of useful content:*
+- *Where production / local-dev state lives, and how to reset.*
+- *In-flight feature branches and their status.*
+- *Decisions that were made recently but haven't fully propagated.*
+- *Things deferred to a later session and why.*
+(Single bullet is fine for small projects; expand as the project grows.)
+## Active / Next
+- [ ] Task. `[size: S][risk: low][scope: isolated]`
+- [ ] Task. `[size: S][risk: low][scope: isolated]`
+## Blockers (delete this section if empty)
+- Blocker or dependency.

package/templates/USAGE.md ADDED Viewed

@@ -0,0 +1,153 @@
+<!--
+  TEMPLATE: USAGE.md (Tier D — user-facing controls + Consumer Contract)
+  WHEN TO USE:
+    - For end-user-facing tools (CLIs, browser-loaded HTML widgets, libraries
+      with a UI surface) where someone OTHER than a developer will operate
+      the thing. Distinct from README.md (install/intro) and ARCHITECTURE.md
+      (internals).
+    - For projects that emit OUTPUT consumed by sibling repos or downstream
+      tools — the Consumer Contract section is the load-bearing part there.
+    - When the controls / settings reference exceeds what fits comfortably
+      in README.md and warrants its own page.
+  HOW TO USE:
+    1. Copy this file into the project's `docs/` directory as `USAGE.md`.
+    2. Replace every <placeholder> with project-specific content.
+    3. Drop sections that don't apply (e.g. if there's no downstream consumer,
+       skip the Consumer Contract — but think twice; many projects have an
+       implicit consumer they should make explicit).
+    4. Treat the Consumer Contract as a CONTRACT — every breaking change to
+       items listed there is a major-version event and warrants a CHANGELOG.md
+       call-out plus, if reasonable, advance notice to the consumer repos.
+  WHERE THIS LIVES IN THE DOC SET:
+    - README.md     — install, what-it-is, one-screen demo. The lobby.
+    - USAGE.md      — controls, settings, output formats, consumer contract.
+    - ARCHITECTURE.md — internals, modules, why the design is what it is.
+    - DECISIONS.md  — locked-in choices and their rationale.
+    A reader who wants to USE the tool reads README + USAGE.
+    A reader who wants to MAINTAIN the tool reads ARCHITECTURE + DECISIONS.
+-->
+# Usage — <project-name>
+## Quick Start
+<the shortest possible "how do I get this thing running" sequence. Three to
+six lines. Assume the reader has read README.md's "what is this" but nothing
+deeper. Examples:>
+```bash
+<install or open command>
+<run command>
+```
+<one paragraph on what they should see after the run completes — the smoke
+test that confirms it's working.>
+## Controls / Settings
+<one paragraph framing: where the controls live (panel, CLI flags, config
+file, env vars), and the design principle behind them — e.g. "All controls
+live in the panel below the canvas. No hidden tabs.">
+### <Control-1 name> — <type, e.g. dropdown | slider | flag | env var>
+<one paragraph: what it does, default value, value range or enum, when you'd
+change it.>
+- **<sub-option-1>** <— what this enum value means>
+- **<sub-option-2>** <— what this enum value means>
+### <Control-2 name> — <type>
+<…>
+### <Control-3 name> (<range>, default <default>)
+<…>
+> Alternative: if the control set is large and homogeneous (e.g. 20+ CLI
+> flags), use a table:
+| Control       | Type      | Default       | Range / Values    | What it does                         |
+| ------------- | --------- | ------------- | ----------------- | ------------------------------------ |
+| <name>        | <slider>  | <5>           | <1–10>            | <one-line summary>                   |
+| <name>        | <flag>    | <off>         | <on/off>          | <one-line summary>                   |
+| <name>        | <enum>    | <"default">   | <a | b | c>      | <one-line summary>                   |
+## Output Formats
+<if the tool emits files / data, document each format here. Skip the section
+entirely if the tool produces no persistent output.>
+### <Format-1, e.g. JS array, JSON, PNG, ASCII>
+<one paragraph: where it gets emitted, when, what's in it. If the format is
+load-bearing for a downstream tool, note that and link to the Consumer
+Contract section below.>
+```<lang>
+<minimal example showing the shape>
+```
+- <field-1 explanation>
+- <field-2 explanation>
+- <indexing convention, axis ordering, units>
+- <invariants the consumer can rely on>
+### <Format-2>
+<…>
+## Consumer Contract
+> The shape this project emits that downstream tools / sibling repos can rely
+> on. **This section is the contract.** Items listed here are stable across
+> patch and minor releases; breaking changes are major-version events.
+Downstream consumers — primarily <consumer-repo-1>, <consumer-repo-2> — depend
+on the shape below. Breaking changes here silently break the consumer; the
+breaking-change warranty section spells out what counts as breaking.
+### Stable shape
+- **<contract-item-1>:** <description of the invariant. Example: "Array shape: `MAZE[y][x]`, where `y` is the row (top→bottom) and `x` is the column (left→right).">
+- **<contract-item-2>:** <e.g. "Cell values: `1` = wall, `0` = open. No other values.">
+- **<contract-item-3>:** <e.g. "Outer ring: always walls.">
+- **<contract-item-4>:** <e.g. "Determinism: the same seed string produces the same output on the same algorithm version.">
+- **<contract-item-…>:**
+### Breaking-change warranty
+A change to any of the following is a **major-version event** (bump the major
+version, announce in CHANGELOG.md, give downstream consumers advance notice
+where reasonable):
+- Removing or renaming a field listed under "Stable shape" above.
+- Changing the type or value range of any documented field.
+- Changing axis ordering, indexing convention (0-based vs 1-based), or units.
+- Changing the meaning of an enum value (e.g. swapping `1` and `0` in a wall/open grid).
+- Removing a documented output format entirely.
+- Changing the determinism guarantee (e.g. switching the PRNG so the same seed produces a different output).
+A change that is **not** breaking, and may ship in a minor or patch release:
+- Adding a new field that consumers can ignore.
+- Adding a new output format alongside existing ones.
+- Improving accuracy or performance without changing the documented shape.
+- Documentation clarifications.
+If a consumer needs richer information than the contract above provides,
+expose new fields or formats rather than repurposing the existing shape.
+## Tips
+<short, opinionated, hard-won advice — the things you wish someone had told
+you the first day. Bulleted, one-liners.>
+- <tip 1>
+- <tip 2>
+- <tip 3>

package/templates/VERIFICATION_GATE.md ADDED Viewed

@@ -0,0 +1,68 @@
+<!--
+  Discipline optional template. Install with: npx discipline-md add VERIFICATION_GATE
+  The ground-truth signal the Improvement Loop is built on (see IMPROVEMENT_LOOP.md).
+  Fill the <placeholders> with THIS project's real, runnable commands.
+  COLD-PATH doc — read it when verifying work or running the loop.
+-->
+# Verification Gate
+The machine-checkable signal that proves a change is **actually** good — not "looks done." Every autonomous iteration must pass this gate before it counts as done. The gate is the single thing that keeps an AI improvement loop honest: it catches the agents' own mistakes (a fix that breaks a test, a "done" that doesn't compile, a change that regresses behavior) that review alone misses.
+**Rule:** no `TODO`/`AUTONOMOUS_QUEUE` item is done until the gate passes. On failure, loop-fix and re-run — do not mark done, do not commit, do not advance the loop. An agent's "done" is intent; the gate is proof.
+## The signal for this project
+Concrete, runnable, deterministic. Fill these in — vague gates ("looks right") are not gates.
+```bash
+# Build / typecheck — must exit 0
+<e.g. pnpm run build   |   cmake --build build --target <target>>
+# Tests — must report 0 failures
+<e.g. pnpm test   |   ./run-tests   |   pytest -q>
+# Smoke / reproduction — the change must actually run / the bug must actually be gone
+<e.g. launch the app and assert <observable>   |   the failing repro now passes>
+# Lint / format (optional, but cheap)
+<e.g. pnpm run lint>
+```
+**Expected pass criteria:** <state the exact success condition, e.g. "build exit 0; N/N tests pass; standalone launches and closes < 2s; no new lint errors">.
+## Gate layers (use the strongest ones available)
+Prefer signals closest to real behavior — they're hardest to game:
+1. **Compile / typecheck** — catches the "doesn't build" class outright.
+2. **Tests** — unit + integration. A change that needs a test deleted/weakened to pass is a red flag, not a pass.
+3. **Reproduction harness** — for a bug fix, the strongest gate is *the bug actually reproduced before and is gone after*. Build the repro; it's the highest-leverage thing here.
+4. **Eval suite / golden set** — for behavior/quality changes a binary build can't judge: a labeled set + a metric computed without a human.
+5. **Adversarial review** — only where 1–4 can't reach. A SEPARATE reviewer (never the author) prompted to *refute*, multi-lens, confidence vote. Subjective sign-off by the author is not a gate.
+## Anti-gaming rules
+- Gate on **behavior / reproduction**, not self-report. "The agent said it's fixed" is not a signal.
+- A fix that **disables, weakens, or deletes a test/assertion** to go green is presumed a regression — require justification and a separate adversary's confirmation.
+- **No silent gate-narrowing.** If coverage is reduced (skipped suite, sampled inputs, `--no-verify`), it must be LOGGED. A quietly narrowed gate reads as "passing" when it isn't.
+- If a step has **no possible automated gate**, it is **judgment, not execution** — route it to `OPEN_DECISIONS.md`; do not let the loop "verify" it by vibes.
+## Wiring to the harness
+- **Required step:** the Improvement Loop's GATE stage runs these commands and reads the result.
+- **Optional enforcement (Claude Code):** a Stop / pre-commit hook in `.claude/settings.json` that runs the gate and blocks completion on failure — so it can't be skipped. See `HARNESS_INTEGRATION.md`.
+- Keep the gate **fast**: if it takes too long, agents (and humans) skip it. Split a slow full-suite gate into a fast per-change gate + a slower nightly gate.
+## Maintenance (anti-rot)
+- The gate must track the project. A test that's been skipped for months, a build target that no longer exists, or a repro that no longer reproduces is a **broken gate** — fix it before trusting the loop, or the loop verifies against nothing.
+- When the project's build/test commands change, update this file in the same commit (doc-sync rule).
+## Inputs
+- This project's build/test/run tooling.
+## Outputs
+- A pass/fail signal consumed by the Improvement Loop GATE stage; a logged record of any deliberately narrowed coverage.

package/templates/agents/CROSS_REPO_SYNC.md ADDED Viewed

@@ -0,0 +1,124 @@
+# Cross-Repo Sync Agent Work Contract
+Coordinates documentation and contract changes across sibling repos. Used when a change in one project needs mirrored updates in another (shared playbook, sibling docs, framework template, downstream consumer).
+## Role Summary
+- **Name:** `CROSS_REPO_SYNC`
+- **Tier:** Workhorse. Escalate to Frontier when the sync touches architecture or security contracts. See `docs/AGENTS.md`.
+- **Mode:** Multi-repo diff planning.
+- **Stakeholder model:** Reports to the calling host. Each repo's stakeholder retains final approval for changes inside their repo.
+## Authority Boundary
+CROSS_REPO_SYNC MAY:
+- Read any sibling repo the host has authorized for the sync.
+- Produce a per-repo diff plan: which files change, what the change is, and why.
+- Recommend ordering (which repo lands first, which depend on which).
+CROSS_REPO_SYNC MUST NOT:
+- Apply changes to any repo without explicit per-repo approval.
+- Cross repo boundaries that the host hasn't authorized.
+- Promote a change from a project to the framework template without `PROMOTE_TO_FRAMEWORK_APPROVED`.
+- Touch a sibling repo's `docs/DECISIONS.md` without that repo's `ARCHITECT` lead-in.
+## Responsibilities
+1. Identify sibling repos affected by a given change.
+2. Produce a per-repo diff plan with rationale and ordering.
+3. Identify version-skew risk and recommend a rollout sequence.
+4. Flag changes that should promote to the framework's canonical templates rather than living in one project.
+## Workflow Phases
+### Phase 1: Map
+Identify the change and the candidate sibling repos. Read each repo's relevant files to confirm whether they are actually affected.
+### Phase 2: Plan
+Produce a per-repo diff plan. For each repo: files affected, proposed change, rationale, dependencies on other repo changes.
+### Phase 3: Sequence
+Order the rollout to minimize the window where repos disagree. Mark which steps require approval gates.
+### Phase 4: Handoff
+Return the plan to the host. Host (or per-repo agents) executes inside each repo.
+## Cross-Product Harvest
+The per-change flow above is reactive — it fires when someone has a change to propagate. The harvest is proactive: it mines the fleet for patterns worth promoting *to the framework*, so the framework improves from evidence rather than one repo's hunch.
+On a set cadence, or once ≥3 sibling products exist:
+1. **Collect.** Read each sibling product's `docs/PLAYBOOK_FEEDBACK.md` "Applied (recent)" entries and `docs/AGENT_TRACKER.md` status tables. Read-only — this role never edits sibling repos without per-repo approval.
+2. **Cluster.** Group by the underlying friction, not the wording. "The cleanup gate cost more than it saved on a tiny repo" and "I deleted the gate section because it never fired here" are the same signal.
+3. **Threshold.** A friction that appears independently in **≥2–3 products** is a framework candidate. A friction in one product stays project-local (that is what PLAYBOOK_FEEDBACK Local Improvements is for).
+4. **Promote or prune.** Recommend either promoting the pattern to `the framework's canonical templates/` (needs `PROMOTE_TO_FRAMEWORK_APPROVED`) or — just as valid — *retiring* a framework default that multiple products keep deleting or working around. The harvest can shrink the framework, not only grow it.
+The harvest produces a recommendation list, never a direct framework edit. Promotion still goes through the gate below.
+## Drift And Re-Pitch Rules
+Stop and re-pitch when:
+- The sync would require a change to the framework template — promote via `PROMOTE_TO_FRAMEWORK_APPROVED`.
+- Two repos have contradictory contracts and the resolution requires stakeholder judgment.
+- The sync window is large enough that staggered deploys would break consumers.
+## Content-Safety Rules
+- Per-project content-safety rules are not transitive. Do not apply repo A's content rules to repo B without explicit confirmation.
+- Trademark and naming claims must be checked per-repo (see project-level naming check).
+## Cleanup Gate
+- Per-repo plan is written down.
+- Rollout order and approval gates are explicit.
+- Promotion-to-framework candidates are flagged separately.
+## Approval Signals
+- `SYNC_PLAN_APPROVED` — host authorizes per-repo execution of the plan.
+- `PROMOTE_TO_FRAMEWORK_APPROVED` — host authorizes promoting a change to `the framework's canonical templates/`.
+- `HARVEST_APPROVED` — host authorizes a cross-product harvest pass (read sibling products' PLAYBOOK_FEEDBACK + AGENT_TRACKER logs to mine promotion / retirement candidates).
+## Stop Conditions
+Hand back when:
+- A repo not in the original scope appears affected.
+- The sync surfaces a contract conflict that requires stakeholder judgment.
+- A sibling repo's funded spec would change as a result.
+## Inputs
+- The originating change (commit, PR, decision).
+- List of candidate sibling repos.
+- Optional: framework template path.
+Read exactly the inputs above plus any files the spawn prompt names. Do not browse other docs on your own initiative.
+## Outputs
+- Per-repo diff plan with rationale and ordering.
+- Promotion-to-framework recommendations, if any.
+## Worked Example
+**Input:** "Repo `appA` gains a new sibling `toolkit` — sync the Related Repos sections."
+**Good output:**
+- `appA/docs/HANDOFF.md:21` — add `toolkit` bullet under Related Repos (forward link): purpose, dependency direction, integration path.
+- `toolkit/docs/HANDOFF.md:18` — add the back-link bullet to `appA` (cross-link invariant: a missing back-link means the next agent reading `toolkit` will not know `appA` exists).
+- Ordering: land both in the same change set; no inter-repo dependency.
+- Promotion-to-framework candidates: none.
+**Not this:** "Added `toolkit` to appA's Related Repos. Done — toolkit's own docs can be updated later if someone needs it."
+*Why it fails:* forward link updated but the sibling back-link forgotten — the cross-link invariant treats the sibling sync as part of the same task, not follow-up work.