discipline-md 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (50) hide show
  1. package/LICENSE +21 -0
  2. package/README.md +80 -0
  3. package/bin/discipline.js +587 -0
  4. package/package.json +40 -0
  5. package/templates/.claude/settings.json +58 -0
  6. package/templates/AGENTS.md +463 -0
  7. package/templates/AGENT_TRACKER.md +138 -0
  8. package/templates/API_REFERENCE.md +131 -0
  9. package/templates/ARCHITECTURE.md +89 -0
  10. package/templates/ASSETS.md +90 -0
  11. package/templates/AUTONOMOUS_QUEUE.md +119 -0
  12. package/templates/BUILD_PLAN.md +89 -0
  13. package/templates/CHANGELOG.md +90 -0
  14. package/templates/CLAUDE.md +89 -0
  15. package/templates/CREDITS.md +109 -0
  16. package/templates/DATA_MODEL.md +88 -0
  17. package/templates/DECISIONS.md +120 -0
  18. package/templates/DEPLOYMENT.md +342 -0
  19. package/templates/HANDOFF.md +289 -0
  20. package/templates/IMPROVEMENT_LOOP.md +103 -0
  21. package/templates/INVESTIGATION.md +154 -0
  22. package/templates/LICENSE +68 -0
  23. package/templates/NOTICE +55 -0
  24. package/templates/OPEN_DECISIONS.md +61 -0
  25. package/templates/PLAYBOOK_FEEDBACK.md +87 -0
  26. package/templates/PROJECT_CONTEXT.md +91 -0
  27. package/templates/README.md +60 -0
  28. package/templates/ROADMAP.md +96 -0
  29. package/templates/SECURITY_AUDIT.md +235 -0
  30. package/templates/SETUP.md +162 -0
  31. package/templates/SPEC.md +105 -0
  32. package/templates/SPEC_WORKFLOW.md +173 -0
  33. package/templates/TODO.md +118 -0
  34. package/templates/USAGE.md +153 -0
  35. package/templates/VERIFICATION_GATE.md +68 -0
  36. package/templates/agents/CROSS_REPO_SYNC.md +124 -0
  37. package/templates/agents/DEBUGGER.md +112 -0
  38. package/templates/agents/PLANNER.md +111 -0
  39. package/templates/agents/README.md +64 -0
  40. package/templates/agents/RECON.md +99 -0
  41. package/templates/agents/SECURITY_REVIEWER.md +123 -0
  42. package/templates/agents/SPEC_ARCHITECT.md +133 -0
  43. package/templates/agents/STAKEHOLDER.md +197 -0
  44. package/templates/agents/_TEMPLATE.md +116 -0
  45. package/templates/agents/optional/ARCHITECT.md +109 -0
  46. package/templates/agents/optional/BACKEND_IMPACT.md +108 -0
  47. package/templates/agents/optional/DOC_AUDIT.md +108 -0
  48. package/templates/agents/optional/FRONTEND_IMPACT.md +109 -0
  49. package/templates/agents/optional/QUEUE_CURATOR.md +114 -0
  50. package/templates/agents/optional/TEST_STRATEGIST.md +107 -0
@@ -0,0 +1,138 @@
1
+ <!--
2
+ TEMPLATE: AGENT_TRACKER.md (Tier B3 — empirical agent-outcome log)
3
+
4
+ WHEN TO USE:
5
+ - In any project that delegates work to multiple model tiers (Haiku / Sonnet / Opus
6
+ or future equivalents) and wants tier-routing decisions backed by data, not vibes.
7
+ - Pairs with an AGENTS.md / PLAYBOOK_FEEDBACK.md tier-routing doc — this file is
8
+ the EVIDENCE; AGENTS.md is the POLICY derived from the evidence.
9
+
10
+ HOW TO USE:
11
+ 1. Copy this file into the project's `docs/` directory as `AGENT_TRACKER.md`.
12
+ 2. Replace placeholder example rows under "Entries" with real attempts as they happen.
13
+ 3. Log immediately while context is hot — a half-day later the failure mode is
14
+ already vague.
15
+ 4. Apply the Two-Failure Rule and the Three-Clean-Runs Graduation as written below.
16
+ 5. Do NOT clean up old entries — the historical record IS the value.
17
+ -->
18
+
19
+ # Agent Task Tracker
20
+
21
+ ## Purpose
22
+
23
+ Empirical log of subagent task outcomes by **(model, task-type)**. The data
24
+ drives tier-routing decisions in `docs/AGENTS.md` (or the project's equivalent
25
+ policy doc) instead of being made on intuition. This file is the *evidence*;
26
+ the policy doc is the *conclusion drawn from the evidence*.
27
+
28
+ The tracker exists because the right tier for a task is empirical, not
29
+ predicted: a reasoning-heavy task that "should" take Opus may turn out to be
30
+ fine on Sonnet for THIS codebase, or a "simple" recon task may keep failing
31
+ on Haiku for THIS file layout. Without a record, every routing decision is a
32
+ fresh argument.
33
+
34
+ ## What gets logged
35
+
36
+ - **Every clear failure** — hallucinated paths, incorrect output format,
37
+ ignored constraints, fix-broke-tests, false-positive reports, runaway scope.
38
+ Log immediately while context is hot.
39
+ - **Every experimental-scope task** — outcome regardless of success/failure.
40
+ Trials need positive evidence too, not just failure data.
41
+ - **Frontier-tier (Opus) decision overrides** — when an architect /
42
+ security-reviewer / planner makes a binding recommendation and the user
43
+ explicitly picks something else. These accumulate qualitatively to flag
44
+ patterns ("the architect keeps recommending X when the user wants Y").
45
+ - **Notable partials** — subagent shipped most of the spec but missed
46
+ something subtle that took coordinator follow-up.
47
+
48
+ ## What does NOT get logged
49
+
50
+ - Routine clean runs by validated-tier models on validated-scope tasks.
51
+ Voluminous, no decision value. Absence of failures IS the data.
52
+ - Coordinator (host) actions and decisions — those live in commit messages
53
+ and PR diffs, not here. This file tracks SUBAGENT outcomes.
54
+ - Subjective preferences ("I'd have phrased it differently"). Only objective
55
+ failure modes — code that broke, output that contradicted constraints,
56
+ paths that didn't exist.
57
+
58
+ ## Schema
59
+
60
+ ```
61
+ - YYYY-MM-DD — model — task-type — outcome — context
62
+ ```
63
+
64
+ - **date**: ISO date (`YYYY-MM-DD`) of when the attempt happened.
65
+ - **model**: exact model name — `<model-tier>-<version>` (e.g. `haiku-4.5`,
66
+ `sonnet-4.6`, `opus-4.7`). Name versions exactly so future routing can
67
+ distinguish a current-version success from a prior-version failure.
68
+ - **task-type**: short tag drawn from the project's named-subagent-role names
69
+ where possible: `recon` / `planner` / `architect` / `security-reviewer` /
70
+ `backend-impact` / `frontend-impact` / `test-strategist` / `cross-repo-sync`
71
+ / `doc-audit` / `queue-curator` / `implementation` / (or ad-hoc tags like
72
+ `regex-sweep`, `single-test-add`, `helper-extract`).
73
+ - **outcome**: one of:
74
+ - `ok` — task completed correctly, no coordinator follow-up needed.
75
+ - `failed: <one-line mode>` — concrete failure (hallucinated path, ignored
76
+ constraint, false report, etc.).
77
+ - `partial: <one-line issue>` — mostly worked but coordinator had to fix or
78
+ extend something.
79
+ - `overridden: <one-line — what user picked instead>` — Frontier-tier
80
+ recommendation reversed by user.
81
+ - **context**: one sentence. What the work was, what session/PR if available,
82
+ anything load-bearing for future readers.
83
+
84
+ ## Decision rules
85
+
86
+ Both rules below are computed by **counting entry rows** for the `(model,
87
+ task-type)` pair — never from memory or general impressions of how a model
88
+ has been doing.
89
+
90
+ ### Two-Failure Rule
91
+
92
+ Two failures on a `(model, task-type)` pair within the trial period contracts
93
+ that combination back to the next-tier-up. Add a one-line entry in the policy
94
+ doc's "Applied (recent)" section noting the scope contraction.
95
+
96
+ The rule is "two within the trial period," not "two ever" — a model upgrade
97
+ or a substantial codebase shift resets the period.
98
+
99
+ ### Three-Clean-Runs Graduation
100
+
101
+ Three consecutive clean runs (no `failed`, no `partial`) on an experimental
102
+ `(model, task-type)` pair graduates that combination to the "validated scope"
103
+ list in the policy doc. Same — record the graduation in the project's
104
+ PLAYBOOK_FEEDBACK.md or equivalent.
105
+
106
+ ### Repeated Frontier Overrides
107
+
108
+ Three or more `overridden` entries against the same model + same kind of
109
+ decision within a short window flags the work for paired-design treatment
110
+ instead of a one-shot subagent call. The signal: the decision has more
111
+ context than fits in a focused subagent prompt.
112
+
113
+ ## Current Status
114
+
115
+ A point-in-time snapshot of which `(model, task-type)` pairs sit in which
116
+ tier. Update whenever the Two-Failure Rule or Three-Clean-Runs Graduation
117
+ triggers. Keep it short — one row per pair, most-changed-recent first.
118
+
119
+ | Model | Task type | Tier | Notes |
120
+ |-----------------|------------------|--------------|-------------------------------------------|
121
+ | <model-vX.Y> | <task-type> | validated | <e.g. graduated YYYY-MM-DD after 3 clean> |
122
+ | <model-vX.Y> | <task-type> | experimental | <e.g. trial began YYYY-MM-DD> |
123
+ | <model-vX.Y> | <task-type> | deprecated | <e.g. contracted YYYY-MM-DD after 2 fails>|
124
+
125
+ ## Entries
126
+
127
+ (most-recent first; append at top after the heading)
128
+
129
+ ### YYYY-MM-DD
130
+
131
+ - <YYYY-MM-DD> — <model-vX.Y> — <task-type> — `ok` — <one-sentence context: what work, what session/PR>
132
+ - <YYYY-MM-DD> — <model-vX.Y> — <task-type> — `failed: <concrete failure mode>` — <one-sentence context>
133
+ - <YYYY-MM-DD> — <model-vX.Y> — <task-type> — `partial: <what was missed>` — <one-sentence context>
134
+ - <YYYY-MM-DD> — <model-vX.Y> — <task-type> — `overridden: <what user picked instead>` — <one-sentence context>
135
+
136
+ ### Older entries
137
+
138
+ (none — tracker started <YYYY-MM-DD>)
@@ -0,0 +1,131 @@
1
+ # API Reference
2
+
3
+ Update this document when endpoints, request/response shapes, auth behavior, rate limits, or external integrations change.
4
+
5
+ ## Conventions
6
+
7
+ - **Base URL / path prefix:** `https://api.example.com` (or `/api`)
8
+ - **Versioning:** versioned via path (`/v1/...`), header (`X-API-Version`), or unversioned. Note the strategy.
9
+ - **Content type:** `application/json` by default; note exceptions (streaming, multipart, non-JSON) per endpoint.
10
+ - **Authentication:** e.g. `Authorization: Bearer <token>`, session cookie, API key header. State the format once here; per-endpoint sections only flag deviations.
11
+
12
+ Adjust headings for non-REST surfaces (GraphQL → "Schema", "Operations", "Subscriptions"; tRPC → "Routers", "Procedures").
13
+
14
+ ## Rate Limiting
15
+
16
+ If the API enforces rate limits, document the matrix once here so per-endpoint sections only flag deviations.
17
+
18
+ | Route group | Limit | Window | 429 response shape |
19
+ |---|---|---|---|
20
+ | Auth (login, register, reset) | 10 | 15 min | `{ "error": "rate_limited", "retry_after": <seconds> }` |
21
+ | Public reads | 60 | 1 min | (same) |
22
+
23
+ Delete this section if the API has no rate limiting.
24
+
25
+ ## Public Endpoints
26
+
27
+ (Endpoints that do not require authentication.)
28
+
29
+ ### `METHOD /path`
30
+
31
+ Purpose:
32
+
33
+ Auth required: no
34
+
35
+ Request:
36
+
37
+ ```json
38
+ {}
39
+ ```
40
+
41
+ Response on success:
42
+
43
+ ```json
44
+ {}
45
+ ```
46
+
47
+ Errors:
48
+
49
+ | Status | Code | Meaning |
50
+ |---|---|---|
51
+ | 400 | `invalid_input` | request body failed validation |
52
+ | 404 | `not_found` | resource does not exist |
53
+
54
+ Validation:
55
+
56
+ - `field`: rule (e.g. min length 3, regex, allowed values).
57
+
58
+ Privacy / never-returned fields:
59
+
60
+ - Fields the response intentionally omits even though they exist on the underlying entity.
61
+
62
+ ## Authenticated Endpoints
63
+
64
+ (Endpoints that require auth. State the auth method once at the top of this group if it differs from the global Conventions.)
65
+
66
+ ### `METHOD /path`
67
+
68
+ Purpose:
69
+
70
+ Auth required: yes
71
+
72
+ Request:
73
+
74
+ ```json
75
+ {}
76
+ ```
77
+
78
+ Response on success:
79
+
80
+ ```json
81
+ {}
82
+ ```
83
+
84
+ Response on conflict / locked / partial state (if applicable, show distinct shapes):
85
+
86
+ ```json
87
+ {}
88
+ ```
89
+
90
+ Errors:
91
+
92
+ | Status | Code | Meaning |
93
+ |---|---|---|
94
+ | 401 | `unauthorized` | missing or invalid token |
95
+ | 403 | `forbidden` | authenticated but not allowed |
96
+ | 409 | `conflict` | state conflict (e.g. already claimed) |
97
+
98
+ Validation:
99
+
100
+ - `field`: rule.
101
+
102
+ Privacy / never-returned fields:
103
+
104
+ - (list, or "none — see Conventions")
105
+
106
+ Rate limit (if endpoint-specific): see Rate Limiting table above, or override here.
107
+
108
+ ## Streaming / Non-JSON Endpoints (optional)
109
+
110
+ For endpoints returning bytes, server-sent events, multipart, or other non-JSON shapes. Document content-type, framing, and termination behavior per endpoint.
111
+
112
+ ### `METHOD /path`
113
+
114
+ Content-Type:
115
+
116
+ Stream framing / terminator:
117
+
118
+ Auth required:
119
+
120
+ Failure behavior (mid-stream errors):
121
+
122
+ ## External Integrations
123
+
124
+ For each third-party API, queue, mail provider, etc.:
125
+
126
+ - Service:
127
+ - Purpose:
128
+ - Env vars:
129
+ - Retry / timeout behavior:
130
+ - Data leaving the system (what fields are sent out):
131
+ - Failure behavior (what the user sees if the integration is down):
@@ -0,0 +1,89 @@
1
+ # Architecture
2
+
3
+ Update this document when file layout, runtime flow, frontend/backend structure, major dependencies, or testing strategy changes.
4
+
5
+ ## Overview
6
+
7
+ Describe the system at a high level. One paragraph that another developer can read cold and orient from.
8
+
9
+ ## Runtime Architecture
10
+
11
+ If the project has a single runtime, one fenced block is enough. If there are multiple (production vs. local dev, host vs. embedded, server vs. CLI), draw each separately so readers don't conflate them.
12
+
13
+ Production runtime:
14
+
15
+ ```text
16
+ client -> server -> database/external services
17
+ ```
18
+
19
+ Alternate runtime (local dev, offline mode, embedded, etc. — delete if N/A):
20
+
21
+ ```text
22
+ client -> local server -> in-memory or file-backed state
23
+ ```
24
+
25
+ ## Folder / Module Map
26
+
27
+ - `path/`: Purpose.
28
+ - `path/file.ext`: Purpose.
29
+
30
+ If the project has clear architectural layers (frontend / backend, host / plugin, core / adapters), group entries by layer with sub-headings rather than listing flat. Flat lists work for small projects; layered groupings scale better.
31
+
32
+ ## Key Flows
33
+
34
+ List the 3–7 user- or system-visible flows that matter — not every flow, just the ones a contributor needs to know about. Examples: a registration/login flow, a primary user action, a critical background job, a deploy flow.
35
+
36
+ ### Flow Name
37
+
38
+ 1. Step one.
39
+ 2. Step two.
40
+ 3. Step three.
41
+
42
+ ### Second Flow Name
43
+
44
+ 1. Step one.
45
+ 2. Step two.
46
+
47
+ ## Dependencies
48
+
49
+ - Dependency: why it is used.
50
+
51
+ Distinguish runtime dependencies from build-time / dev-time dependencies if the distinction matters for the project (e.g. things shipped to clients vs. things that only run on CI).
52
+
53
+ ## Security Boundaries
54
+
55
+ Identify the trust boundaries in the system. Generic prompts:
56
+
57
+ - **Secrets that must stay server-side** — credentials, API keys, signing keys, anything in `.env`.
58
+ - **Data that must never reach clients or logs** — password hashes, reset tokens, internal user IDs, raw third-party payloads.
59
+ - **Trust boundaries between components** — what each service or layer is allowed to assume about its inputs vs. what it must validate.
60
+
61
+ Delete this section if the project genuinely has no security surface (a pure-function library, a static art piece with no input). Otherwise, fill it in honestly — the absence of this section is a tell that boundaries weren't thought through.
62
+
63
+ ## Testing Strategy
64
+
65
+ Unit test command:
66
+
67
+ ```bash
68
+ # command here
69
+ ```
70
+
71
+ What unit tests cover:
72
+
73
+ - Pure helper logic.
74
+ - Important edge cases.
75
+ - High-risk behavior.
76
+
77
+ ### Coverage Gaps
78
+
79
+ What is intentionally not covered yet, and why:
80
+
81
+ - Gap or deferred test area — short reason (manual smoke test suffices for now / blocked on infra / low priority).
82
+
83
+ ## Tradeoffs
84
+
85
+ - Tradeoff or constraint that future maintainers should understand.
86
+
87
+ ## Future Architecture Notes
88
+
89
+ - Possible improvement that is intentionally deferred.
@@ -0,0 +1,90 @@
1
+ # Assets
2
+
3
+ Use this document to track media assets used by the project — images, audio, video, icons, fonts, downloadable files, fonts, etc. Update it whenever code references new assets or asset paths/formats change.
4
+
5
+ **Skip or delete this file if the project ships no static assets** (a pure-logic library, a service with no media). The framework is fine without it.
6
+
7
+ ## Asset Locations
8
+
9
+ Distinguish between optimized/built assets (what ships) and source/master assets (what authoring tools produce):
10
+
11
+ ```text
12
+ path/to/assets/ -- optimized, served to clients
13
+ path/to/assets-source/ -- masters; may live outside the repo if files are large
14
+ ```
15
+
16
+ If masters live outside the repo (Git LFS, object storage, designer's drive), say so explicitly so contributors know where to find them.
17
+
18
+ ## Hosting Considerations
19
+
20
+ - Repo size: large media inflates clones; check repo size before committing big files.
21
+ - Alternatives: object storage, CDN, Git LFS, or a separate assets repo.
22
+ - Build step: if assets are processed (resized, compressed, sprite-sheeted) by a build step, document the command in `docs/DEPLOYMENT.md` or `docs/ARCHITECTURE.md`.
23
+
24
+ ## Naming Rules
25
+
26
+ - Use lowercase filenames.
27
+ - Use hyphens instead of spaces (`release-name-cover.jpg`, not `Release Name Cover.jpg`).
28
+ - Keep filenames URL-safe (no spaces, apostrophes, or special characters that need escaping in HTML / URLs).
29
+ - Prefer descriptive names that survive renames elsewhere in the codebase.
30
+ - Avoid committing huge raw source files unless required.
31
+
32
+ ## Current Asset Slots
33
+
34
+ Group entries by asset category when there are many — e.g. `## Images`, `## Audio`, `## Fonts`. A single flat list works for small projects.
35
+
36
+ ### Asset Name
37
+
38
+ Asset type: image / audio / video / font / icon / downloadable
39
+
40
+ Canonical path (optimized, what ships):
41
+
42
+ ```text
43
+ path/to/asset.ext
44
+ ```
45
+
46
+ Source / master copy (delete if N/A):
47
+
48
+ ```text
49
+ path/to/source/asset.psd
50
+ ```
51
+
52
+ Used by:
53
+
54
+ - `path/to/file.ext`
55
+
56
+ Purpose:
57
+
58
+ - Describe what the asset is used for.
59
+
60
+ Recommended preparation (depends on asset type):
61
+
62
+ - **Images**: dimensions, format, target file size, color profile, alt text.
63
+ - **Audio**: bitrate, sample rate, channels (mono / stereo), encoding, peak/loudness target.
64
+ - **Video**: resolution, codec, container, target bitrate.
65
+ - **Fonts**: weights/styles included, formats (`.woff2`, `.ttf`), license, glyph subsetting.
66
+ - **Icons**: format (SVG / PNG sprite), size sets, monochrome vs. multicolor.
67
+
68
+ Fallback behavior:
69
+
70
+ - Describe what happens if the asset is missing or fails to load.
71
+
72
+ ## Access-Controlled / Server-Mediated Assets (optional)
73
+
74
+ For projects where some assets are not served as static files but are fetched through an authenticated endpoint (paywalled media, signed URLs, gated downloads):
75
+
76
+ ### Asset Name
77
+
78
+ Endpoint: `GET /api/...`
79
+
80
+ Auth model: who is allowed to fetch this and how it's verified.
81
+
82
+ What the client sees: the response shape, content-type, framing.
83
+
84
+ What stays server-side: the raw file path, signing keys, access logs.
85
+
86
+ Skip this section if all assets are public static files.
87
+
88
+ ## Source Files
89
+
90
+ Describe whether source/master files should be committed or kept outside the repo. Fold into per-asset "Source / master copy" entries above when convenient; keep this section for repo-wide policy if the rule is the same across all assets.
@@ -0,0 +1,119 @@
1
+ # Autonomous Queue
2
+
3
+ Priority-ordered list of tasks pre-approved for unsupervised AI work. The autonomous host pulls from the top. Anything not in this queue requires user input even if tagged `[autonomy: safe]` in `docs/TODO.md`.
4
+
5
+ This file is curated, not auto-generated. Add an entry when a TODO item is genuinely safe to land without user review beyond the standard PR. Remove an entry when it ships (and capture it in `docs/CHANGELOG.md`).
6
+
7
+ ## Autonomy Tag Legend (inlined for self-containment)
8
+
9
+ This queue is intentionally self-contained: the same tag scheme is also documented in `docs/AGENTS.md`, but inlining it here means a fresh reader (human or agent) can act on the queue without hopping files. If the two ever drift, AGENTS.md wins — re-sync this section.
10
+
11
+ - `[size: XS|S|M|L|XL]` — XS = under ~15 min. S = under an hour. M = a focused session. L = multi-session / design-required. XL = epic.
12
+ - `[tier: haiku|sonnet|opus]` — model tier that should drive implementation. (Or model-agnostic `recon|workhorse|frontier`.)
13
+ - `[risk: low|med|high]` — blast radius if the change goes wrong.
14
+ - `[scope: isolated|cross-cutting|infra]` — single subsystem / multi-subsystem / build+deploy+secrets.
15
+ - `[autonomy: safe|review|needs-human-collab]` — `safe` = autonomous host may pick up and open a normal PR. `review` = host may implement but MUST open a DRAFT PR for human merge. `needs-human-collab` = host must NOT start; requires explicit user approval and a paired session.
16
+
17
+ Default for unmarked entries: `[size: S][tier: sonnet][risk: low][scope: isolated][autonomy: safe]`. Only call out tags that differ.
18
+
19
+ ## Tier Selection Hints
20
+
21
+ Each queue entry carries a **Suggested agent tier** field that tells the autonomous host (or the user reading the queue) how to dispatch the work. Controlled vocabulary:
22
+
23
+ - `Haiku` — recon, simple file lookups, status reads, mechanical find-replace, single-line tweaks, doc-consistency sweeps.
24
+ - `Sonnet` — most implementation work: feature additions, refactors within a single subsystem, test writing, doc updates, CHANGELOG/DECISIONS entries.
25
+ - `Opus` — ambiguous design decisions, security-critical work, architectural choices, cross-repo refactors, non-obvious debugging, anything destined for `docs/DECISIONS.md`.
26
+ - `subagent: <role>` — specifically dispatched to a named subagent rather than running on the host. Standard roles: `recon`, `planner`, `architect`, `debugger`, `doc-audit`, `test-strategist`, `security-reviewer`, `cross-repo-sync`, `backend-impact`, `frontend-impact`, `queue-curator`. See `docs/AGENTS.md` for each role's contract.
27
+
28
+ **Mapping table — task type → suggested tier:**
29
+
30
+ | Task type | Suggested tier |
31
+ |---|---|
32
+ | File / symbol lookup, "where is X used" | `subagent: recon` (Haiku) |
33
+ | Doc-consistency audit after rename | `subagent: doc-audit` (Haiku) |
34
+ | Single-line tweak, default value bump, copy edit | `Haiku` |
35
+ | Single-tool feature, test additions, doc update | `Sonnet` |
36
+ | Refactor within one subsystem | `Sonnet` |
37
+ | Cross-repo edit (palette swap, link insertion) | `subagent: cross-repo-sync` (Sonnet, worktree) |
38
+ | Pre-implementation plan for `[size: M\|L]` work | `subagent: planner` (Opus) |
39
+ | Architecture decision for `docs/DECISIONS.md` | `subagent: architect` (Opus) |
40
+ | Auth / token / rate-limit / access-control review | `subagent: security-reviewer` (Opus) |
41
+ | Non-obvious failure investigation | `subagent: debugger` (Opus) |
42
+ | Ambiguous design choice, paired-design candidate | `Opus` (or escalate to direct Frontier session) |
43
+
44
+ When in doubt, prefer a subagent (context isolation + structured output) over running on the host directly.
45
+
46
+ ## Queue Format
47
+
48
+ Each entry is a one-line pointer into `docs/TODO.md` (or `docs/ROADMAP.md`) plus its tags, suggested tier, and a short note on why it is safe.
49
+
50
+ ```
51
+ - [ ] <short title> — <where in TODO/ROADMAP> — [tags] — tier: <Suggested agent tier> — note
52
+ ```
53
+
54
+ **Example shape** (placeholders, not real entries):
55
+
56
+ ```
57
+ - [ ] Add <feature flag> for <scope> — `docs/TODO.md` §<section> — `[size: S][tier: sonnet][risk: low][scope: isolated][autonomy: safe]` — tier: Sonnet — UI-only, behind a default-off flag; no schema or migration impact.
58
+ - [ ] Audit stale references after <rename> — `docs/TODO.md` §Hygiene — `[size: XS][risk: low][scope: cross-cutting]` — tier: subagent: doc-audit — read-only sweep across `docs/` and any code with hard-coded paths.
59
+ - [ ] Plan <medium feature> before implementation — `docs/ROADMAP.md` §<milestone> — `[size: M][risk: med][scope: isolated][autonomy: review]` — tier: subagent: planner — Frontier-tier plan first; implementation tier drops to Sonnet once the plan lands.
60
+ - [ ] Review <auth-adjacent change> before merge — `docs/TODO.md` §<section> — `[size: S][risk: high][scope: isolated][autonomy: review]` — tier: subagent: security-reviewer — DRAFT PR; security-reviewer must sign off before human merge.
61
+ ```
62
+
63
+ ## Active Queue
64
+
65
+ Order = priority. Top of list runs first. Each entry points at a `docs/TODO.md` bullet that carries its own tag block; the line below is a human-readable summary so the queue reads at a glance.
66
+
67
+ <!-- Populate this section by hand. Empty queue = autonomous host has nothing to pick up; surface to the user and wait. -->
68
+
69
+ - [ ] (none yet — populate when ready to enable autonomous runs)
70
+
71
+ When the host finishes one of these, it should remove the entry from this list, update `docs/TODO.md` (delete the shipped entry per the cleanup gate), add a `docs/CHANGELOG.md` line, and open a PR. Do not pull the next item until review of the previous PR is complete unless the user explicitly opts into batch mode.
72
+
73
+ ## Recurring: Product-Outcome Retro
74
+
75
+ A standing, opt-in entry that closes the loop on the *product itself* — not just the workflow docs. Most of this queue tracks work to ship; this entry checks whether shipped work moved the numbers the pitch promised.
76
+
77
+ On a set cadence (`<weekly | monthly | per-milestone>`), the host runs a retro pass:
78
+
79
+ 1. **Pull the targets.** Read the success metrics and kill criteria from the funded spec (`docs/PROJECT_CONTEXT.md` / the pitch). These were defined at funding time and are rarely revisited.
80
+ 2. **Compare to reality.** Check them against what actually shipped and what the product is actually doing — usage, revenue, support load, whatever the pitch named as the winning metric.
81
+ 3. **Route the verdict:**
82
+ - **On track** — note it; no action.
83
+ - **Underperforming but fixable** — file the specific fix as a `docs/TODO.md` entry.
84
+ - **A kill criterion is met** — stop adding scope and surface a `STAKEHOLDER` re-pitch/kill decision in `docs/OPEN_DECISIONS.md`. Do not autonomously keep building a product the evidence says should wind down.
85
+
86
+ This pass is `[autonomy: review]` at most: it produces a verdict and queued items, never a silent pivot or a kill. The kill/continue call is the stakeholder's. Pairs with the Low-Effort Stakeholder mode's walk-away test in `docs/agents/STAKEHOLDER.md`.
87
+
88
+ Example entry:
89
+
90
+ - [ ] Product-outcome retro (`<month>`) — `docs/PROJECT_CONTEXT.md` §Success Metrics — `[size: S][tier: opus][risk: low][scope: isolated][autonomy: review]` — tier: Opus — check shipped work vs. success metrics + kill criteria; route to TODO / OPEN_DECISIONS.
91
+
92
+ ## Out-of-Queue (Explicit Holdouts)
93
+
94
+ Items that look picky-uppable but are deliberately NOT in the queue right now. Each entry MUST note the rationale so a future session does not silently promote them. Even items tagged `[autonomy: safe]` belong here when there is a project-specific reason to gate them behind a human.
95
+
96
+ **Format:**
97
+
98
+ ```
99
+ - **<short title>** — `<doc reference>` — `[tags]` — <why held out>
100
+ ```
101
+
102
+ **Common holdout categories** (drawn from real-project shape):
103
+
104
+ - **Strategy / tuning parameters** — anywhere choice-of-parameter carries edge or expresses product judgment (gap thresholds, MA windows, ranking weights, rate-limit numbers). Operator should weigh in before the values bake into shipped behavior.
105
+ - **Live-trading / production toggles** — any `LIVE=1` / `PROD=1` / "talk to real money or real users" code path. Production gate; never autonomously promoted regardless of tag.
106
+ - **Security-sensitive defaults** — default password policy, default token lifetimes, default CORS origins, default rate limits, default auth scopes. Looks safe-to-tweak; isn't.
107
+ - **Universe / seed data choices** — picking which N items populate a curated list (sectors, presets, demo content). Judgment work the operator owns.
108
+ - **Architectural reversals** — anything that revises a prior `docs/DECISIONS.md` entry. Needs paired session.
109
+ - **Large-bundle vendor integrations** — bundle weight, license check, identity question (e.g. swapping a major dependency, adding a renderer).
110
+ - **Production deployment work** — DNS, hosting, secrets, schema migrations against live data, password-hash reseeding. Requires real credentials and human gates.
111
+ - **Investigations / research** — output is a write-up, not code; needs operator judgment on the "fits / not a fit" verdict before downstream work proceeds.
112
+ - **Operator-side actions** — anything the bot literally cannot do (account setup, real API key creation, smoke-testing against a live broker / vendor).
113
+ - **Deploy-affecting changes when the security-audit gate would fail** — any task whose merge would put new code on a public surface without a current `SECURITY_AUDIT_*.md` (≤90 days old OR covering all changes since the last audit, whichever is stricter) is held out of autonomy. Either run `security-reviewer` first and let the audit land, or wait for a human to schedule an audit pass. See `docs/DEPLOYMENT.md` Pre-Deploy Gate for the gate condition.
114
+
115
+ **Example entries** (shape only — replace with real holdouts):
116
+
117
+ - **<Strategy parameter set>** — `docs/TODO.md` §<section> — `[autonomy: needs-human-collab]` — parameter values carry edge; operator weighs in before backtest results.
118
+ - **<Live integration toggle>** — codebase-wide — production gate; never autonomously promoted.
119
+ - **<Default security setting>** — `docs/TODO.md` §<section> — `[autonomy: needs-human-collab]` — looks like a config tweak, but defaults ship to all users.
@@ -0,0 +1,89 @@
1
+ # Build Plan: <product name>
2
+
3
+ <!--
4
+ Stage B output of the Spec & Design phase (see SPEC_WORKFLOW.md). Turns the
5
+ LOCKED functional spec (SPEC.md, already SPEC_APPROVED) into the HOW: stack
6
+ chosen through explicit tradeoffs, architecture, and queue-ready stories.
7
+
8
+ This file is LOCKED by the verbatim token BUILD_PLAN_APPROVED. The Build
9
+ phase (queue execution) does not begin until it lands.
10
+ -->
11
+
12
+ **Status:** Draft — awaiting `BUILD_PLAN_APPROVED`
13
+ **Spec:** [`SPEC.md`](SPEC.md) — locked at `SPEC_APPROVED`
14
+ **Architecture:** [`ARCHITECTURE.md`](ARCHITECTURE.md)
15
+ **Last updated:** <YYYY-MM-DD>
16
+
17
+ ## Stack Decisions
18
+
19
+ <!--
20
+ One subsection per engineering fork the human weighed. Surface each as an
21
+ enumerated decision with options + pros/cons; record the chosen option and
22
+ the one-line reason. Depth follows the SPEC.md interaction tier.
23
+ -->
24
+
25
+ ### <Decision, e.g. "Datastore">
26
+
27
+ | Option | Pros | Cons |
28
+ |---|---|---|
29
+ | <option A> | <pros> | <cons> |
30
+ | <option B> | <pros> | <cons> |
31
+
32
+ **Chosen:** <option> — <one-line reason>.
33
+
34
+ ## Stories
35
+
36
+ <!--
37
+ The spec decomposed into implementation tasks, ready for AUTONOMOUS_QUEUE.md.
38
+ Each story:
39
+ - [id: S-NN] stable id
40
+ - satisfies: R#, R# the SPEC.md requirement(s) it implements
41
+ - [dep: none] parallel-safe — OR — [dep: S-01, S-02] serial
42
+ - standard tags [size:][risk:][autonomy:] — see AGENTS.md
43
+ - acceptance the [AUTO] check or [HUMAN] checklist line
44
+
45
+ The [dep:] markers ARE the build schedule: everything [dep: none] runs in
46
+ parallel; a [dep: S-xx] story waits for its predecessors.
47
+ -->
48
+
49
+ ### S-01 — <story title>
50
+
51
+ - **satisfies:** R1, R2
52
+ - `[dep: none]` `[size: S]` `[risk: low]` `[autonomy: safe]`
53
+ - **acceptance:** <the runnable check, or the human checklist line>
54
+
55
+ ### S-02 — <story title>
56
+
57
+ - **satisfies:** R3
58
+ - `[dep: S-01]` `[size: M]` `[risk: med]` `[autonomy: review]`
59
+ - **acceptance:** <the runnable check, or the human checklist line>
60
+
61
+ ## Verifier Suite
62
+
63
+ <!--
64
+ Generated from the locked spec's Acceptance Tests. One runnable check per
65
+ [AUTO] requirement (references the requirement number); a single runner that
66
+ reports pass/fail per requirement and exits non-zero on any failure.
67
+
68
+ Fallback when the stack has no natural test runner: best-effort runner the
69
+ stack DOES support (build succeeds, schema validates, links resolve, headless
70
+ smoke) + route the rest to the Human Checklist. Log what got pushed to human.
71
+ -->
72
+
73
+ - **Runner:** <command that runs all checks and exits non-zero on failure, e.g. `pnpm test`>
74
+ - **R1** `[AUTO]` → <check + where it lives, e.g. `tests/r1.spec.ts`>
75
+ - **R2** `[AUTO]` → <check>
76
+ - **Pushed to Human Checklist (no automated check available):** <list, with why>
77
+
78
+ ## Human Checklist
79
+
80
+ <Carried from SPEC.md plus anything the verifier couldn't automate. This is the
81
+ reviewer's entire post-build surface — the [AUTO] half is trusted once green.>
82
+
83
+ - [ ] **R3** — <the specific thing to confirm>
84
+
85
+ ## Open Questions
86
+
87
+ <Anything still needing a human call before BUILD_PLAN_APPROVED.>
88
+
89
+ - <open question>