@glrs-dev/cli 0.1.1 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. package/CHANGELOG.md +18 -0
  2. package/dist/vendor/harness-opencode/dist/agents/prompts/pilot-builder.md +29 -4
  3. package/dist/vendor/harness-opencode/dist/agents/prompts/pilot-planner.md +26 -1
  4. package/dist/vendor/harness-opencode/dist/agents/prompts/research-auto.md +37 -0
  5. package/dist/vendor/harness-opencode/dist/agents/prompts/research-local.md +33 -0
  6. package/dist/vendor/harness-opencode/dist/agents/prompts/research-web.md +32 -0
  7. package/dist/vendor/harness-opencode/dist/agents/prompts/research.md +15 -20
  8. package/dist/vendor/harness-opencode/dist/chunk-57EOY72Y.js +174 -0
  9. package/dist/vendor/harness-opencode/dist/chunk-5TAMY7P6.js +67 -0
  10. package/dist/vendor/harness-opencode/dist/chunk-BKTFWXLG.js +204 -0
  11. package/dist/vendor/harness-opencode/dist/{chunk-XCZ3NOXR.js → chunk-CZMAJISX.js} +28 -0
  12. package/dist/vendor/harness-opencode/dist/chunk-KB7M7JXU.js +145 -0
  13. package/dist/vendor/harness-opencode/dist/chunk-RNRCXQ65.js +56 -0
  14. package/dist/vendor/harness-opencode/dist/{chunk-VVMP6QWS.js → chunk-WBBN7OVN.js} +162 -2
  15. package/dist/vendor/harness-opencode/dist/cli.js +964 -1383
  16. package/dist/vendor/harness-opencode/dist/index.js +2 -2
  17. package/dist/vendor/harness-opencode/dist/install-X5KEANRB.js +13 -0
  18. package/dist/vendor/harness-opencode/dist/paths-LT3QQKCF.js +18 -0
  19. package/dist/vendor/harness-opencode/dist/pilot/mcp/status-server.d.ts +1 -0
  20. package/dist/vendor/harness-opencode/dist/pilot/mcp/status-server.js +228 -0
  21. package/dist/vendor/harness-opencode/dist/pilot-config-7LJZ23YK.js +55 -0
  22. package/dist/vendor/harness-opencode/dist/runs-QWPL3TKV.js +18 -0
  23. package/dist/vendor/harness-opencode/dist/safety-gate-WM3EWOCY.js +10 -0
  24. package/dist/vendor/harness-opencode/dist/setup-hook-FHTXMAQL.js +88 -0
  25. package/dist/vendor/harness-opencode/dist/skills/adr/SKILL.md +328 -0
  26. package/dist/vendor/harness-opencode/dist/skills/pilot-planning/SKILL.md +41 -10
  27. package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/decomposition.md +27 -0
  28. package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/qa-expectations.md +120 -0
  29. package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/self-review.md +1 -1
  30. package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/touches-scope.md +34 -0
  31. package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/verify-design.md +81 -13
  32. package/dist/vendor/harness-opencode/dist/tasks-KJ3WN2KY.js +32 -0
  33. package/dist/vendor/harness-opencode/package.json +1 -1
  34. package/package.json +1 -1
  35. package/dist/vendor/harness-opencode/dist/install-4EYR56OR.js +0 -9
package/dist/vendor/harness-opencode/dist/skills/adr/SKILL.md
@@ -0,0 +1,328 @@
1
+ ---
2
+ name: adr
3
+ description: "Use when drafting, revising, or reading any engineering ADR in `docs/adr/`. Encodes grounding steps, the mandatory section template, the Unspecified-interactions-vs-Open-questions rubric, the security-default-deny rule, and self-check red flags. Use when the task is to write an ADR, draft an architecture decision, produce a design doc for a schema/contract/cross-package change, propose a new table/entity, or capture a consequential decision. Do NOT draft an ADR without this skill loaded."
4
+ ---
5
+
6
+ # Engineering ADR Skill (docs/adr/)
7
+
8
+ Purpose: every engineering ADR in this repo starts from the same
9
+ opinionated foundation. Read prior ADRs in `docs/adr/` before drafting
10
+ (see Step 1) — each one's lessons compound.
11
+
12
+ This skill describes **what** to do and **how** to structure an ADR.
13
+ It deliberately does NOT prescribe a review process — how an ADR gets
14
+ scrutinized before merge is up to whoever is shipping it and whichever
15
+ harness or team workflow applies. The skill's job is to make the draft
16
+ good; the review process is a separate concern.
17
+
18
+ ## When you MUST load this skill
19
+
20
+ - Drafting a new file in `docs/adr/`.
21
+ - Revising an existing ADR (even a typo-sized change — you may trip
22
+ one of the red flags below).
23
+ - Reading an existing ADR to understand a past decision, if you need
24
+ to write a supersession or cite its pattern.
25
+
26
+ ## When this skill does NOT apply
27
+
28
+ - Product decisions (if a `docs/product/` directory exists, use that).
29
+ - LLM-feature proposals (if a dedicated template exists, use that).
30
+ - Implementation plans, task breakdowns, build sequencing — Linear
31
+ issues or plan files.
32
+ - Bug fixes, refactors, single-PR work — Linear issue, no ADR.
33
+
34
+ ## The iron rules (five rules; every ADR should honor them)
35
+
36
+ 1. **Ground before you draft.** Run the grounding checklist below
37
+ BEFORE writing the Decision section. Invented table/column/module
38
+ names are the #1 cause of ADR rework.
39
+ 2. **Section order is frozen** (see Template). Don't reorder. Don't
40
+ omit. A missing section is a signal you skipped work, not that the
41
+ work wasn't needed.
42
+ 3. **Security-sensitive capabilities DEFAULT DENY.** Every new role
43
+ grant, every new partner scope, every new cross-org read path
44
+ starts in the `off` position with an explicit, logged
45
+ per-principal enablement path. "Probably fine" is not a stance.
46
+ 4. **Cross-system couplings go in `Consequences -> Unspecified
47
+ interactions`, not `Open questions`.** See the rubric below.
48
+ 5. **"Pre-implementation codebase investigation" items must be
49
+ genuinely unknown at write time.** If it's "verify my bullets are
50
+ right", it's already your job — do it before drafting.
51
+
52
+ ## Step 1: Grounding (mandatory, before drafting)
53
+
54
+ This is not optional. Perform each step and capture the real
55
+ names/paths in a scratch note you'll use while drafting:
56
+
57
+ 1. **Discover prior ADRs.** Read existing ADRs in `docs/adr/` to
58
+ understand established conventions and patterns. If an `adr-index`
59
+ MCP tool is available, use it to find ADRs by subject-area tags.
60
+ Otherwise, list and skim the directory. Pay particular attention to
61
+ conventions in each ADR's `establishes` frontmatter — those are in
62
+ force (unless a later ADR's `supersedes:` includes it).
63
+
64
+ 2. **Read every referenced file.** For the decision you're about to
65
+ make, identify the 3-10 existing files/tables/contracts your ADR
66
+ will touch or adjoin. Read them. Copy real symbol names into your
67
+ scratch note — do not paraphrase from memory.
68
+
69
+ 3. **Grep-verify every table, column, entity, and symbol name before
70
+ it lands in the draft.** Use AST-aware symbol lookup for code
71
+ symbols where available; fall back to `grep`. An invented name in
72
+ the Decision section is the #1 cause of ADR rework.
73
+
74
+ 4. **Identify the access/tenancy story.** Is the new entity scoped to
75
+ a user, an org, global, or cross-tenant? Confirm it follows
76
+ existing access patterns and doesn't accidentally bypass them.
77
+
78
+ 5. **Identify every touched contract.** Internal vs external, file
79
+ paths, permission keys. The ADR must cite the real file paths.
80
+
81
+ 6. **Identify circuit breakers and cross-system coupling.** List every
82
+ module/table/entity whose behavior will change because of this
83
+ decision.
84
+
85
+ 7. **Decide whether this ADR warrants a follow-up project.** If the
86
+ decision produces 3+ implementable issues, file a project when the
87
+ ADR merges. Small decisions that land in one PR don't need one.
88
+
89
+ Only after these seven steps do you touch the template.
90
+
91
+ ## Step 2: Template (frozen section order)
92
+
93
+ ```markdown
94
+ ---
95
+ touches: [<coarse subject-area tags>]
96
+ establishes:
97
+ - <convention-slug-this-adr-introduces>
98
+ - <another-convention-if-any>
99
+ supersedes: [] # or [<prior-adr-filename-without-.md>] if this replaces one
100
+ ---
101
+
102
+ # ADR: <Short decision title>
103
+
104
+ ---
105
+
106
+ ---
107
+
108
+ ## 1. Context
109
+
110
+ What system state exists today, cited with real file paths + symbol
111
+ names. Who the actors/roles are. What's broken, missing, or ambiguous.
112
+ Include a "Prior art in this repo" subsection listing existing
113
+ patterns that inform or constrain the decision.
114
+
115
+ ## 2. Decision
116
+
117
+ What we will do, subsectioned by concern:
118
+
119
+ 2.1 Data model (if any — new tables/columns/enums with real names)
120
+ 2.2 Resolution / runtime semantics (pure functions, state transitions)
121
+ 2.3 External API contract (paths, verbs, schemas, file locations)
122
+ 2.4 Internal API contract (same)
123
+ 2.5 UI design (surfaces, routes, key flows, broken-state treatment)
124
+ 2.6 External integration surface (third-party APIs, adapters, etc.)
125
+ 2.7 Role-based access matrix (see iron rule #3)
126
+ 2.8 Migration strategy (new table? rename? backfill? legacy handling?)
127
+
128
+ Execution planning — merge units, task sequencing, PR boundaries —
129
+ does NOT belong in an ADR. Those are implementation concerns tracked
130
+ separately. If a project exists for the decision, the project is
131
+ where sequencing lives, not here.
132
+
133
+ ## 3. Consequences
134
+
135
+ ### Positive
136
+ ### Negative / trade-offs
137
+ ### Neutral / noted
138
+
139
+ ### Unspecified interactions with existing mechanisms
140
+ (see rubric below; this subsection is mandatory if any exist)
141
+
142
+ ## 4. Alternatives considered
143
+
144
+ Alt 1, Alt 2, ..., each with a one-paragraph rejection reason. Include
145
+ the genuinely-considered options; don't straw-man. If only one
146
+ alternative existed, this section is a red flag — you haven't
147
+ explored the decision space.
148
+
149
+ ## 5. Decision linkages
150
+
151
+ Consumers, dependencies, blockers, future extensions, what this ADR
152
+ establishes (e.g. a new convention).
153
+
154
+ ## 6. Open questions
155
+
156
+ ITERATE UNTIL EMPTY. An ADR should not merge with unresolved open
157
+ questions. Each question is either: (a) answerable now — answer it
158
+ inline and move to a "Resolved during drafting" appendix, or (b) a
159
+ blocker that requires external input — in which case the ADR is not
160
+ ready to merge. Do not use this section as a parking lot for
161
+ laziness. If you can grep the codebase or reason through the
162
+ tradeoffs to resolve a question, do it before declaring the draft
163
+ complete.
164
+
165
+ Format when all questions are resolved:
166
+ "None. All questions resolved during drafting:"
167
+ followed by a "### Resolved during drafting" subsection with
168
+ numbered answers preserving the original question for traceability.
169
+
170
+ ## 7. Pre-implementation codebase investigation
171
+
172
+ ITERATE UNTIL EMPTY. Same rule as S6. Every item here must be
173
+ resolved before the ADR merges — either by doing the investigation
174
+ during drafting (preferred) or by explicitly blocking the ADR on the
175
+ investigation. An ADR with unresolved S7 items is an ADR that will
176
+ produce wrong implementation work.
177
+
178
+ Format when all items are confirmed:
179
+ "None. All items confirmed during drafting:"
180
+ followed by a "### Resolved during drafting" subsection with
181
+ numbered findings.
182
+
183
+ ## 8. References
184
+
185
+ Every file cited, every external doc, every ticket/issue, and the
186
+ convention this ADR establishes or modifies.
187
+ ```
188
+
189
+ Sections with no content in your decision: write "Not applicable" and
190
+ one sentence explaining why. Do not delete the heading.
191
+
192
+ ### Frontmatter contract
193
+
194
+ The YAML frontmatter is the **only** machine-readable metadata on an
195
+ ADR. There is no prose header block — no `Date`, no `Authors`, no
196
+ status. The date is in the filename, authorship is in `git log`,
197
+ and whether an ADR is in force is determined by Git (on `main` = in
198
+ force; named in a later ADR's `supersedes:` = superseded).
199
+ Duplicating any of this in the body would create drift. The body
200
+ opens straight with the `# ADR: <title>` heading and goes to S1
201
+ Context.
202
+
203
+ The frontmatter carries only facts about the ADR's content, never
204
+ state or intent about implementation follow-through (whether a
205
+ project gets created, whether the decision has been acted on, etc. —
206
+ those are independently observable and don't belong here).
207
+
208
+ Rules:
209
+
210
+ - **`touches`** — inline list of coarse subject-area tags. Err toward
211
+ more tags — matching is cheap, missing a cross-reference is
212
+ expensive.
213
+ - **`establishes`** — block list of convention slugs this ADR
214
+ introduces (kebab-case; descriptive, not clever). These are what
215
+ future ADR authors discover when their decision is constrained by
216
+ conventions you set.
217
+ - **`supersedes`** — list of prior ADR filenames (without `.md`) that
218
+ this ADR replaces. Empty for most ADRs. Supersession lives in the
219
+ superseding ADR's frontmatter, not as a flag on the superseded ADR —
220
+ that one stays unchanged on `main` as a truthful historical record.
221
+
222
+ ## Rubric: Unspecified interactions vs. Open questions vs. Pre-implementation investigation
223
+
224
+ This is the most common ADR failure. Use this table:
225
+
226
+ | Item type | Goes in | Test |
227
+ |---|---|---|
228
+ | A coupling we know exists in the codebase today that this decision changes or newly touches, but we deliberately are not specifying here | `Consequences -> Unspecified interactions with existing mechanisms` | "Implementers need to know about X coupling to avoid breaking it." |
229
+ | A design sub-decision we deferred because it isn't blocking and has multiple valid answers | `Open questions` | "A reasonable person could answer this two ways and either is defensible; we'll pick one during implementation." |
230
+ | A fact we don't know yet about the codebase that must be verified before the first PR | `Pre-implementation codebase investigation` | "The answer is knowable by grepping / reading code, not by discussion." |
231
+
232
+ If an item is really "I haven't done my homework" dressed up as an
233
+ open question, it fails this rubric. Do the homework or move it to
234
+ Pre-implementation investigation with a specific grep/read
235
+ prescribed.
236
+
237
+ ## Security default-deny rule (iron rule #3, expanded)
238
+
239
+ For every capability that can:
240
+
241
+ - Write to another user's/org's data
242
+ - Stamp long-lived credentials used on outbound traffic
243
+ - Grant a partner/API-key/integration-user role any verb beyond
244
+ `read` on its own scope
245
+
246
+ the ADR must:
247
+
248
+ 1. Default to `off` (not-granted). Do not write "probably fine, worth
249
+ confirming."
250
+ 2. Specify the enablement mechanism: who grants it, where it's logged,
251
+ and how it's revoked.
252
+ 3. State the blast radius if the grant is misused (a mistaken or
253
+ compromised principal).
254
+ 4. Name the expected flow without the grant (what does the actor do
255
+ instead?).
256
+
257
+ ## Red flags — author self-check
258
+
259
+ These are common failure modes observed across ADRs. Use this list as
260
+ a self-check before you consider a draft complete.
261
+
262
+ - Any table, column, enum, or code symbol in your draft has not been
263
+ grep-confirmed against the actual codebase.
264
+ - Your Decision section says "probably fine" about a security grant.
265
+ Make it default-deny.
266
+ - You have zero alternatives in S4 beyond the chosen one.
267
+ - Your S7 "pre-implementation investigation" reads like "verify my
268
+ bullets are right." Move these to grounding and do them now.
269
+ - A coupling with existing mechanisms is not mentioned. If you
270
+ honestly looked and found none, state that.
271
+ - Your ADR introduces a new enum/channel/role/surface whose naming
272
+ collides with an existing one.
273
+ - Your S2 Decision subsections leak into execution planning — merge
274
+ units, PR boundaries, task sequencing. That belongs in issues, not
275
+ in the ADR.
276
+ - Your UI section doesn't describe the broken-state case (what
277
+ happens when a referenced entity is archived/inactive/missing).
278
+ - Your migration section doesn't describe the down() path.
279
+ - Your S6 Open questions are really S3 Unspecified interactions (they
280
+ describe *existing* couplings, not *deferred* design decisions).
281
+ - Your S6 or S7 has unresolved items. Both sections must be iterated
282
+ to empty before the ADR merges. If you can answer a question by
283
+ reading code or reasoning through tradeoffs, do it now — don't
284
+ defer to implementation what you can resolve during drafting.
285
+ - Your ADR is missing YAML frontmatter. Without frontmatter, the ADR
286
+ is invisible to discovery and future authors will rediscover your
287
+ lessons from scratch.
288
+ - A convention you introduce in S2 is not listed in `establishes:`
289
+ frontmatter. Future ADRs can't find that it exists.
290
+
291
+ ## Inline-vs-follow-on decision rubric
292
+
293
+ When you discover during drafting that a sub-decision is bigger than
294
+ you thought:
295
+
296
+ - **Inline it** if: the sub-decision touches <=3 files, introduces no
297
+ new abstractions, and doesn't shift the boundary of any existing
298
+ subsystem.
299
+ - **Follow-on ADR** if: crosses a package boundary you haven't
300
+ mapped, introduces a new abstraction (new model pattern, new
301
+ helper), or requires re-architecting an existing subsystem.
302
+ - **Resolve it now** if: you can answer the question by reading code
303
+ or reasoning through tradeoffs. S6 must be empty at merge — don't
304
+ defer what you can decide during drafting.
305
+
306
+ A follow-on ADR is cited in S5 Decision linkages as a "Blocker" or
307
+ "Future extension."
308
+
309
+ ## File placement and naming
310
+
311
+ - **Location:** `docs/adr/`.
312
+ - **Filename:** `YYYY-MM-DD-<slug>.md`. ISO date (authored date),
313
+ kebab-case slug, 3-7 words.
314
+ - **Branch name:** `docs/<slug>` or `<user>/<ticket>-<slug>` if
315
+ tracked by an issue.
316
+
317
+ ## Commit sequence
318
+
319
+ 1. Verify the frontmatter block parses (no tabs, list items use
320
+ ` - ` indent). Check that `touches` tags are meaningful and any
321
+ new conventions are listed in `establishes`.
322
+ 2. `git add docs/adr/<file>.md`
323
+ 3. Commit message: `docs(adr): <title>`.
324
+ 4. Push branch and open PR. Link the issue in the PR body if one
325
+ exists.
326
+ 5. If the decision warrants a follow-up project (per grounding step
327
+ 7), create the project on merge and link it from the ADR's S5
328
+ Decision linkages in a follow-up commit.
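The naming and supersession rules above are mechanical enough to check with a script. The TypeScript below is a minimal sketch and does not ship with the skill. It makes three assumptions:

- The frontmatter has already been parsed into the hypothetical `AdrMeta` shape.
- The input array holds the ADRs currently on `main`.
- Only the `YYYY-MM-DD-<slug>.md` filename (3 to 7 kebab-case words) is checked, and in-force vs. superseded status is derived from other ADRs' `supersedes:` lists.

```typescript
// Hypothetical shape: an ADR's filename plus its already-parsed YAML frontmatter.
interface AdrMeta {
  filename: string; // e.g. "2024-05-01-listen-notify-cache-invalidation.md"
  touches: string[];
  establishes: string[];
  supersedes: string[]; // prior ADR filenames without ".md"
}

// YYYY-MM-DD-<slug>.md with a lowercase kebab-case slug.
const FILENAME_RE = /^(\d{4})-(\d{2})-(\d{2})-([a-z0-9]+(?:-[a-z0-9]+)*)\.md$/;

function filenameProblems(filename: string): string[] {
  const m = FILENAME_RE.exec(filename);
  if (!m) return [`${filename}: expected YYYY-MM-DD-<kebab-slug>.md`];
  const [, y, mo, d, slug] = m;
  const problems: string[] = [];
  const iso = `${y}-${mo}-${d}`;
  const date = new Date(`${iso}T00:00:00Z`);
  // Depending on the engine, an impossible date such as 2024-02-30 either
  // parses as NaN or rolls over into the next month; reject both cases.
  if (Number.isNaN(date.getTime()) || date.toISOString().slice(0, 10) !== iso) {
    problems.push(`${filename}: invalid date ${iso}`);
  }
  const words = slug.split("-").length;
  if (words < 3 || words > 7) {
    problems.push(`${filename}: slug has ${words} word(s), expected 3-7`);
  }
  return problems;
}

const idOf = (a: AdrMeta): string => a.filename.replace(/\.md$/, "");

// Superseded iff some other ADR names it in `supersedes:`; otherwise in force.
function statusOf(adrs: AdrMeta[]): Map<string, "in force" | "superseded"> {
  const superseded = new Set<string>();
  for (const a of adrs) {
    for (const s of a.supersedes) if (s !== idOf(a)) superseded.add(s);
  }
  return new Map(
    adrs.map((a): [string, "in force" | "superseded"] => [
      idOf(a),
      superseded.has(idOf(a)) ? "superseded" : "in force",
    ]),
  );
}

// Conventions a new author must respect: the `establishes` of ADRs still in force.
function conventionsInForce(adrs: AdrMeta[]): string[] {
  const status = statusOf(adrs);
  return adrs
    .filter((a) => status.get(idOf(a)) === "in force")
    .flatMap((a) => a.establishes);
}
```

Supersession is computed from the superseding ADR's frontmatter. The older file therefore never needs editing, which is what the frontmatter contract asks for.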
package/dist/vendor/harness-opencode/dist/skills/pilot-planning/SKILL.md
@@ -11,7 +11,7 @@ A good plan trades a planning-session's worth of patient thought for hours of un
11
11
 
12
12
  ## Workflow
13
13
 
14
- Apply these eight rules in order. Each rule has its own file in `rules/` for the full text:
14
+ Apply these nine rules in order. Each rule has its own file in `rules/` for the full text:
15
15
 
16
16
  1. [`first-principles.md`](rules/first-principles.md) — Frame the task FROM the user's intent, not from a templated checklist. Ask "what does the user actually want done?" before "what files might change?"
17
17
 
@@ -25,25 +25,56 @@ Apply these eight rules in order. Each rule has its own file in `rules/` for the
25
25
 
26
26
  6. [`milestones.md`](rules/milestones.md) — Optional grouping. Use when several tasks share a "is this batch done?" check (e.g. integration tests after a chunk of unit-test work).
27
27
 
28
- 7. [`self-review.md`](rules/self-review.md) — Before declaring the plan ready, run through a 7-question checklist. Find the holes yourself; the validator only catches schema errors.
28
+ 7. [`self-review.md`](rules/self-review.md) — Before declaring the plan ready, run through a 7-question checklist. Find the holes yourself; the validator only catches schema errors. And before declaring "refuse", revisit the bundle-vs-split decision below.
29
29
 
30
30
  8. [`task-context.md`](rules/task-context.md) — Every non-trivial task carries a `context:` block. Thin plans fail because the builder works each task from scratch with no carry-over; rich context pre-loads what the builder needs to work confidently. Cover outcome, rationale, code pointers, acceptance.
31
31
 
32
+ 9. [`qa-expectations.md`](rules/qa-expectations.md) — Detect → propose → confirm per-surface verify patterns for UI, API, DB, integration, browser-based component, and CLI surfaces.
33
+
32
34
  ## After applying the rules
33
35
 
34
36
  1. Save the YAML to the path returned by `bunx @glrs-dev/harness-plugin-opencode pilot plan-dir`.
35
- 2. Run `bunx @glrs-dev/harness-plugin-opencode pilot validate <path>` and fix every error / warning.
36
- 3. Hand off to the user with: `Plan saved to <path>. Next: bunx @glrs-dev/harness-plugin-opencode pilot build`.
37
+ 2. Remind the user the plan assumes their dev stack is already running (install, compose, migrate, seed). Plans no longer bootstrap their own environment.
38
+ 3. Run `bunx @glrs-dev/harness-plugin-opencode pilot validate <path>` and fix every error / warning.
39
+ 4. Hand off to the user with: `Plan saved to <path>. Next: bunx @glrs-dev/harness-plugin-opencode pilot build`.
37
40
 
38
41
  Do NOT summarize the plan in chat. The user can read the YAML.
39
42
 
43
+ ## When to bundle vs. split plans
44
+
45
+ Multi-issue cross-cutting plans are a first-class pilot shape. When a user's scope spans 2–4 related issues, default to **one plan** covering all of them — as long as they share:
46
+
47
+ - Same repo (or monorepo).
48
+ - Same package manager / install command.
49
+ - Same `docker-compose` (or equivalent local-infra) stack.
50
+ - Same test runner and verify style.
51
+ - Same migrations/seed pipeline.
52
+
53
+ Bundling amortizes setup cost (install, compose up, migrate, seed — minutes each, paid once per pilot run) across all the work. Tasks from different issues typically form disconnected subtrees in the DAG — see [`dag-shape.md`](rules/dag-shape.md)'s "Disconnected" pattern. Task-level `cascadeFail` only blocks transitive dependents, so a failure in one subtree does NOT cascade into its siblings.
54
+
55
+ **Split into separate pilot plans when:**
56
+
57
+ - Issues live in different repositories.
58
+ - Issues require fundamentally different setup environments.
59
+ - Issues have fundamentally different acceptance shapes (e.g., automated typecheck vs. manual operator playbook).
60
+
61
+ See [`decomposition.md`](rules/decomposition.md) "Plan sizing — count of tasks" for more.
62
+
40
63
  ## When to refuse
41
64
 
42
- If, after applying the methodology, you cannot produce a plan with at least:
65
+ Refuse ONLY when the **work itself** is underspecified or ambiguous — no concrete acceptance criteria, no clear "done" condition. Examples that warrant refusal:
66
+
67
+ - "Make the API better."
68
+ - "Refactor auth."
69
+ - "Clean up tech debt."
70
+
71
+ These don't name specific behaviors the pilot-builder can verify. Ask the user to narrow the scope before planning.
72
+
73
+ **Do NOT refuse for:**
43
74
 
44
- - 2 tasks
45
- - Each with non-trivial verify
46
- - Each with tight `touches`
47
- - A coherent DAG
75
+ - Plan size (5–30 tasks is fine; even more is fine when the work is well-defined).
76
+ - Multi-issue scope (2–4 related issues in one plan is first-class — see "When to bundle" above).
77
+ - Disconnected-subtree DAG shape (tasks from different concerns don't need artificial edges).
78
+ - Concerns about PR shape (that's a reviewer decision; the pilot run can produce one PR or several).
48
79
 
49
- tell the user the work isn't ready for pilot. Suggest they break it down themselves first, or use the regular `/plan` agent (markdown plans, human-driven execution). It is far better to refuse than to ship a bad plan.
80
+ When you do refuse: tell the user honestly and specifically what's missing. Suggest the regular `/plan` agent (markdown plans, human-driven execution) for ambiguous work that needs human iteration before it's pilotable. It is far better to refuse an unspecified request than to ship a plan full of `echo done` verifies, but be precise about what counts as a bad plan: ambitious is not bad; ambiguous is bad.
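The bundle-vs-split guidance can be read as a short decision procedure. A minimal TypeScript sketch follows. The `IssueEnv` fields are hypothetical stand-ins for the shared-infrastructure properties listed above (repo, install command, compose stack, test runner, migrations/seed) plus an acceptance-shape label. The pilot tooling is not claimed to expose anything like them.

```typescript
// Hypothetical per-issue description of the environment it needs.
interface IssueEnv {
  id: string;
  repo: string;
  installCommand: string; // package manager / install command
  composeStack: string; // docker-compose (or equivalent local-infra) stack
  testRunner: string;
  migrations: string; // migrations/seed pipeline
  acceptance: "automated" | "manual"; // acceptance shape
}

type PlanShape =
  | { kind: "bundle"; issues: string[] }
  | { kind: "split"; reason: string };

// Compare every issue against the first; any mismatch forces a split.
// The guidance frames bundling around 2 to 4 issues; this sketch does not cap the count.
function bundleOrSplit(issues: IssueEnv[]): PlanShape {
  if (issues.length === 0) throw new Error("no issues given");
  const [first, ...rest] = issues;
  for (const other of rest) {
    if (other.repo !== first.repo) {
      return { kind: "split", reason: "different repositories" };
    }
    if (other.acceptance !== first.acceptance) {
      return { kind: "split", reason: "different acceptance shapes" };
    }
    const sameSetup =
      other.installCommand === first.installCommand &&
      other.composeStack === first.composeStack &&
      other.testRunner === first.testRunner &&
      other.migrations === first.migrations;
    if (!sameSetup) return { kind: "split", reason: "different setup environments" };
  }
  return { kind: "bundle", issues: issues.map((i) => i.id) };
}
```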
package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/decomposition.md
@@ -34,3 +34,30 @@ A "right-sized" pilot task is one the pilot-builder can complete in a single ses
34
34
  ## When you can't decompose
35
35
 
36
36
  If the work genuinely doesn't decompose (e.g., a 200-line algorithm that has to land atomically), it might not be a fit for pilot. Tell the user; they may want to run it as a regular `/build` task instead.
37
+
38
+ ## Plan sizing — count of tasks
39
+
40
+ Per-task size is covered above. Plan-level size (total task count) is a different dimension and has its own sweet spot: **roughly 5–30 tasks per `pilot.yaml`**. Outside this range:
41
+
42
+ - **Fewer than 5 tasks:** usually means the work is a single change that doesn't benefit from the pilot harness. Consider `/plan` + `/build` instead.
43
+ - **More than 30 tasks:** fine in principle, but at that size the plan probably spans enough distinct concerns that a human reviewer will want it split — not a pilot problem, a PR-shape problem.
44
+
45
+ ### Multi-issue cross-cutting plans are a first-class shape
46
+
47
+ It is **normal and correct** for a single pilot plan to span 2–4 related issues (Linear tickets, GitHub issues) **when those issues share setup and verify infrastructure** — same repo, same package manager, same `docker-compose`, same test runner, same migrations. Reasons to bundle:
48
+
49
+ - **Setup amortization.** `pnpm install`, `docker compose up`, `pnpm db:migrate`, seed scripts — each of these is minutes of wall time. Running them once per pilot session vs. once per Linear issue saves hours across a multi-issue push.
50
+ - **Context reuse.** The builder learns the codebase through reading during early tasks; that context benefits every subsequent task in the run.
51
+ - **Shared acceptance.** Cross-issue integration checks (a milestone-close verify that exercises all three issues' changes together) are natural in one plan, awkward across three runs.
52
+
53
+ **Reference shape (not a red flag):** rule-engine cleanup + LISTEN/NOTIFY cache invalidation + read-only admin UI landed together in one plan of ~19 tasks across 4 milestones, covering 3 Linear issues. This is the shape pilot is built for.
54
+
55
+ When bundling, the tasks from different issues typically form **disconnected subtrees** in the DAG (no real semantic dependency between them). That's fine — see [`dag-shape.md`](dag-shape.md)'s "Disconnected" pattern. Task-level `cascadeFail` only blocks transitive dependents, so a failure in one subtree doesn't cascade into the siblings.
56
+
57
+ ### When to split instead of bundle
58
+
59
+ Split into separate pilot plans when:
60
+
61
+ - The issues live in **different repositories**.
62
+ - The issues require **fundamentally different setup environments** (e.g., one needs Postgres + Temporal, the other needs a headless browser grid — sharing setup is worse than paying the cost twice).
63
+ - The issues have **fundamentally different acceptance criteria** (e.g., one is a TypeScript refactor verified via typecheck, the other is an infrastructure change verified via a manual operator playbook — no shared verify makes sense).
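The claim that `cascadeFail` only blocks transitive dependents is a statement about graph reachability, sketched below in TypeScript. The `Task` shape (an `id` plus a `deps` list) is an assumption for illustration, not the `pilot.yaml` schema. The same walk also measures how much of a plan hangs off one task, which is the concentration risk raised in the self-review checklist.

```typescript
// Hypothetical task shape for illustration only.
interface Task {
  id: string;
  deps: string[]; // ids this task depends on
}

// Map each task id to the ids of tasks that directly depend on it.
function dependentsIndex(tasks: Task[]): Map<string, string[]> {
  const index = new Map<string, string[]>(tasks.map((t): [string, string[]] => [t.id, []]));
  for (const t of tasks) {
    for (const d of t.deps) index.get(d)?.push(t.id);
  }
  return index;
}

// Tasks blocked when `failed` fails: its transitive dependents, nothing else.
function blockedBy(failed: string, tasks: Task[]): string[] {
  const index = dependentsIndex(tasks);
  const seen = new Set<string>();
  const queue = [...(index.get(failed) ?? [])];
  while (queue.length > 0) {
    const id = queue.shift()!;
    if (seen.has(id)) continue;
    seen.add(id);
    queue.push(...(index.get(id) ?? []));
  }
  return Array.from(seen).sort();
}

// Fraction of the other tasks that a failure of `id` would block (0..1).
function concentration(id: string, tasks: Task[]): number {
  return tasks.length <= 1 ? 0 : blockedBy(id, tasks).length / (tasks.length - 1);
}
```

Consider a DAG made of independent chains. A failure in one chain blocks nothing in the others, so no task's `concentration` approaches 1. A plan where one early task scores close to 1 is the fragile shape worth simplifying.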
package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/qa-expectations.md
@@ -0,0 +1,120 @@
1
+ # Rule 9 — QA-expectations establishment
2
+
3
+ **Detect → propose → confirm per-surface verify patterns.**
4
+
5
+ A plan's verify commands are its contract with the builder. Generic verifies ("run tests") waste builder time; specific verifies ("run the API tests that exercise the files this task touches") catch real failures. This rule establishes concrete, per-surface QA expectations with the user before emitting the plan.
6
+
7
+ ## The six surfaces
8
+
9
+ For each surface below, detect signals in the codebase, propose a canonical verify pattern, and confirm with the user.
10
+
11
+ ### UI — Browser-based user interface
12
+
13
+ **Detection signals:**
14
+ - `@playwright/test`, `cypress`, or `@vitest/browser` in `package.json` dependencies
15
+ - `playwright.config.{ts,js}` or `cypress.config.*` present
16
+
17
+ **Proposed verify pattern:**
18
+ Playwright test run, filtered by tag, for visual/interaction assertions:
19
+ ```yaml
20
+ verify:
21
+ - pnpm exec playwright test --project=chromium --grep "@task-specific-tag"
22
+ ```
23
+
24
+ ### API — HTTP endpoints
25
+
26
+ **Detection signals:**
27
+ - `openapi.yaml` / `openapi.json` present
28
+ - `curl` or `httpie` usage in existing scripts
29
+ - Postman collection files
30
+
31
+ **Proposed verify pattern:**
32
+ Direct HTTP assertion against a local port:
33
+ ```yaml
34
+ verify:
35
+ - curl -fsS http://localhost:3000/health | jq -e '.status == "ok"'  # -e: non-zero exit unless the check is true
36
+ ```
37
+
38
+ ### DB — Database schema and queries
39
+
40
+ **Detection signals:**
41
+ - `docker-compose` postgres service defined
42
+ - `prisma`, `drizzle-kit`, `knex`, or `flyway` in dependencies
43
+ - `test/db` or similar helper directory
44
+
45
+ **Proposed verify pattern:**
46
+ Postgres readiness + migration + assertion:
47
+ ```yaml
48
+ verify:
49
+ - pg_isready -h localhost -p 5432
50
+ - pnpm prisma migrate deploy
51
+ - pnpm tsx scripts/verify-db.ts
52
+ ```
53
+
54
+ ### Integration — Cross-module workflows
55
+
56
+ **Detection signals:**
57
+ - `test/integration/**` directory exists
58
+ - `e2e/**` directory exists
59
+ - `*.integration.test.ts` files
60
+
61
+ **Proposed verify pattern:**
62
+ Integration test runner scoped to relevant paths:
63
+ ```yaml
64
+ verify:
65
+ - pnpm test test/integration
66
+ ```
67
+
68
+ ### Browser-based component — Storybook stories
69
+
70
+ **Detection signals:**
71
+ - `storybook` or `@storybook/*` in dependencies
72
+ - `*.stories.{ts,tsx}` files present
73
+
74
+ **Proposed verify pattern:**
75
+ Storybook test or Chromatic visual verification:
76
+ ```yaml
77
+ verify:
78
+ - pnpm storybook test --stories "ComponentName"
79
+ ```
80
+
81
+ ### CLI — Command-line interface
82
+
83
+ **Detection signals:**
84
+ - `bin/*` directory with executables
85
+ - `package.json` `bin:` entry defined
86
+
87
+ **Proposed verify pattern:**
88
+ Smoke test via help flag or scripted invocation:
89
+ ```yaml
90
+ verify:
91
+ - pnpm my-cli --help
92
+ - pnpm tsx scripts/smoke-test-cli.ts
93
+ ```
94
+
95
+ ## Question-bundling rule
96
+
97
+ **Two or more surfaces detected:** Bundle into a single structured `question` tool call with one checkbox group per surface.
98
+
99
+ **One surface detected:** Still ask (confirmation, not interrogation), but use a single-field call.
100
+
101
+ **Zero surfaces detected:** Skip the QA-expectation question entirely. Fall back to generic verifies:
102
+ ```yaml
103
+ defaults:
104
+ verify_after_each:
105
+ - pnpm run typecheck
106
+ - pnpm test
107
+ ```
108
+
109
+ ## Emission
110
+
111
+ Confirmed patterns become:
112
+
113
+ 1. **Per-task verify templates** — tasks targeting specific files use scoped verifies (e.g., `pnpm test test/api/users.test.ts` for a task touching `src/api/users.ts`)
114
+ 2. **defaults.verify_after_each** — global breakage catchers (typecheck, full test suite)
115
+
116
+ The rule: per-task verify targets the specific files touched; defaults catches global breakage.
117
+
118
+ ## Cross-reference to verify-design.md
119
+
120
+ This rule (qa-expectations.md) is the per-surface tactical layer — it names the tools to detect and the patterns to propose. Rule 3 (verify-design.md) owns the principles: deterministic, assertive, would-have-failed-before. Every proposed command must satisfy both layers.
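Detection and the question-bundling rule can also be sketched in TypeScript. This is an illustration under stated assumptions:

- The `Surface` labels and both helpers are hypothetical.
- The file list is a set of repo-relative paths gathered elsewhere.
- Postman collections use the `*.postman_collection.json` export name.
- Signals that require reading file contents (curl usage in scripts, a postgres service inside a compose file) are not checked.

```typescript
type Surface = "ui" | "api" | "db" | "integration" | "component" | "cli";

// Only the package.json fields this sketch reads.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  bin?: string | Record<string, string>;
}

function detectSurfaces(pkg: PackageJson, files: string[]): Surface[] {
  const deps = new Set<string>([
    ...Object.keys(pkg.dependencies ?? {}),
    ...Object.keys(pkg.devDependencies ?? {}),
  ]);
  const has = (name: string): boolean => deps.has(name);
  const anyFile = (re: RegExp): boolean => files.some((f) => re.test(f));
  const found = new Set<Surface>();

  if (["@playwright/test", "cypress", "@vitest/browser"].some(has) ||
      anyFile(/(^|\/)(playwright\.config\.(ts|js)|cypress\.config\.[^/]+)$/)) found.add("ui");
  // Assumed naming for Postman exports: *.postman_collection.json
  if (anyFile(/(^|\/)openapi\.(ya?ml|json)$/) || anyFile(/\.postman_collection\.json$/)) found.add("api");
  if (["prisma", "drizzle-kit", "knex", "flyway"].some(has) || anyFile(/(^|\/)test\/db\//)) found.add("db");
  if (anyFile(/(^|\/)(test\/integration|e2e)\//) || anyFile(/\.integration\.test\.ts$/)) found.add("integration");
  if (has("storybook") || Array.from(deps).some((d) => d.startsWith("@storybook/")) ||
      anyFile(/\.stories\.tsx?$/)) found.add("component");
  if (pkg.bin !== undefined || anyFile(/^bin\/[^/]+$/)) found.add("cli");

  return Array.from(found);
}

type QuestionShape = "skip" | "single-field" | "bundled";

function questionShape(surfaces: Surface[]): QuestionShape {
  if (surfaces.length === 0) return "skip"; // fall back to generic verifies
  if (surfaces.length === 1) return "single-field"; // confirmation, not interrogation
  return "bundled"; // one checkbox group per surface
}
```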
package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/self-review.md
@@ -16,7 +16,7 @@ The validator catches schema, DAG, and glob errors. It cannot catch "this verify
16
16
 
17
17
  5. **Are there missing edges?** Look at every pair of tasks that share files in their `touches:`. Do they need an order? If T2's verify exercises code T1 introduces, T2 depends on T1 — even if their `touches:` don't overlap.
18
18
 
19
- 6. **Can the plan recover from a per-task failure?** If T3 fails, the cascade-fail blocks T4 onward. Is the resulting "failed=T3, blocked=[T4..T7]" state useful for the human operator? Or did you concentrate too much value into T3 such that its failure is catastrophic?
19
+ 6. **Does the DAG concentrate too much value in one task?** Task-level `cascadeFail` only blocks transitive DEPENDENTS of the failed task — sibling subtrees in a disconnected DAG keep running. So plan size is not itself a risk. The real risk is a task everything else depends on: a schema migration that all downstream work reads, a core-type definition all imports reference, a shared config every consumer parses. If THAT task fails, the whole run stalls. Is there such a task in your plan? If yes, can it be simplified (smaller diff, tighter verify, higher success probability)? Don't over-concentrate; a plan where 80% of tasks depend on T1 and T1 is complex is fragile by design.
20
20
 
21
21
  7. **Could you read this plan in 6 months and understand it?** Plan names + task titles + prompts should be a self-explanatory summary of the work. If the plan needs a verbal preamble to make sense, rewrite the prompts.
22
22
 
package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/touches-scope.md
@@ -45,3 +45,37 @@ If the verify commands would FAIL without edits, an empty `touches` is a STOP
45
45
  - **Including the migrations dir for a non-migration task.** Tight scope.
46
46
 
47
47
  When in doubt, write the tightest possible scope first. If the task fails verify with "touches violation: src/X.ts", the worker shows you which file got touched — broaden then.
48
+
49
+ ## `tolerate:` — files allowed in the diff but outside the contract
50
+
51
+ When a task's verify step runs a tool that writes files as a side-effect (codegen, build, snapshots), those files will appear in `git diff` even though the agent didn't author them. Add them to `tolerate:` so enforcement accepts them without counting them as part of the task's output.
52
+
53
+ Two categories to watch for:
54
+
55
+ **Built-in defaults (already tolerated — don't list these):**
56
+ - `**/next-env.d.ts` — Next.js regenerates on every `next build`.
57
+ - `**/.next/types/**`, `**/.next/dev/types/**` — Next.js app-router generated types.
58
+ - `**/*.tsbuildinfo` — TypeScript project-reference build cache.
59
+ - `**/__snapshots__/**`, `**/*.snap` — Jest / Vitest snapshot files rewritten by `-u`.
60
+
61
+ **Project-specific (list in `tolerate:` per task):**
62
+ - Prisma client output (e.g., `prisma/client/**` if `prisma generate` runs in verify).
63
+ - GraphQL codegen output (`graphql/generated/**`, `*.graphql.d.ts`).
64
+ - OpenAPI codegen output (`api-types/generated/**`).
65
+ - Anywhere you have a build step that writes type declarations downstream of the agent's source edits.
66
+
67
+ A good test: if the task's verify step runs `prisma generate`, `pnpm codegen`, `next build`, or similar, ask: "does that command write files anywhere?" If yes, those paths go in `tolerate:`.
68
+
69
+ ### Example
70
+
71
+ ```yaml
72
+ - id: T-ADD-RULE-MODEL
73
+ touches:
74
+ - prisma/schema.prisma
75
+ - src/models/rule.ts
76
+ tolerate:
77
+ - prisma/client/** # prisma generate output
78
+ verify:
79
+ - pnpm prisma generate
80
+ - pnpm --filter core test rule-model
81
+ ```