@glrs-dev/cli 0.1.1 → 1.0.0
- package/CHANGELOG.md +18 -0
- package/dist/vendor/harness-opencode/dist/agents/prompts/pilot-builder.md +29 -4
- package/dist/vendor/harness-opencode/dist/agents/prompts/pilot-planner.md +26 -1
- package/dist/vendor/harness-opencode/dist/agents/prompts/research-auto.md +37 -0
- package/dist/vendor/harness-opencode/dist/agents/prompts/research-local.md +33 -0
- package/dist/vendor/harness-opencode/dist/agents/prompts/research-web.md +32 -0
- package/dist/vendor/harness-opencode/dist/agents/prompts/research.md +15 -20
- package/dist/vendor/harness-opencode/dist/chunk-57EOY72Y.js +174 -0
- package/dist/vendor/harness-opencode/dist/chunk-5TAMY7P6.js +67 -0
- package/dist/vendor/harness-opencode/dist/chunk-BKTFWXLG.js +204 -0
- package/dist/vendor/harness-opencode/dist/{chunk-XCZ3NOXR.js → chunk-CZMAJISX.js} +28 -0
- package/dist/vendor/harness-opencode/dist/chunk-KB7M7JXU.js +145 -0
- package/dist/vendor/harness-opencode/dist/chunk-RNRCXQ65.js +56 -0
- package/dist/vendor/harness-opencode/dist/{chunk-VVMP6QWS.js → chunk-WBBN7OVN.js} +162 -2
- package/dist/vendor/harness-opencode/dist/cli.js +964 -1383
- package/dist/vendor/harness-opencode/dist/index.js +2 -2
- package/dist/vendor/harness-opencode/dist/install-X5KEANRB.js +13 -0
- package/dist/vendor/harness-opencode/dist/paths-LT3QQKCF.js +18 -0
- package/dist/vendor/harness-opencode/dist/pilot/mcp/status-server.d.ts +1 -0
- package/dist/vendor/harness-opencode/dist/pilot/mcp/status-server.js +228 -0
- package/dist/vendor/harness-opencode/dist/pilot-config-7LJZ23YK.js +55 -0
- package/dist/vendor/harness-opencode/dist/runs-QWPL3TKV.js +18 -0
- package/dist/vendor/harness-opencode/dist/safety-gate-WM3EWOCY.js +10 -0
- package/dist/vendor/harness-opencode/dist/setup-hook-FHTXMAQL.js +88 -0
- package/dist/vendor/harness-opencode/dist/skills/adr/SKILL.md +328 -0
- package/dist/vendor/harness-opencode/dist/skills/pilot-planning/SKILL.md +41 -10
- package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/decomposition.md +27 -0
- package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/qa-expectations.md +120 -0
- package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/self-review.md +1 -1
- package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/touches-scope.md +34 -0
- package/dist/vendor/harness-opencode/dist/skills/pilot-planning/rules/verify-design.md +81 -13
- package/dist/vendor/harness-opencode/dist/tasks-KJ3WN2KY.js +32 -0
- package/dist/vendor/harness-opencode/package.json +1 -1
- package/package.json +1 -1
- package/dist/vendor/harness-opencode/dist/install-4EYR56OR.js +0 -9
|
@@ -0,0 +1,328 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: adr
|
|
3
|
+
description: "Use when drafting, revising, or reading any engineering ADR in `docs/adr/`. Encodes grounding steps, the mandatory section template, the Unspecified-interactions-vs-Open-questions rubric, the security-default-deny rule, and self-check red flags. Use when the task is to write an ADR, draft an architecture decision, produce a design doc for a schema/contract/cross-package change, propose a new table/entity, or capture a consequential decision. Do NOT draft an ADR without this skill loaded."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Engineering ADR Skill (docs/adr/)
|
|
7
|
+
|
|
8
|
+
Purpose: every engineering ADR in this repo starts from the same
|
|
9
|
+
opinionated foundation. Read prior ADRs in `docs/adr/` before drafting
|
|
10
|
+
(see Step 1) — each one's lessons compound.
|
|
11
|
+
|
|
12
|
+
This skill describes **what** to do and **how** to structure an ADR.
|
|
13
|
+
It deliberately does NOT prescribe a review process — how an ADR gets
|
|
14
|
+
scrutinized before merge is up to whoever is shipping it and whichever
|
|
15
|
+
harness or team workflow applies. The skill's job is to make the draft
|
|
16
|
+
good; the review process is a separate concern.
|
|
17
|
+
|
|
18
|
+
## When you MUST load this skill
|
|
19
|
+
|
|
20
|
+
- Drafting a new file in `docs/adr/`.
|
|
21
|
+
- Revising an existing ADR (even a typo-sized change — you may trip
|
|
22
|
+
one of the red flags below).
|
|
23
|
+
- Reading an existing ADR to understand a past decision, if you need
|
|
24
|
+
to write a supersession or cite its pattern.
|
|
25
|
+
|
|
26
|
+
## When this skill does NOT apply
|
|
27
|
+
|
|
28
|
+
- Product decisions (if a `docs/product/` directory exists, use that).
|
|
29
|
+
- LLM-feature proposals (if a dedicated template exists, use that).
|
|
30
|
+
- Implementation plans, task breakdowns, build sequencing — Linear
|
|
31
|
+
issues or plan files.
|
|
32
|
+
- Bug fixes, refactors, single-PR work — Linear issue, no ADR.
|
|
33
|
+
|
|
34
|
+
## The iron rules (five rules; every ADR should honor them)
|
|
35
|
+
|
|
36
|
+
1. **Ground before you draft.** Run the grounding checklist below
|
|
37
|
+
BEFORE writing the Decision section. Invented table/column/module
|
|
38
|
+
names are the #1 cause of ADR rework.
|
|
39
|
+
2. **Section order is frozen** (see Template). Don't reorder. Don't
|
|
40
|
+
omit. A missing section is a signal you skipped work, not that the
|
|
41
|
+
work wasn't needed.
|
|
42
|
+
3. **Security-sensitive capabilities DEFAULT DENY.** Every new role
|
|
43
|
+
grant, every new partner scope, every new cross-org read path
|
|
44
|
+
starts in the `off` position with an explicit, logged
|
|
45
|
+
per-principal enablement path. "Probably fine" is not a stance.
|
|
46
|
+
4. **Cross-system couplings go in `Consequences -> Unspecified
|
|
47
|
+
interactions`, not `Open questions`.** See the rubric below.
|
|
48
|
+
5. **"Pre-implementation codebase investigation" items must be
|
|
49
|
+
genuinely unknown at write time.** If it's "verify my bullets are
|
|
50
|
+
right", it's already your job — do it before drafting.
|
|
51
|
+
|
|
52
|
+
## Step 1: Grounding (mandatory, before drafting)
|
|
53
|
+
|
|
54
|
+
This is not optional. Perform each step and capture the real
|
|
55
|
+
names/paths in a scratch note you'll use while drafting:
|
|
56
|
+
|
|
57
|
+
1. **Discover prior ADRs.** Read existing ADRs in `docs/adr/` to
|
|
58
|
+
understand established conventions and patterns. If an `adr-index`
|
|
59
|
+
MCP tool is available, use it to find ADRs by subject-area tags.
|
|
60
|
+
Otherwise, list and skim the directory. Pay particular attention to
|
|
61
|
+
conventions in each ADR's `establishes` frontmatter — those are in
|
|
62
|
+
force (unless a later ADR's `supersedes:` includes it).
|
|
63
|
+
|
|
64
|
+
2. **Read every referenced file.** For the decision you're about to
|
|
65
|
+
make, identify the 3-10 existing files/tables/contracts your ADR
|
|
66
|
+
will touch or adjoin. Read them. Copy real symbol names into your
|
|
67
|
+
scratch note — do not paraphrase from memory.
|
|
68
|
+
|
|
69
|
+
3. **Grep-verify every table, column, entity, and symbol name before
|
|
70
|
+
it lands in the draft.** Use AST-aware symbol lookup for code
|
|
71
|
+
symbols where available; fall back to `grep`. An invented name in
|
|
72
|
+
the Decision section is the #1 cause of ADR rework.
|
|
73
|
+
|
|
74
|
+
4. **Identify the access/tenancy story.** Is the new entity scoped to
|
|
75
|
+
a user, an org, global, or cross-tenant? Confirm it follows
|
|
76
|
+
existing access patterns and doesn't accidentally bypass them.
|
|
77
|
+
|
|
78
|
+
5. **Identify every touched contract.** Internal vs external, file
|
|
79
|
+
paths, permission keys. The ADR must cite the real file paths.
|
|
80
|
+
|
|
81
|
+
6. **Identify circuit breakers and cross-system coupling.** List every
|
|
82
|
+
module/table/entity whose behavior will change because of this
|
|
83
|
+
decision.
|
|
84
|
+
|
|
85
|
+
7. **Decide whether this ADR warrants a follow-up project.** If the
|
|
86
|
+
decision produces 3+ implementable issues, file a project when the
|
|
87
|
+
ADR merges. Small decisions that land in one PR don't need one.
|
|
88
|
+
|
|
89
|
+
Only after these seven steps do you touch the template.
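Grounding step 3 (grep-verify every name) can be partly mechanized. The sketch below is a minimal illustration, not part of the skill: it assumes names are cited in single backticks and that the relevant source files have already been read into a dictionary. The function names, file paths, and sample identifiers are all hypothetical.

```python
import re


def backticked_names(draft: str) -> list[str]:
    """Return identifiers wrapped in single backticks, in first-seen order."""
    seen: list[str] = []
    for name in re.findall(r"`([A-Za-z_]\w*)`", draft):
        if name not in seen:
            seen.append(name)
    return seen


def unverified_names(draft: str, sources: dict[str, str]) -> list[str]:
    """Return cited names that occur as a whole word in none of the sources."""
    missing = []
    for name in backticked_names(draft):
        pattern = re.compile(rf"\b{re.escape(name)}\b")
        if not any(pattern.search(body) for body in sources.values()):
            missing.append(name)
    return missing


# Hypothetical draft text and file contents, for illustration only.
draft = "Add `revoked_at` to `user_grants`; reuse `OrgScope` and `GrantLedger`."
sources = {
    "db/schema.sql": "CREATE TABLE user_grants (revoked_at timestamptz);",
    "src/scope.ts": "export class OrgScope {}",
}
print(unverified_names(draft, sources))  # ['GrantLedger']
```

A non-empty result means the draft cites a name that was not found, which is exactly the invented-name failure the checklist warns about. A plain-text match like this is weaker than the AST-aware lookup the step prefers.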
|
|
90
|
+
|
|
91
|
+
## Step 2: Template (frozen section order)
|
|
92
|
+
|
|
93
|
+
```markdown
|
|
94
|
+
---
|
|
95
|
+
touches: [<coarse subject-area tags>]
|
|
96
|
+
establishes:
|
|
97
|
+
- <convention-slug-this-adr-introduces>
|
|
98
|
+
- <another-convention-if-any>
|
|
99
|
+
supersedes: [] # or [<prior-adr-filename-without-.md>] if this replaces one
|
|
100
|
+
---
|
|
101
|
+
|
|
102
|
+
# ADR: <Short decision title>
|
|
103
|
+
|
|
104
|
+
---
|
|
105
|
+
|
|
106
|
+
---
|
|
107
|
+
|
|
108
|
+
## 1. Context
|
|
109
|
+
|
|
110
|
+
What system state exists today, cited with real file paths + symbol
|
|
111
|
+
names. Who the actors/roles are. What's broken, missing, or ambiguous.
|
|
112
|
+
Include a "Prior art in this repo" subsection listing existing
|
|
113
|
+
patterns that inform or constrain the decision.
|
|
114
|
+
|
|
115
|
+
## 2. Decision
|
|
116
|
+
|
|
117
|
+
What we will do, subsectioned by concern:
|
|
118
|
+
|
|
119
|
+
2.1 Data model (if any — new tables/columns/enums with real names)
|
|
120
|
+
2.2 Resolution / runtime semantics (pure functions, state transitions)
|
|
121
|
+
2.3 External API contract (paths, verbs, schemas, file locations)
|
|
122
|
+
2.4 Internal API contract (same)
|
|
123
|
+
2.5 UI design (surfaces, routes, key flows, broken-state treatment)
|
|
124
|
+
2.6 External integration surface (third-party APIs, adapters, etc.)
|
|
125
|
+
2.7 Role-based access matrix (see iron rule #3)
|
|
126
|
+
2.8 Migration strategy (new table? rename? backfill? legacy handling?)
|
|
127
|
+
|
|
128
|
+
Execution planning — merge units, task sequencing, PR boundaries —
|
|
129
|
+
does NOT belong in an ADR. Those are implementation concerns tracked
|
|
130
|
+
separately. If a project exists for the decision, the project is
|
|
131
|
+
where sequencing lives, not here.
|
|
132
|
+
|
|
133
|
+
## 3. Consequences
|
|
134
|
+
|
|
135
|
+
### Positive
|
|
136
|
+
### Negative / trade-offs
|
|
137
|
+
### Neutral / noted
|
|
138
|
+
|
|
139
|
+
### Unspecified interactions with existing mechanisms
|
|
140
|
+
(see rubric below; this subsection is mandatory if any exist)
|
|
141
|
+
|
|
142
|
+
## 4. Alternatives considered
|
|
143
|
+
|
|
144
|
+
Alt 1, Alt 2, ..., each with a one-paragraph rejection reason. Include
|
|
145
|
+
the genuinely-considered options; don't straw-man. If only one
|
|
146
|
+
alternative existed, this section is a red flag — you haven't
|
|
147
|
+
explored the decision space.
|
|
148
|
+
|
|
149
|
+
## 5. Decision linkages
|
|
150
|
+
|
|
151
|
+
Consumers, dependencies, blockers, future extensions, what this ADR
|
|
152
|
+
establishes (e.g. a new convention).
|
|
153
|
+
|
|
154
|
+
## 6. Open questions
|
|
155
|
+
|
|
156
|
+
ITERATE UNTIL EMPTY. An ADR should not merge with unresolved open
|
|
157
|
+
questions. Each question is either: (a) answerable now — answer it
|
|
158
|
+
inline and move to a "Resolved during drafting" appendix, or (b) a
|
|
159
|
+
blocker that requires external input — in which case the ADR is not
|
|
160
|
+
ready to merge. Do not use this section as a parking lot for
|
|
161
|
+
laziness. If you can grep the codebase or reason through the
|
|
162
|
+
tradeoffs to resolve a question, do it before declaring the draft
|
|
163
|
+
complete.
|
|
164
|
+
|
|
165
|
+
Format when all questions are resolved:
|
|
166
|
+
"None. All questions resolved during drafting:"
|
|
167
|
+
followed by a "### Resolved during drafting" subsection with
|
|
168
|
+
numbered answers preserving the original question for traceability.
|
|
169
|
+
|
|
170
|
+
## 7. Pre-implementation codebase investigation
|
|
171
|
+
|
|
172
|
+
ITERATE UNTIL EMPTY. Same rule as S6. Every item here must be
|
|
173
|
+
resolved before the ADR merges — either by doing the investigation
|
|
174
|
+
during drafting (preferred) or by explicitly blocking the ADR on the
|
|
175
|
+
investigation. An ADR with unresolved S7 items is an ADR that will
|
|
176
|
+
produce wrong implementation work.
|
|
177
|
+
|
|
178
|
+
Format when all items are confirmed:
|
|
179
|
+
"None. All items confirmed during drafting:"
|
|
180
|
+
followed by a "### Resolved during drafting" subsection with
|
|
181
|
+
numbered findings.
|
|
182
|
+
|
|
183
|
+
## 8. References
|
|
184
|
+
|
|
185
|
+
Every file cited, every external doc, every ticket/issue, and the
|
|
186
|
+
convention this ADR establishes or modifies.
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
If a section has no content for your decision, write "Not applicable" and
|
|
190
|
+
one sentence explaining why. Do not delete the heading.
|
|
191
|
+
|
|
192
|
+
### Frontmatter contract
|
|
193
|
+
|
|
194
|
+
The YAML frontmatter is the **only** machine-readable metadata on an
|
|
195
|
+
ADR. There is no prose header block — no `Date`, no `Authors`, no
|
|
196
|
+
status. The date is in the filename, authorship is in `git log`,
|
|
197
|
+
and whether an ADR is in force is determined by Git (on `main` = in
|
|
198
|
+
force; named in a later ADR's `supersedes:` = superseded).
|
|
199
|
+
Duplicating any of this in the body would create drift. The body
|
|
200
|
+
opens straight with the `# ADR: <title>` heading and goes to S1
|
|
201
|
+
Context.
|
|
202
|
+
|
|
203
|
+
The frontmatter carries only facts about the ADR's content, never
|
|
204
|
+
state or intent about implementation follow-through (whether a
|
|
205
|
+
project gets created, whether the decision has been acted on, etc. —
|
|
206
|
+
those are independently observable and don't belong here).
|
|
207
|
+
|
|
208
|
+
Rules:
|
|
209
|
+
|
|
210
|
+
- **`touches`** — inline list of coarse subject-area tags. Err toward
|
|
211
|
+
more tags — matching is cheap, missing a cross-reference is
|
|
212
|
+
expensive.
|
|
213
|
+
- **`establishes`** — block list of convention slugs this ADR
|
|
214
|
+
introduces (kebab-case; descriptive, not clever). These are what
|
|
215
|
+
future ADR authors discover when their decision is constrained by
|
|
216
|
+
conventions you set.
|
|
217
|
+
- **`supersedes`** — list of prior ADR filenames (without `.md`) that
|
|
218
|
+
this ADR replaces. Empty for most ADRs. Supersession lives in the
|
|
219
|
+
superseding ADR's frontmatter, not as a flag on the superseded ADR —
|
|
220
|
+
that one stays unchanged on `main` as a truthful historical record.
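The frontmatter contract above implies a simple way to compute which ADRs are in force. The following is a rough sketch under stated assumptions: it parses only the three keys in the shapes shown in the template (inline lists and ` - ` block lists) rather than full YAML, and it assumes the `adrs` mapping holds the ADR files currently on `main`. `parse_frontmatter` and `in_force` are hypothetical helper names, not tools shipped with the skill.

```python
def parse_frontmatter(text: str) -> dict[str, list[str]]:
    """Parse the ADR frontmatter block into {key: list of values}."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing YAML frontmatter")
    meta: dict[str, list[str]] = {}
    key = None
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        line = line.split(" #", 1)[0].rstrip()  # drop trailing comments
        if not line:
            continue
        if line.lstrip().startswith("- "):
            if key is None:
                raise ValueError("list item before any key")
            meta[key].append(line.lstrip()[2:].strip())
        else:
            key, _, value = line.partition(":")
            key = key.strip()
            items = value.strip().strip("[]").split(",")
            meta[key] = [item.strip() for item in items if item.strip()]
    raise ValueError("unterminated frontmatter")


def in_force(adrs: dict[str, str]) -> list[str]:
    """Given {filename-without-.md: file text}, return ADRs nobody supersedes."""
    superseded = {
        name
        for text in adrs.values()
        for name in parse_frontmatter(text).get("supersedes", [])
    }
    return sorted(name for name in adrs if name not in superseded)
```

Because supersession lives only in the superseding ADR's frontmatter, the superseded file never needs to be edited for this computation to work.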
|
|
221
|
+
|
|
222
|
+
## Rubric: Unspecified interactions vs. Open questions vs. Pre-implementation investigation
|
|
223
|
+
|
|
224
|
+
Misfiling an item among these three sections is the most common ADR failure. Use this table:
|
|
225
|
+
|
|
226
|
+
| Item type | Goes in | Test |
|
|
227
|
+
|---|---|---|
|
|
228
|
+
| A coupling we know exists in the codebase today that this decision changes or newly touches, but we deliberately are not specifying here | `Consequences -> Unspecified interactions with existing mechanisms` | "Implementers need to know about X coupling to avoid breaking it." |
|
|
229
|
+
| A design sub-decision we deferred because it isn't blocking and has multiple valid answers | `Open questions` | "A reasonable person could answer this two ways and either is defensible; we'll pick one during implementation." |
|
|
230
|
+
| A fact we don't know yet about the codebase that must be verified before the first PR | `Pre-implementation codebase investigation` | "The answer is knowable by grepping / reading code, not by discussion." |
|
|
231
|
+
|
|
232
|
+
If an item is really "I haven't done my homework" dressed up as an
|
|
233
|
+
open question, it fails this rubric. Do the homework or move it to
|
|
234
|
+
Pre-implementation investigation with a specific grep/read
|
|
235
|
+
prescribed.
|
|
236
|
+
|
|
237
|
+
## Security default-deny rule (iron rule #3, expanded)
|
|
238
|
+
|
|
239
|
+
For every capability that can:
|
|
240
|
+
|
|
241
|
+
- Write to another user's/org's data
|
|
242
|
+
- Stamp long-lived credentials used on outbound traffic
|
|
243
|
+
- Grant a partner/API-key/integration-user role any verb beyond
|
|
244
|
+
`read` on its own scope
|
|
245
|
+
|
|
246
|
+
the ADR must:
|
|
247
|
+
|
|
248
|
+
1. Default to `off` (not-granted). Do not write "probably fine, worth
|
|
249
|
+
confirming."
|
|
250
|
+
2. Specify the enablement mechanism: who grants it, where it's logged,
|
|
251
|
+
and how it's revoked.
|
|
252
|
+
3. State the blast radius if the grant is misused (a mistaken or
|
|
253
|
+
compromised principal).
|
|
254
|
+
4. Name the expected flow without the grant (what does the actor do
|
|
255
|
+
instead?).
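To make the default-deny stance concrete, here is a minimal sketch of a capability that starts in the `off` position and can only be enabled per principal through a logged call. It illustrates the rule; it does not prescribe an implementation. The `Capability` class, its fields, and the principal names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    """A security-sensitive capability that nobody holds until explicitly granted."""

    name: str
    grants: dict[str, str] = field(default_factory=dict)  # principal -> granted_by
    audit_log: list[str] = field(default_factory=list)

    def allowed(self, principal: str) -> bool:
        # Default deny: the absence of a grant means "off".
        return principal in self.grants

    def grant(self, principal: str, granted_by: str) -> None:
        # Enablement is explicit, per principal, and always logged.
        self.grants[principal] = granted_by
        self.audit_log.append(f"grant {self.name} to {principal} by {granted_by}")

    def revoke(self, principal: str, revoked_by: str) -> None:
        # Revocation is logged only when a grant actually existed.
        if self.grants.pop(principal, None) is not None:
            self.audit_log.append(f"revoke {self.name} from {principal} by {revoked_by}")
```

An ADR would still need to state the blast radius and the no-grant flow in prose; the code only covers points 1 and 2 of the list above.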
|
|
256
|
+
|
|
257
|
+
## Red flags — author self-check
|
|
258
|
+
|
|
259
|
+
These are common failure modes observed across ADRs. Use this list as
|
|
260
|
+
a self-check before you consider a draft complete.
|
|
261
|
+
|
|
262
|
+
- Any table, column, enum, or code symbol in your draft has not been
|
|
263
|
+
grep-confirmed against the actual codebase.
|
|
264
|
+
- Your Decision section says "probably fine" about a security grant.
|
|
265
|
+
Make it default-deny.
|
|
266
|
+
- You have zero alternatives in S4 beyond the chosen one.
|
|
267
|
+
- Your S7 "pre-implementation investigation" reads like "verify my
|
|
268
|
+
bullets are right." Move these to grounding and do them now.
|
|
269
|
+
- A coupling with existing mechanisms is not mentioned. If you
|
|
270
|
+
honestly looked and found none, state that.
|
|
271
|
+
- Your ADR introduces a new enum/channel/role/surface whose naming
|
|
272
|
+
collides with an existing one.
|
|
273
|
+
- Your S2 Decision subsections leak into execution planning — merge
|
|
274
|
+
units, PR boundaries, task sequencing. That belongs in issues, not
|
|
275
|
+
in the ADR.
|
|
276
|
+
- Your UI section doesn't describe the broken-state case (what
|
|
277
|
+
happens when a referenced entity is archived/inactive/missing).
|
|
278
|
+
- Your migration section doesn't describe the down() path.
|
|
279
|
+
- Your S6 Open questions are really S3 Unspecified interactions (they
|
|
280
|
+
describe *existing* couplings, not *deferred* design decisions).
|
|
281
|
+
- Your S6 or S7 has unresolved items. Both sections must be iterated
|
|
282
|
+
to empty before the ADR merges. If you can answer a question by
|
|
283
|
+
reading code or reasoning through tradeoffs, do it now — don't
|
|
284
|
+
defer to implementation what you can resolve during drafting.
|
|
285
|
+
- Your ADR is missing YAML frontmatter. Without frontmatter, the ADR
|
|
286
|
+
is invisible to discovery and future authors will rediscover your
|
|
287
|
+
lessons from scratch.
|
|
288
|
+
- A convention you introduce in S2 is not listed in `establishes:`
|
|
289
|
+
frontmatter. Future ADRs can't find that it exists.
|
|
290
|
+
|
|
291
|
+
## Inline-vs-follow-on decision rubric
|
|
292
|
+
|
|
293
|
+
When you discover during drafting that a sub-decision is bigger than
|
|
294
|
+
you thought:
|
|
295
|
+
|
|
296
|
+
- **Inline it** if: the sub-decision touches <=3 files, introduces no
|
|
297
|
+
new abstractions, and doesn't shift the boundary of any existing
|
|
298
|
+
subsystem.
|
|
299
|
+
- **Follow-on ADR** if: crosses a package boundary you haven't
|
|
300
|
+
mapped, introduces a new abstraction (new model pattern, new
|
|
301
|
+
helper), or requires re-architecting an existing subsystem.
|
|
302
|
+
- **Resolve it now** if: you can answer the question by reading code
|
|
303
|
+
or reasoning through tradeoffs. S6 must be empty at merge — don't
|
|
304
|
+
defer what you can decide during drafting.
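The three bullets above can be read as a small decision function. This is a sketch, and two things in it are assumptions rather than part of the rubric: checking "resolve it now" first, and returning `"judgment-call"` for cases the rubric does not cover (for example, more than three files but no new abstraction). All parameter names are hypothetical.

```python
def placement(
    *,
    files_touched: int,
    answerable_now: bool = False,
    new_abstraction: bool = False,
    shifts_boundary: bool = False,
    crosses_unmapped_package: bool = False,
    needs_rearchitecture: bool = False,
) -> str:
    """Classify a sub-decision discovered during drafting."""
    if answerable_now:
        # Reading code or reasoning through tradeoffs settles it.
        return "resolve-now"
    if crosses_unmapped_package or new_abstraction or needs_rearchitecture:
        return "follow-on-adr"
    if files_touched <= 3 and not shifts_boundary:
        return "inline"
    # Not covered by the rubric as written; left to the author.
    return "judgment-call"
```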
|
|
305
|
+
|
|
306
|
+
A follow-on ADR is cited in S5 Decision linkages as a "Blocker" or
|
|
307
|
+
"Future extension."
|
|
308
|
+
|
|
309
|
+
## File placement and naming
|
|
310
|
+
|
|
311
|
+
- **Location:** `docs/adr/`.
|
|
312
|
+
- **Filename:** `YYYY-MM-DD-<slug>.md`. ISO date (authored date),
|
|
313
|
+
kebab-case slug, 3-7 words.
|
|
314
|
+
- **Branch name:** `docs/<slug>` or `<user>/<ticket>-<slug>` if
|
|
315
|
+
tracked by an issue.
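The filename rule is easy to check mechanically. The sketch below assumes slugs are lowercase alphanumerics separated by single hyphens, which is one reasonable reading of "kebab-case"; `check_adr_filename` is a hypothetical helper name.

```python
import re
from datetime import date

FILENAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-([a-z0-9]+(?:-[a-z0-9]+)*)\.md$")


def check_adr_filename(filename: str) -> list[str]:
    """Return a list of problems; an empty list means the name conforms."""
    match = FILENAME.match(filename)
    if not match:
        return ["expected YYYY-MM-DD-<kebab-case-slug>.md"]
    problems = []
    year, month, day, slug = match.groups()
    try:
        date(int(year), int(month), int(day))  # rejects dates like 02-30
    except ValueError:
        problems.append("date is not a valid calendar date")
    words = slug.split("-")
    if not 3 <= len(words) <= 7:
        problems.append(f"slug has {len(words)} words; expected 3-7")
    return problems
```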
|
|
316
|
+
|
|
317
|
+
## Commit sequence
|
|
318
|
+
|
|
319
|
+
1. Verify the frontmatter block parses (no tabs, list items use
|
|
320
|
+
` - ` indent). Check that `touches` tags are meaningful and any
|
|
321
|
+
new conventions are listed in `establishes`.
|
|
322
|
+
2. `git add docs/adr/<file>.md`
|
|
323
|
+
3. Commit message: `docs(adr): <title>`.
|
|
324
|
+
4. Push branch and open PR. Link the issue in the PR body if one
|
|
325
|
+
exists.
|
|
326
|
+
5. If the decision warrants a follow-up project (per grounding step
|
|
327
|
+
7), create the project on merge and link it from the ADR's S5
|
|
328
|
+
Decision linkages in a follow-up commit.
|
|
@@ -11,7 +11,7 @@ A good plan trades a planning-session's worth of patient thought for hours of un
|
|
|
11
11
|
|
|
12
12
|
## Workflow
|
|
13
13
|
|
|
14
|
-
Apply these
|
|
14
|
+
Apply these nine rules in order. Each rule has its own file in `rules/` for the full text:
|
|
15
15
|
|
|
16
16
|
1. [`first-principles.md`](rules/first-principles.md) — Frame the task FROM the user's intent, not from a templated checklist. Ask "what does the user actually want done?" before "what files might change?"
|
|
17
17
|
|
|
@@ -25,25 +25,56 @@ Apply these eight rules in order. Each rule has its own file in `rules/` for the
|
|
|
25
25
|
|
|
26
26
|
6. [`milestones.md`](rules/milestones.md) — Optional grouping. Use when several tasks share a "is this batch done?" check (e.g. integration tests after a chunk of unit-test work).
|
|
27
27
|
|
|
28
|
-
7. [`self-review.md`](rules/self-review.md) — Before declaring the plan ready, run through a 7-question checklist. Find the holes yourself; the validator only catches schema errors.
|
|
28
|
+
7. [`self-review.md`](rules/self-review.md) — Before declaring the plan ready, run through a 7-question checklist. Find the holes yourself; the validator only catches schema errors. And before declaring "refuse", revisit the bundle-vs-split decision below.
|
|
29
29
|
|
|
30
30
|
8. [`task-context.md`](rules/task-context.md) — Every non-trivial task carries a `context:` block. Thin plans fail because the builder works each task from scratch with no carry-over; rich context pre-loads what the builder needs to work confidently. Cover outcome, rationale, code pointers, acceptance.
|
|
31
31
|
|
|
32
|
+
9. [`qa-expectations.md`](rules/qa-expectations.md) — Detect → propose → confirm per-surface verify patterns for UI, API, DB, integration, browser-based component, and CLI surfaces.
|
|
33
|
+
|
|
32
34
|
## After applying the rules
|
|
33
35
|
|
|
34
36
|
1. Save the YAML to the path returned by `bunx @glrs-dev/harness-plugin-opencode pilot plan-dir`.
|
|
35
|
-
2.
|
|
36
|
-
3.
|
|
37
|
+
2. Remind the user the plan assumes their dev stack is already running (install, compose, migrate, seed). Plans no longer bootstrap their own environment.
|
|
38
|
+
3. Run `bunx @glrs-dev/harness-plugin-opencode pilot validate <path>` and fix every error / warning.
|
|
39
|
+
4. Hand off to the user with: `Plan saved to <path>. Next: bunx @glrs-dev/harness-plugin-opencode pilot build`.
|
|
37
40
|
|
|
38
41
|
Do NOT summarize the plan in chat. The user can read the YAML.
|
|
39
42
|
|
|
43
|
+
## When to bundle vs. split plans
|
|
44
|
+
|
|
45
|
+
Multi-issue cross-cutting plans are a first-class pilot shape. When a user's scope spans 2–4 related issues, default to **one plan** covering all of them — as long as they share:
|
|
46
|
+
|
|
47
|
+
- Same repo (or monorepo).
|
|
48
|
+
- Same package manager / install command.
|
|
49
|
+
- Same `docker-compose` (or equivalent local-infra) stack.
|
|
50
|
+
- Same test runner and verify style.
|
|
51
|
+
- Same migrations/seed pipeline.
|
|
52
|
+
|
|
53
|
+
Bundling amortizes setup cost (install, compose up, migrate, seed — minutes each, paid once per pilot run) across all the work. Tasks from different issues typically form disconnected subtrees in the DAG — see [`dag-shape.md`](rules/dag-shape.md)'s "Disconnected" pattern. Task-level `cascadeFail` only blocks transitive dependents, so a failure in one subtree does NOT cascade into its siblings.
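The claim that a failure stays inside its own subtree follows from how transitive dependents are computed. The sketch below shows that reasoning on a toy plan; it is not the harness's implementation. The `depends_on` mapping and the task ids are hypothetical.

```python
from collections import deque


def blocked_by_failure(depends_on: dict[str, list[str]], failed: str) -> set[str]:
    """Return every task that transitively depends on the failed task."""
    # Invert the edges: for each task, which tasks list it as a dependency?
    dependents: dict[str, list[str]] = {task: [] for task in depends_on}
    for task, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(task)

    blocked: set[str] = set()
    queue = deque([failed])
    while queue:
        for child in dependents.get(queue.popleft(), []):
            if child not in blocked:
                blocked.add(child)
                queue.append(child)
    return blocked


# Two issues bundled in one plan, forming disconnected subtrees A and B.
plan = {"A1": [], "A2": ["A1"], "A3": ["A2"], "B1": [], "B2": ["B1"]}
print(sorted(blocked_by_failure(plan, "A1")))  # ['A2', 'A3']
```

Since no edge connects the A tasks to the B tasks, nothing in subtree B is ever reached from a failure in subtree A.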
|
|
54
|
+
|
|
55
|
+
**Split into separate pilot plans when:**
|
|
56
|
+
|
|
57
|
+
- Issues live in different repositories.
|
|
58
|
+
- Issues require fundamentally different setup environments.
|
|
59
|
+
- Issues have fundamentally different acceptance shapes (e.g., automated typecheck vs. manual operator playbook).
|
|
60
|
+
|
|
61
|
+
See [`decomposition.md`](rules/decomposition.md) "Plan sizing — count of tasks" for more.
|
|
62
|
+
|
|
40
63
|
## When to refuse
|
|
41
64
|
|
|
42
|
-
|
|
65
|
+
Refuse ONLY when the **work itself** is underspecified or ambiguous — no concrete acceptance criteria, no clear "done" condition. Examples that warrant refusal:
|
|
66
|
+
|
|
67
|
+
- "Make the API better."
|
|
68
|
+
- "Refactor auth."
|
|
69
|
+
- "Clean up tech debt."
|
|
70
|
+
|
|
71
|
+
These don't name specific behaviors the pilot-builder can verify. Ask the user to narrow the scope before planning.
|
|
72
|
+
|
|
73
|
+
**Do NOT refuse for:**
|
|
43
74
|
|
|
44
|
-
-
|
|
45
|
-
-
|
|
46
|
-
-
|
|
47
|
-
-
|
|
75
|
+
- Plan size (5–30 tasks is fine; even more is fine when the work is well-defined).
|
|
76
|
+
- Multi-issue scope (2–4 related issues in one plan is first-class — see "When to bundle" above).
|
|
77
|
+
- Disconnected-subtree DAG shape (tasks from different concerns don't need artificial edges).
|
|
78
|
+
- Concerns about PR shape (that's a reviewer decision; the pilot run can produce one PR or several).
|
|
48
79
|
|
|
49
|
-
|
|
80
|
+
When you do refuse: tell the user honestly and specifically what's missing. Suggest the regular `/plan` agent (markdown plans, human-driven execution) for ambiguous work that needs human iteration before it's pilotable. It is far better to refuse an underspecified request than to ship a plan full of `echo done` verifies, but keep the definition of a "bad plan" narrow: ambitious is not bad; ambiguous is bad.
|
|
@@ -34,3 +34,30 @@ A "right-sized" pilot task is one the pilot-builder can complete in a single ses
|
|
|
34
34
|
## When you can't decompose
|
|
35
35
|
|
|
36
36
|
If the work genuinely doesn't decompose (e.g., a 200-line algorithm that has to land atomically), it might not be a fit for pilot. Tell the user; they may want to run it as a regular `/build` task instead.
|
|
37
|
+
|
|
38
|
+
## Plan sizing — count of tasks
|
|
39
|
+
|
|
40
|
+
Per-task size is covered above. Plan-level size (total task count) is a different dimension and has its own sweet spot: **roughly 5–30 tasks per `pilot.yaml`**. Outside this range:
|
|
41
|
+
|
|
42
|
+
- **Fewer than 5 tasks:** usually means the work is a single change that doesn't benefit from the pilot harness. Consider `/plan` + `/build` instead.
|
|
43
|
+
- **More than 30 tasks:** fine in principle, but at that size the plan probably spans enough distinct concerns that a human reviewer will want it split — not a pilot problem, a PR-shape problem.
|
|
44
|
+
|
|
45
|
+
### Multi-issue cross-cutting plans are a first-class shape
|
|
46
|
+
|
|
47
|
+
It is **normal and correct** for a single pilot plan to span 2–4 related issues (Linear tickets, GitHub issues) **when those issues share setup and verify infrastructure** — same repo, same package manager, same `docker-compose`, same test runner, same migrations. Reasons to bundle:
|
|
48
|
+
|
|
49
|
+
- **Setup amortization.** `pnpm install`, `docker compose up`, `pnpm db:migrate`, seed scripts — each of these is minutes of wall time. Running them once per pilot session vs. once per Linear issue saves hours across a multi-issue push.
|
|
50
|
+
- **Context reuse.** The builder learns the codebase through reading during early tasks; that context benefits every subsequent task in the run.
|
|
51
|
+
- **Shared acceptance.** Cross-issue integration checks (a milestone-close verify that exercises all three issues' changes together) are natural in one plan, awkward across three runs.
|
|
52
|
+
|
|
53
|
+
**Reference shape (not a red flag):** rule-engine cleanup + LISTEN/NOTIFY cache invalidation + read-only admin UI landed together in one plan of ~19 tasks across 4 milestones, covering 3 Linear issues. This is the shape pilot is built for.
|
|
54
|
+
|
|
55
|
+
When bundling, the tasks from different issues typically form **disconnected subtrees** in the DAG (no real semantic dependency between them). That's fine — see [`dag-shape.md`](dag-shape.md)'s "Disconnected" pattern. Task-level `cascadeFail` only blocks transitive dependents, so a failure in one subtree doesn't cascade into the siblings.
|
|
56
|
+
|
|
57
|
+
### When to split instead of bundle
|
|
58
|
+
|
|
59
|
+
Split into separate pilot plans when:
|
|
60
|
+
|
|
61
|
+
- The issues live in **different repositories**.
|
|
62
|
+
- The issues require **fundamentally different setup environments** (e.g., one needs Postgres + Temporal, the other needs a headless browser grid — sharing setup is worse than paying the cost twice).
|
|
63
|
+
- The issues have **fundamentally different acceptance criteria** (e.g., one is a TypeScript refactor verified via typecheck, the other is an infrastructure change verified via a manual operator playbook — no shared verify makes sense).
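The split criteria can be sketched as a comparison across the issues being considered for one plan. This is an illustrative simplification: it reduces "setup environment" and "acceptance criteria" to single labels that must match exactly, whereas the rule only calls for splitting when they differ fundamentally. The `Issue` fields and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    key: str
    repo: str
    setup: str       # label for the local stack, e.g. "pnpm+postgres"
    acceptance: str  # label for the verify style, e.g. "automated"


def split_reasons(issues: list[Issue]) -> list[str]:
    """Return reasons to split; an empty list means bundling is reasonable."""
    reasons = []
    if len({issue.repo for issue in issues}) > 1:
        reasons.append("different repositories")
    if len({issue.setup for issue in issues}) > 1:
        reasons.append("different setup environments")
    if len({issue.acceptance for issue in issues}) > 1:
        reasons.append("different acceptance criteria")
    return reasons
```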
|
|
@@ -0,0 +1,120 @@
|
|
|
1
|
+
# Rule 10 — QA-expectations establishment
|
|
2
|
+
|
|
3
|
+
**Detect → propose → confirm per-surface verify patterns.**
|
|
4
|
+
|
|
5
|
+
A plan's verify commands are its contract with the builder. Generic verifies ("run tests") waste builder time; specific verifies ("run the API tests that exercise the files this task touches") catch real failures. This rule establishes concrete, per-surface QA expectations with the user before emitting the plan.
|
|
6
|
+
|
|
7
|
+
## The six surfaces
|
|
8
|
+
|
|
9
|
+
For each surface below, detect signals in the codebase, propose a canonical verify pattern, and confirm with the user.
|
|
10
|
+
|
|
11
|
+
### UI — Browser-based user interface
|
|
12
|
+
|
|
13
|
+
**Detection signals:**
|
|
14
|
+
- `@playwright/test`, `cypress`, or `@vitest/browser` in `package.json` dependencies
|
|
15
|
+
- `playwright.config.{ts,js}` or `cypress.config.*` present
|
|
16
|
+
|
|
17
|
+
**Proposed verify pattern:**
|
|
18
|
+
Playwright test run, scoped by tag, for visual/interaction assertions:
|
|
19
|
+
```yaml
|
|
20
|
+
verify:
|
|
21
|
+
- playwright test --project=chromium --grep "@task-specific-tag"
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
### API — HTTP endpoints
|
|
25
|
+
|
|
26
|
+
**Detection signals:**
|
|
27
|
+
- `openapi.yaml` / `openapi.json` present
|
|
28
|
+
- `curl` or `httpie` usage in existing scripts
|
|
29
|
+
- Postman collection files
|
|
30
|
+
|
|
31
|
+
**Proposed verify pattern:**
|
|
32
|
+
Direct HTTP assertion against a local port:
|
|
33
|
+
```yaml
|
|
34
|
+
verify:
|
|
35
|
+
- curl -fsS http://localhost:3000/health | jq -e '.status == "ok"'
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
### DB — Database schema and queries
|
|
39
|
+
|
|
40
|
+
**Detection signals:**
|
|
41
|
+
- `docker-compose` postgres service defined
|
|
42
|
+
- `prisma`, `drizzle-kit`, `knex`, or `flyway` in dependencies
|
|
43
|
+
- `test/db` or similar helper directory
|
|
44
|
+
|
|
45
|
+
**Proposed verify pattern:**
|
|
46
|
+
Postgres readiness + migration + assertion:
|
|
47
|
+
```yaml
|
|
48
|
+
verify:
|
|
49
|
+
- pg_isready -h localhost -p 5432
|
|
50
|
+
- pnpm prisma migrate deploy
|
|
51
|
+
- pnpm tsx scripts/verify-db.ts
|
|
52
|
+
```
|
|
53
|
+
|
+### Integration — Cross-module workflows
+
+**Detection signals:**
+- `test/integration/**` directory exists
+- `e2e/**` directory exists
+- `*.integration.test.ts` files
+
+**Proposed verify pattern:**
+Integration test runner scoped to relevant paths:
+```yaml
+verify:
+  - pnpm test test/integration
+```
+
+### Browser-based component — Storybook stories
+
+**Detection signals:**
+- `storybook` or `@storybook/*` in dependencies
+- `*.stories.{ts,tsx}` files present
+
+**Proposed verify pattern:**
+Storybook test or Chromatic visual verification:
+```yaml
+verify:
+  - pnpm storybook test --stories "ComponentName"
+```
+
+### CLI — Command-line interface
+
+**Detection signals:**
+- `bin/*` directory with executables
+- `package.json` `bin:` entry defined
+
+**Proposed verify pattern:**
+Smoke test via help flag or scripted invocation:
+```yaml
+verify:
+  - pnpm my-cli --help
+  - pnpm tsx scripts/smoke-test-cli.ts
+```
+
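The detection lists above can be read as a lookup from signals to surfaces. The following is a minimal sketch of that idea, not the harness's actual detector: the `SIGNALS` table, the surface keys, and `detect_surfaces` are hypothetical names, and only the dependency-name and file-path signals are covered (signals that need file contents, such as a `docker-compose` postgres service or `curl` usage in scripts, are left out).

```python
import fnmatch

# Hypothetical signal table distilled from the detection lists above.
# "deps" are package.json dependency names; "globs" are repo-relative paths.
# fnmatch's "*" also matches "/", so "*.stories.tsx" matches nested files.
SIGNALS = {
    "ui": {"deps": ["@playwright/test", "cypress", "@vitest/browser"],
           "globs": ["playwright.config.ts", "playwright.config.js", "cypress.config.*"]},
    "api": {"deps": [], "globs": ["openapi.yaml", "openapi.json"]},
    "db": {"deps": ["prisma", "drizzle-kit", "knex", "flyway"], "globs": ["test/db/*"]},
    "integration": {"deps": [],
                    "globs": ["test/integration/*", "e2e/*", "*.integration.test.ts"]},
    "component": {"deps": ["storybook", "@storybook/*"],
                  "globs": ["*.stories.ts", "*.stories.tsx"]},
    "cli": {"deps": [], "globs": ["bin/*"]},
}


def detect_surfaces(package_json: dict, paths: list[str]) -> list[str]:
    """Return the surfaces whose signals appear, in table order."""
    deps = set(package_json.get("dependencies", {}))
    deps |= set(package_json.get("devDependencies", {}))
    found = []
    for surface, signal in SIGNALS.items():
        dep_hit = any(fnmatch.fnmatchcase(d, pat) for d in deps for pat in signal["deps"])
        path_hit = any(fnmatch.fnmatchcase(p, pat) for p in paths for pat in signal["globs"])
        # A package.json "bin" entry is a CLI signal on its own.
        bin_hit = surface == "cli" and "bin" in package_json
        if dep_hit or path_hit or bin_hit:
            found.append(surface)
    return found
```

Each detected surface would then get its proposed verify pattern confirmed with the user rather than applied silently.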
+## Question-bundling rule
+
+**Two or more surfaces detected:** Bundle into a single structured `question` tool call with one checkbox group per surface.
+
+**One surface detected:** Still ask (confirmation, not interrogation), but use a single-field call.
+
+**Zero surfaces detected:** Skip the QA-expectation question entirely. Fall back to generic verifies:
+```yaml
+defaults:
+  verify_after_each:
+    - pnpm run typecheck
+    - pnpm test
+```
+
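The three cases of the bundling rule amount to a small branch on the number of detected surfaces. A minimal sketch follows; `plan_question` and the shape of the returned dictionary are illustrative assumptions, not the `question` tool's real schema.

```python
def plan_question(surfaces: list[str]) -> dict:
    """Decide how to ask about QA expectations for the detected surfaces."""
    if not surfaces:
        # Zero surfaces: skip the question and fall back to generic verifies.
        return {
            "ask": False,
            "defaults": {"verify_after_each": ["pnpm run typecheck", "pnpm test"]},
        }
    if len(surfaces) == 1:
        # One surface: still ask, but as a single-field confirmation.
        return {"ask": True, "mode": "single-field", "fields": list(surfaces)}
    # Two or more: one bundled call with one checkbox group per surface.
    return {"ask": True, "mode": "bundled", "fields": list(surfaces)}
```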
+## Emission
+
+Confirmed patterns become:
+
+1. **Per-task verify templates** — tasks targeting specific files use scoped verifies (e.g., `pnpm test test/api/users.test.ts` for a task touching `src/api/users.ts`)
+2. **defaults.verify_after_each** — global breakage catchers (typecheck, full test suite)
+
+The rule: per-task verify targets the specific files touched; defaults catches global breakage.
+
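To make the per-task scoping concrete, here is a sketch that derives a scoped verify command from a task's touched files. It assumes a hypothetical layout in which `test/` mirrors `src/` and test files carry a `.test` infix, as in the `src/api/users.ts` example above. `scoped_verify` is an illustrative helper, not part of the harness.

```python
from pathlib import PurePosixPath


def scoped_verify(touches: list[str], src_root: str = "src",
                  test_root: str = "test") -> list[str]:
    """Map each touched source file to a test command for its mirrored test file."""
    commands = []
    for touched in touches:
        path = PurePosixPath(touched)
        try:
            relative = path.relative_to(src_root)
        except ValueError:
            # Not under the source root; leave it to defaults.verify_after_each.
            continue
        # users.ts -> users.test.ts, keeping the original extension.
        test_file = relative.with_suffix(".test" + relative.suffix)
        commands.append(f"pnpm test {PurePosixPath(test_root) / test_file}")
    return commands
```

Files that map to no test simply produce no per-task command, which is where the global defaults take over.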
+## Cross-reference to verify-design.md
+
+This rule (10) is the per-surface tactical layer — it names the tools to detect and the patterns to propose. Rule 3 (verify-design.md) owns the principles: deterministic, assertive, would-have-failed-before. Every proposed command must satisfy both layers.
@@ -16,7 +16,7 @@ The validator catches schema, DAG, and glob errors. It cannot catch "this verify
 
 5. **Are there missing edges?** Look at every pair of tasks that share files in their `touches:`. Do they need an order? If T2's verify exercises code T1 introduces, T2 depends on T1 — even if their `touches:` don't overlap.
 
-6. **
+6. **Does the DAG concentrate too much value in one task?** Task-level `cascadeFail` only blocks transitive DEPENDENTS of the failed task — sibling subtrees in a disconnected DAG keep running. So plan size is not itself a risk. The real risk is a task everything else depends on: a schema migration that all downstream work reads, a core-type definition all imports reference, a shared config every consumer parses. If THAT task fails, the whole run stalls. Is there such a task in your plan? If yes, can it be simplified — smaller diff, tighter verify, higher success probability? Don't over-concentrate; a plan where 80% of tasks depend on T1 and T1 is complex is fragile by design.
 
 7. **Could you read this plan in 6 months and understand it?** Plan names + task titles + prompts should be a self-explanatory summary of the work. If the plan needs a verbal preamble to make sense, rewrite the prompts.
 
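The concentration question in item 6 can be checked mechanically. The sketch below assumes a plan reduced to a hypothetical mapping from task id to the ids it depends on (the `depends_on` name is illustrative). It computes the set of transitive dependents a failed task would block, and the share of the remaining tasks that set represents.

```python
def blocked_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every task that transitively depends on `failed`."""
    # Invert the edges: task -> tasks that list it as a dependency.
    dependents: dict[str, list[str]] = {}
    for task, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(task)
    blocked: set[str] = set()
    stack = [failed]
    while stack:
        current = stack.pop()
        for child in dependents.get(current, []):
            if child not in blocked:
                blocked.add(child)
                stack.append(child)
    return blocked


def concentration(task: str, depends_on: dict[str, list[str]]) -> float:
    """Fraction of the other tasks that would stall if `task` failed."""
    others = len(depends_on) - 1
    return len(blocked_by(task, depends_on)) / others if others > 0 else 0.0
```

With `{"T1": [], "T2": ["T1"], "T3": ["T2"], "T4": [], "T5": ["T4"]}`, a failure of T1 blocks T2 and T3 while the T4/T5 subtree keeps running, matching the sibling-subtree behavior described above.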
@@ -45,3 +45,37 @@ If the verify commands would FAIL without edits, an empty `touches` is a STOP
 - **Including the migrations dir for a non-migration task.** Tight scope.
 
 When in doubt, write the tightest possible scope first. If the task fails verify with "touches violation: src/X.ts", the worker shows you which file got touched — broaden then.
+
+## `tolerate:` — files allowed in the diff but outside the contract
+
+When a task's verify step runs a tool that writes files as a side-effect (codegen, build, snapshots), those files will appear in `git diff` even though the agent didn't author them. Add them to `tolerate:` so enforcement accepts them without counting them as part of the task's output.
+
+Two categories to watch for:
+
+**Built-in defaults (already tolerated — don't list these):**
+- `**/next-env.d.ts` — Next.js regenerates on every `next build`.
+- `**/.next/types/**`, `**/.next/dev/types/**` — Next.js app-router generated types.
+- `**/*.tsbuildinfo` — TypeScript project-reference build cache.
+- `**/__snapshots__/**`, `**/*.snap` — Jest / Vitest snapshot files rewritten by `-u`.
+
+**Project-specific (list in `tolerate:` per task):**
+- Prisma client output (e.g., `prisma/client/**` if `prisma generate` runs in verify).
+- GraphQL codegen output (`graphql/generated/**`, `*.graphql.d.ts`).
+- OpenAPI codegen output (`api-types/generated/**`).
+- Anywhere you have a build step that writes type declarations downstream of the agent's source edits.
+
+A good test: if the task's verify step runs `prisma generate`, `pnpm codegen`, `next build`, or similar, ask: "does that command write files anywhere?" If yes, those paths go in `tolerate:`.
+
+### Example
+
+```yaml
+- id: T-ADD-RULE-MODEL
+  touches:
+    - prisma/schema.prisma
+    - src/models/rule.ts
+  tolerate:
+    - prisma/client/**     # prisma generate output
+  verify:
+    - pnpm prisma generate
+    - pnpm --filter core test rule-model
+```
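One way to picture how `touches:` and `tolerate:` interact is as a three-way classification of the paths in `git diff`. This is a rough sketch under stated assumptions, not the harness's enforcement code. `classify_diff` and `_matches` are hypothetical names, and glob matching is approximated with Python's `fnmatch`, whose `*` also crosses `/`, so `**` behaves like `*` here and real glob semantics may differ.

```python
import fnmatch

# Built-in defaults quoted from the list above.
DEFAULT_TOLERATE = [
    "**/next-env.d.ts", "**/.next/types/**", "**/.next/dev/types/**",
    "**/*.tsbuildinfo", "**/__snapshots__/**", "**/*.snap",
]


def _matches(path: str, pattern: str) -> bool:
    # Approximation: fnmatch treats "**" like "*"; a leading "**/" is also
    # allowed to match zero directories (e.g. a root-level next-env.d.ts).
    if fnmatch.fnmatchcase(path, pattern):
        return True
    return pattern.startswith("**/") and fnmatch.fnmatchcase(path, pattern[3:])


def classify_diff(changed, touches, tolerate=()):
    """Split changed paths into touched, tolerated, and violating files."""
    result = {"touched": [], "tolerated": [], "violations": []}
    for path in changed:
        if any(_matches(path, p) for p in touches):
            result["touched"].append(path)
        elif any(_matches(path, p) for p in [*DEFAULT_TOLERATE, *tolerate]):
            result["tolerated"].append(path)
        else:
            result["violations"].append(path)
    return result
```

For the `T-ADD-RULE-MODEL` example, a diff containing `prisma/client/index.js` would land in `tolerated`, while an unrelated `src/other.ts` would be reported as a violation.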