mandrel 1.65.0 → 1.66.0
- package/.agents/docs/workflows.md +1 -1
- package/.agents/workflows/qa-assist.md +214 -141
- package/docs/CHANGELOG.md +7 -0
- package/package.json +1 -1
@@ -52,6 +52,6 @@ description, edit the workflow file’s front-matter and regenerate.
 | `/git-pr-all` | Stage all outstanding changes, commit, push to a feature branch, and open a pull request with native auto-merge enabled. |
 | `/git-push` | Commit all outstanding changes then push to the remote repository. |
 | `/plan` | Unified planning entry point. Routes a seed idea (via scope triage) or an existing Epic ID to the right planning path — the full Epic pipeline (PRD, Tech Spec, Acceptance Spec, decomposition) or the standalone-Story authoring path — and absorbs every planning flag. |
-| `/qa-assist` | Human-led QA assist loop —
+| `/qa-assist` | Human-led QA assist loop — set up, then ride a rolling multi-observation intake session. The operator reports observations in any order; the agent enriches each (repro + root-cause file:line + coverage verdict for bugs; analysis + options + recommendation for enhancements), asks clarifying questions only when ambiguous, and appends a redacted ledger item — recording, never planning — to a persistent, resumable session under temp/qa/. Only when the operator says they are done does it review the full ledger and hand off to /plan. |
 | `/qa-explore` | Agent-led exploratory-QA loop — the agent Plans a surface with an explicit static-vs-drive method choice, drives it (browser MCP or static), and captures ledger items read-only, then Triages — a bounded per-surface session, HITL-gated at every phase transition, routed through the shared dedup/coverage/classification/missing-test/redaction/session core under temp/qa/ |
 | `/qa-run-harness` | Drive Gherkin scenarios through a real browser as an agent-driven QA sweep |
@@ -1,28 +1,37 @@
 ---
-description: Human-led QA assist loop —
+description: Human-led QA assist loop — set up, then ride a rolling multi-observation intake session. The operator reports observations in any order; the agent enriches each (repro + root-cause file:line + coverage verdict for bugs; analysis + options + recommendation for enhancements), asks clarifying questions only when ambiguous, and appends a redacted ledger item — recording, never planning — to a persistent, resumable session under temp/qa/. Only when the operator says they are done does it review the full ledger and hand off to /plan.
 ---
 
 # /qa-assist
 
-Drive a **human-led QA-assist session
-
-
-
-
-
-
-**
+Drive a **human-led, rolling QA-assist session**. The operator tests; the
+agent rides alongside as the QA engineer and captures what they see into a
+high-quality, triage-ready ledger. The session has four movements:
+
+1. **Setup & Ready** (Phase 0) — load codebase context, resolve the contract,
+open (or resume) the rolling ledger, then tell the operator what it will do
+and that it is **ready for observations**.
+2. **Rolling intake** (Phases 1–3, looped) — the operator reports observations
+**in any order and any quantity**: one at a time, or a **brain dump** of many
+at once in a single message. The agent splits a multi-observation message
+into discrete items and runs each through **Intake → Enrich → Record**, then
+**loops straight back** to wait for more. It **records and enriches only — it
+never plans or fixes during intake.**
+3. **Done** — when the operator says they have finished testing, the agent does
+a final review of the **entire** ledger and asks any last clarifying
+questions.
+4. **Triage & Plan** (Phase 4) — only then does the agent route the full ledger
+into [`/plan`](plan.md) to generate Epics and/or Stories.
 
 Unlike [`/qa-explore`](qa-explore.md) (where the *agent* drives open-ended
-exploration of a named surface
-
-
-
-`
-[`qa-ledger.schema.json`](../schemas/qa-ledger.schema.json) contract, the same
+exploration of a named surface), `/qa-assist` is **human-led**: the human owns
+the signal, the agent owns the enrichment. It is the front door for "I'm
+testing — ride along and capture everything well." Each observation is recorded
+as a `QaLedgerItem` against the
+[`qa-ledger.schema.json`](../schemas/qa-ledger.schema.json) contract — the same
 ledger `/qa-explore` and the triage/promotion path consume — so a `/qa-assist`
 item flows through the identical dedup, classification, and promotion machinery
-
+in Phase 4.
 
 This is a **prose workflow**, not a Node orchestrator: the host LLM executes
 the procedure; deterministic Node helpers under `.agents/scripts/lib/qa/` and
@@ -31,10 +40,10 @@ resolution, context hydration, redaction, coverage verdict, classification,
 dedup/route, and promotion. **The agent consumes the shared core helpers; it
 never reimplements those decisions in prose.**
 
-> **When to run**: a developer or operator
->
->
->
+> **When to run**: a developer or operator is about to test (or is mid-test) and
+> wants every bug and enhancement idea captured as a high-quality,
+> triage-ready finding without breaking stride — then, when the testing pass is
+> done, turned into a plan in one batch.
 >
 > **Persona**: `qa-engineer` · **Skills**: `core/qa-coverage-mapping`
 
@@ -45,8 +54,7 @@ Adopt the **`qa-engineer`** persona
 run. You are the quality gatekeeper: you value coverage, hermetic
 environments, deterministic results, and — per that persona's Golden Rule —
 you **never invent the signal**. The human owns what was observed; you enrich
-it. Re-read that persona file as your first action
-Intake/Enrich/Record loop is governed by it.
+it. Re-read that persona file as your first action.
 
 ## Slash Command
 
@@ -56,17 +64,17 @@ Intake/Enrich/Record loop is governed by it.
 
 ### Arguments
 
-| Name | Required | Shape / Example | Notes
-| ------------- | -------- | ------------------------------------------------- |
-| `observation` | no | `"sync-commands wipes .claude on a reused name"` |
+| Name | Required | Shape / Example | Notes |
+| ------------- | -------- | ------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
+| `observation` | no | `"sync-commands wipes .claude on a reused name"` | An optional first observation, or a brain dump of several. **Usually omitted** — the normal launch is a bare `/qa-assist`, which does Setup and then waits. If supplied, run Setup first, then feed it in as the first intake (splitting it if it carries multiple observations). |
 
-
-
-
+A bare `/qa-assist` is the expected entry point. **Do not** demand an
+observation up front and **do not** synthesize one — the `qa-engineer` Golden
+Rule forbids inventing the signal. Set up, announce ready, and wait.
 
 ## Project contract
 
-Resolve the consumer's `qa` contract
+Resolve the consumer's `qa` contract during Setup, via
 [`resolve-qa-contract.js`](../scripts/lib/qa/resolve-qa-contract.js):
 
 ```js
@@ -81,9 +89,10 @@ to the operator and stop; do not pretend a contract exists.
 
 ## Session & ledger (temp/qa/) — persistent, resumable, rolling
 
-`/qa-assist` **defaults to a persistent rolling session**: the same session
-
-working day. Resolve the session and its ledger
+`/qa-assist` **defaults to a persistent rolling session**: the same session is
+resumed across invocations so an operator can top up the same ledger across a
+working day or a multi-launch testing pass. Resolve the session and its ledger
+path **once**, during Setup, via
 [`qa-session.js`](../scripts/lib/qa/qa-session.js):
 
 ```js
@@ -99,56 +108,101 @@ const { sessionId, ledgerPath, reused, untriaged } = resolveQaSession({ config }
 - When `reused` is `true`, a prior session of the same id exists: **append**,
 never overwrite, and surface the carried `untriaged` items as the rolling
 backlog so the operator sees what is still open. Pass `--session-id <id>`
-(or `QA_SESSION_ID`) to resume or fork a named session.
-
-
+(or `QA_SESSION_ID`) to resume or fork a named session. A `/qa-assist` run is
+additive to the prior ledger by default — this is the resumable rolling
+session contract.
 
 ## Phase gates (HITL)
 
-
-
-
-
-
-
+This is a HITL workflow, but the gating is deliberately **light during intake
+and firm at the boundary**, so the rolling loop stays fluid:
+
+- **Within a single observation, Intake → Enrich → Record is fluid.** The agent
+restates, enriches, and appends the ledger item without a ceremony, pausing
+only to **ask clarifying questions when the observation is ambiguous**. After
+each append it **echoes the recorded item** so the operator can correct it,
+then **loops back to wait for the next observation**. The agent does **not**
+triage, route, file tickets, or invoke `/plan` during intake.
+- **Two things always require explicit operator confirmation.** First, the
+session-level transition from rolling intake into **Phase 4 — Triage & Plan**
+— the agent never starts planning on its own; the operator must say they are
+done. Second, **every write that leaves the local ledger** — filing a ticket,
+invoking `/plan`, or mutating a label. Present the artifact, ask, and wait. If
+the operator does not confirm, hold.
+
+In short: appending to the rolling ledger is the natural product of intake and
+needs no gate beyond the echo-back; **planning and anything that leaves the
+ledger is hard-gated.**
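
Reviewer's note, not part of the package: the gating rule added above can be read as a small predicate. The sketch below is only an illustration of that rule. The action names (`append-ledger-item`, `enter-phase-4`, `file-ticket`, `invoke-plan`, `mutate-label`) and both function names are hypothetical; the workflow itself is prose executed by the host LLM and ships no such helper.

```js
// Hypothetical action names: the workflow text defines the rule, not these identifiers.
const GATED_ACTIONS = new Set([
  'enter-phase-4', // session-level move from rolling intake into Triage & Plan
  'file-ticket',   // the remaining entries are writes that leave the local ledger
  'invoke-plan',
  'mutate-label',
]);

// True when the action must wait for an explicit operator "yes".
function requiresOperatorConfirmation(action) {
  return GATED_ACTIONS.has(action);
}

// Ungated actions (such as appending to the rolling ledger) always proceed;
// gated actions proceed only on explicit confirmation, otherwise the agent holds.
function mayProceed(action, operatorConfirmed) {
  if (!requiresOperatorConfirmation(action)) return true;
  return operatorConfirmed === true;
}
```

Under these assumptions, `mayProceed('append-ledger-item', false)` is allowed, while `mayProceed('invoke-plan', false)` holds until the operator confirms.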
 
 ---
 
-## Phase
-
-Goal:
-
-
-
-(
-is
-
-
-
-
-
-
-
-
-
--
-
-
-
-
-confirm the restatement is faithful.
+## Phase 0 — Setup & Ready
+
+Goal: become the operator's QA assistant before any observation arrives.
+
+1. Re-read the `qa-engineer` persona.
+2. **Load codebase context.** Read the files in `project.docsContextFiles`
+(architecture, decisions, patterns) and, when the testing touches UI/routing,
+`docs/style-guide.md` / `docs/web-routes.md`. This is the context you will
+draw on to enrich observations without guessing.
+3. **Resolve the `qa` contract and the rolling session** (above). Compute the
+ledger path and load any carried `untriaged` backlog.
+4. **Announce readiness.** Tell the operator, in one short message:
+- which session this is (new vs. resumed) and how many items are already on
+the ledger;
+- what you will do with each observation (enrich bugs with repro +
+root-cause `file:line` + coverage; enrich enhancements with analysis +
+options + a recommendation) and that you will **record only, not plan**;
+- that you are **ready for observations, in any order**, and that they
+should tell you when they are **done testing** to move into triage/planning.
+5. **Wait.** Do not invent an observation. If `/qa-assist` was launched with an
+`observation` argument, treat it as the first intake and proceed to Phase 1;
+otherwise wait for the operator's first report.
 
 ---
 
-## Phase
+## Phase 1 — Intake (per observation, looped)
+
+Goal: understand **exactly what the human observed** before enriching it. The
+operator's message may carry **one observation or a brain dump of many**; this
+phase first splits the message into discrete observations, then runs Intake for
+**each** of them, before returning here for the next message.
+
+1. **Split a brain dump into discrete observations.** Parse the operator's
+message into the distinct things they observed — one ledger item per
+distinct symptom, surface, or idea. Use their own structure (numbered or
+bulleted list, blank-line-separated paragraphs, "and another thing…") as the
+split boundary; do **not** merge two unrelated symptoms into one item or
+split a single symptom into several. **Echo the parsed list back** ("I read
+N observations: …") and let the operator correct the split before you
+enrich anything — this is the only confirmation intake requires. A
+single-observation message is just the N = 1 case; skip the echo when it is
+unambiguously one item.
+2. **Process each observation in turn** through the rest of this phase and
+Phases 2–3. For each one:
+- **Restate the observation** in your own words — the surface it touches, the
+action taken, the actual result, and (for a bug) the expected result, or
+(for an enhancement) the desired improvement. This restatement is your read
+of the signal.
+- **Ask clarifying questions only when that observation is ambiguous.** If
+you cannot confidently fill in the load-bearing facts, **ask** — do not
+paper over the gap with an assumption. Typical gaps: which
+surface/command/flow; the exact steps and whether it reproduces or is
+intermittent; what was expected and why that is the contract; the
+environment (OS, shell, branch, fresh vs. reused state). When the
+observation is already clear, **do not interrogate** — move straight to
+Enrich. Batch the questions across the brain dump into one message rather
+than interrogating item-by-item, so the operator answers them all at once.
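
Reviewer's note, not part of the package: the split boundaries named in Phase 1 step 1 (numbered or bulleted lists, blank-line-separated paragraphs) can be sketched mechanically as below. This is a naive illustration under stated assumptions. `splitObservations` and `echoSplit` are hypothetical names, the heuristic does not handle conversational boundaries such as "and another thing…", and in the real workflow the host LLM performs the split and the operator corrects it.

```js
// Hypothetical helper: split an operator message into discrete observations.
function splitObservations(message) {
  const lines = message.split(/\r?\n/);
  // Matches "- x", "* x", "• x", "1. x", "2) x" at the start of a line.
  const listMarker = /^\s*(?:[-*•]|\d+[.)])\s+/;

  if (lines.some((line) => listMarker.test(line))) {
    const items = [];
    for (const line of lines) {
      if (listMarker.test(line)) {
        items.push(line.replace(listMarker, '').trim());
      } else if (line.trim() !== '' && items.length > 0) {
        // Continuation line: it belongs to the current list item.
        items[items.length - 1] += ' ' + line.trim();
      }
      // Text before the first list marker is treated as preamble and dropped.
    }
    return items.filter((item) => item !== '');
  }

  // No list markers: fall back to blank-line-separated paragraphs.
  return message
    .split(/\r?\n\s*\r?\n/)
    .map((paragraph) => paragraph.replace(/\s+/g, ' ').trim())
    .filter((paragraph) => paragraph !== '');
}

// Hypothetical helper: build the echo-back line the operator can correct.
function echoSplit(observations) {
  const list = observations.map((text, i) => `${i + 1}) ${text}`).join('; ');
  return `I read ${observations.length} observations: ${list}`;
}
```

A single-paragraph message comes back as a one-element array, which corresponds to the N = 1 case where the echo can be skipped.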
 
-
-
-
+---
+
+## Phase 2 — Enrich (per observation)
 
-
-
-
+Goal: turn the observation into a high-quality, triage-ready finding. Delegate
+every decision to the shared core helpers; never re-derive them in prose.
+
+1. **Redact first.** Before any evidence string touches disk or GitHub, scrub
+it through [`redact-evidence.js`](../scripts/lib/qa/redact-evidence.js):
 
 ```js
 import { redactEvidence } from '../scripts/lib/qa/redact-evidence.js';
@@ -157,42 +211,42 @@ decision to the shared core helpers.
 
 This is mandatory per [`security-baseline.md`](../rules/security-baseline.md)
 (§ Data Leakage & Logging, § Secrets Management) — bearer tokens, session
-cookies, and emails are masked. The pass is idempotent, so redact eagerly
-
-
-
-
-
-
-
-
-
-
-
-
+cookies, and emails are masked. The pass is idempotent, so redact eagerly.
+
+2. **Branch on what kind of observation it is.**
+- **Bug.** Establish a clean, minimal, deterministic **repro**. Investigate
+the **root cause**: read the relevant code, console output, and logs for
+errors, and pin the locus as a concrete **`file:line`** reference (say so
+explicitly if you cannot pin it rather than inventing a locus). Then run
+the coverage steps below.
+- **Enhancement / suggestion.** Analyze **how** the change would be made:
+the surfaces it touches, the **options** for implementing it, and a brief
+**recommendation** with trade-offs. Record these notes on the ledger item.
+Still pin the relevant `file:line` anchor(s) where the change would land.
+
+3. **Hydrate the QA context** to locate code precisely, via
+[`qa-context-hydrator.js`](../scripts/lib/qa/qa-context-hydrator.js) — it
+resolves the Epic/Feature context tickets, the feature-file set, the surface
+map, and recent git log:
 
 ```js
 import { hydrateQaContext } from '../scripts/lib/qa/qa-context-hydrator.js';
 const context = await hydrateQaContext({ epicNumber, githubPort, gitPort, surfaceMap });
 ```
 
-Record the root cause as a concrete `file:line` reference. If you cannot
-pin it, say so explicitly in the ledger item rather than inventing a locus.
-
 4. **Compute the coverage verdict** for the surface the observation points at,
 via [`coverage-verdict.js`](../scripts/lib/qa/coverage-verdict.js) — the
 deterministic seam behind the
 [`core/qa-coverage-mapping`](../skills/core/qa-coverage-mapping/SKILL.md)
-skill. Read that skill for how to assemble the `surface` input
-the
-
-
+skill. Read that skill for how to assemble the `surface` input and how to
+read the per-tier `{present|absent}` verdict. Optionally render a
+human-readable summary via
+[`coverage-report.js`](../scripts/lib/qa/coverage-report.js).
 
 5. **Propose the missing test** (if any) from that verdict, via
 [`propose-missing-test.js`](../scripts/lib/qa/propose-missing-test.js). It
-names the lowest absent tier
-
-`description` as the ledger item's `missingTest` (or `null`).
+names the lowest absent tier, or returns `null` when every tier is covered.
+Record the proposal's `description` as the ledger item's `missingTest`.
 
 6. **Classify** the finding via
 [`classify-finding.js`](../scripts/lib/findings/classify-finding.js) so the
@@ -201,33 +255,50 @@ decision to the shared core helpers.
 `meta::consumer-improvement`). The helper **throws** on an absent/unknown
 class — fix the finding's class rather than defaulting.
 
-7. **Gate:** present the enriched candidate `QaLedgerItem` (redacted evidence,
-repro, root-cause `file:line`, coverage verdict, `class`, `severity`,
-`missingTest`) and ask the operator to confirm it is accurate before any
-write. Do **not** append to the ledger until they confirm.
-
 ---
 
-## Phase 3 — Record
+## Phase 3 — Record (per observation), then loop
 
-Goal: persist the enriched
-
+Goal: persist the enriched finding to the rolling ledger and **return to
+intake**. **No triage, routing, ticket-filing, or `/plan` happens here** — that
+is Phase 4, and only after the operator says they are done.
 
 1. **Append a `QaLedgerItem`** to `temp/qa/<sessionId>.ndjson`, conforming to
 [`qa-ledger.schema.json`](../schemas/qa-ledger.schema.json): a stable `id`
-(`L1`, `L2`, … appended after any carried backlog), the redacted
-
-
-`
-
-
-
-
-
-
+(`L1`, `L2`, … appended after any carried backlog), the redacted `evidence`,
+the repro and root-cause `file:line` notes (or the enhancement
+analysis/options/recommendation), the `coverage` label, the `class` and
+`severity`, the `missingTest`, and a `disposition` of **untriaged** (intake
+does not decide disposition — Phase 4 does).
+2. **Echo the recorded item** back in one short line — its `class`, `severity`,
+root-cause locus or recommendation, and coverage verdict — so the operator
+can correct it on the spot. When a brain dump produced several items, append
+them all, then echo a **compact batch summary** (one line per new `Lx` item)
+instead of a separate message per item.
+3. **Loop back to Phase 1** and wait for the next message. Keep doing this for
+as many observations as the operator reports — one at a time or in batches,
+in any order — until they say they are done testing.
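
Reviewer's note, not part of the package: the append rule in Phase 3 step 1 (ids `L1`, `L2`, … continuing after any carried backlog, every new item recorded as `untriaged`) can be sketched as a pure function over the NDJSON text. The field names `id`, `class`, `severity`, and `disposition` come from the workflow text. Everything else here is an assumption: `nextLedgerId` and `appendLedgerItem` are hypothetical names, the real item shape is defined by `qa-ledger.schema.json` (not shown in this diff), and no file I/O or schema validation is attempted.

```js
// Hypothetical helper: find the next free "L<n>" id in an NDJSON ledger.
function nextLedgerId(ndjsonText) {
  let max = 0;
  for (const line of ndjsonText.split('\n')) {
    if (line.trim() === '') continue; // tolerate blank and trailing lines
    const match = /^L(\d+)$/.exec(String(JSON.parse(line).id));
    if (match) max = Math.max(max, Number(match[1]));
  }
  return `L${max + 1}`;
}

// Hypothetical helper: append one finding, never rewriting earlier lines.
function appendLedgerItem(ndjsonText, finding) {
  const item = {
    ...finding,
    id: nextLedgerId(ndjsonText), // the id is assigned here, not by the caller
    disposition: 'untriaged',     // intake never decides disposition
  };
  const needsNewline = ndjsonText !== '' && !ndjsonText.endsWith('\n');
  const ndjson = ndjsonText + (needsNewline ? '\n' : '') + JSON.stringify(item) + '\n';
  return { item, ndjson };
}
```

Because the function only ever adds a line to the end of the existing text, a resumed session keeps its carried backlog intact and new ids continue after it.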
+
+---
+
+## Phase 4 — Triage & Plan (on "I'm done")
+
+Goal: when the operator says they have finished testing, turn the **whole**
+ledger into a plan. This is the only phase that triages, routes, or plans, and
+its transition is **explicitly operator-gated**.
+
+1. **Final ledger review.** Read the entire rolling ledger back to the
+operator: every item, its class/severity, root-cause or recommendation, and
+coverage verdict. Confirm it is complete and ask any **last clarifying
+questions** — missing repro, an item that should be split or merged, a
+severity to adjust. Let the operator set each item's disposition
+(`file` / `defer` / `dismiss`).
+
+2. **Dedup / route** each `file`-dispositioned finding against existing GitHub
+Issues via [`route-finding.js`](../scripts/lib/findings/route-finding.js)
+(the **single** dedup implementation shared with `/qa-explore` and
 `audit-to-stories`), backed by
-[`semantic-issue-search.js`](../scripts/lib/findings/semantic-issue-search.js)
-for candidate recall:
+[`semantic-issue-search.js`](../scripts/lib/findings/semantic-issue-search.js):
 
 ```js
 import { routeFinding, fingerprintFooter } from '../scripts/lib/findings/route-finding.js';
@@ -239,8 +310,7 @@ and optionally route or promote it — with the operator deciding each write.
 `regression-of-closed`. Stamp the `fingerprintFooter(sha)` marker into any
 Issue body so future runs dedup against it.
 
-3. **Promote
-GitHub Issue) via
+3. **Promote the full ledger through `/plan`** (never a raw GitHub Issue) via
 [`promote-finding.js`](../scripts/lib/findings/promote-finding.js), which
 clusters, sizes, routes, and files through the same ports `/qa-explore` and
 `/audit-to-stories` consume — never hand-roll the promotion, the clustering,
@@ -275,37 +345,40 @@ and optionally route or promote it — with the operator deciding each write.
 `fingerprintFooter(sha)` into the `/plan --idea` seed, then chain
 `/plan --idea <seed>`. **Known limitation (not solved here):**
 per-child-Story fingerprint propagation through full Epic decomposition is
-*not* guaranteed — the fingerprint is carried in the Epic seed only
-child Stories `/plan` spawns from that seed are not individually
-footer-stamped.
+*not* guaranteed — the fingerprint is carried in the Epic seed only.
 - **A `file` disposition never opens a raw GitHub Issue.** Every `file`
 finding flows through `promoteFindings` → `/plan`; only `defer` (carry
-forward as backlog) and `dismiss` (non-actionable) skip the
-handoff.
+forward as backlog) and `dismiss` (non-actionable) skip the handoff.
 
-4. **Gate:**
-
-
-pauses at its own HITL gates and never auto-delivers. Redaction has
-run, so nothing unredacted reaches disk or GitHub.
+4. **Gate:** the move into this phase, and every write inside it (seed write,
+`/plan` invocation, ticket-filing, label mutation), is **operator-gated** —
+confirm each one. The plan→deliver hard stop is preserved: each `/plan`
+chain pauses at its own HITL gates and never auto-delivers. Redaction has
+already run, so nothing unredacted reaches disk or GitHub.
 
-After
-`
-(
-
-rolling backlog a resumed session will pick up.
+After planning, summarize: the findings recorded, the route/promotion decisions
+(`new`/`update-existing`/`duplicate`/`regression-of-closed`), whether each
+cluster became a Story (`/plan --from-notes`) or Epic (`/plan --idea`), and any
+`defer` backlog a resumed session will pick up.
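
Reviewer's note, not part of the package: the closing summary described above can be tallied as in the sketch below. It is a simplification under stated assumptions. `summarizeSession` is a hypothetical name, the per-item `route` and `planPath` fields are invented for illustration, and each item is treated as its own cluster, whereas the workflow says `promote-finding.js` does the real clustering.

```js
// Hypothetical item shape: { id, disposition, route, planPath }.
function summarizeSession(items) {
  const summary = { recorded: items.length, routes: {}, stories: 0, epics: 0, deferred: [] };
  for (const item of items) {
    if (item.disposition === 'defer') {
      summary.deferred.push(item.id); // backlog a resumed session will pick up
      continue;
    }
    if (item.disposition !== 'file') continue; // dismissed items skip the handoff
    if (item.route) {
      summary.routes[item.route] = (summary.routes[item.route] ?? 0) + 1;
    }
    if (item.planPath === '--from-notes') summary.stories += 1; // became a Story
    if (item.planPath === '--idea') summary.epics += 1;         // became an Epic
  }
  return summary;
}
```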
 
 ---
 
 ## Constraints
 
-- **Human-led,
-
-
-
-
-
-
+- **Human-led, rolling, multi-observation.** The operator owns the signal and
+reports observations in any order and any quantity — one at a time or a brain
+dump of many in a single message. The agent splits a brain dump into discrete
+ledger items (echoing the split for correction), then enriches and records
+each one. Never invent an observation; **ask clarifying questions** only when
+an observation is ambiguous, batched across the dump.
+- **Record during intake; plan only on "done".** Phases 1–3 enrich and append
+to the ledger and loop — they never triage, route, file tickets, or invoke
+`/plan`. All of that is **Phase 4**, entered only after explicit operator
+confirmation that testing is done.
+- **Light intake gate, firm boundary gate.** Intake → Enrich → Record is fluid
+(echo-back, no ceremony); the session-level move into Phase 4 and **every
+write** that leaves the local ledger (ticket, `/plan`, label) require
+**explicit operator confirmation**.
 - **Persistent, resumable rolling session.** `/qa-assist` defaults to resuming
 the same session and **appending** to its ledger; a reused session carries
 the un-triaged backlog forward via
@@ -338,9 +411,9 @@ rolling backlog a resumed session will pick up.
 
 ## See also
 
-- [`/plan`](plan.md) — the planning pipeline `/qa-assist` chains into
-
-
+- [`/plan`](plan.md) — the planning pipeline `/qa-assist` chains into in
+Phase 4 (`--from-notes` for a Story, `--idea` for an Epic). The plan→deliver
+hard stop is preserved across the handoff.
 - [`/qa-explore`](qa-explore.md) — the agent-led sibling that drives a named
 surface and triages through the same `/plan` handoff.
 - [`/audit-to-stories`](audit-to-stories.md) — the precedent for the
package/docs/CHANGELOG.md
CHANGED
@@ -2,6 +2,13 @@
 
 All notable changes to this project will be documented in this file.
 
+## [1.66.0](https://github.com/dsj1984/mandrel/compare/mandrel-v1.65.0...mandrel-v1.66.0) (2026-06-14)
+
+
+### Added
+
+* **qa:** make /qa-assist a rolling multi-observation intake loop (refs [#4115](https://github.com/dsj1984/mandrel/issues/4115)) ([#4129](https://github.com/dsj1984/mandrel/issues/4129)) ([81e85fa](https://github.com/dsj1984/mandrel/commit/81e85fa6c0b0ac2a8af227d0c8bd6f77fe9a94eb))
+
 ## [1.65.0](https://github.com/dsj1984/mandrel/compare/mandrel-v1.64.0...mandrel-v1.65.0) (2026-06-14)
 
 
package/package.json
CHANGED