ftown-bridge 0.11.0 → 0.11.2
- package/dist/centrifugo-client.d.ts +6 -1
- package/dist/centrifugo-client.js +21 -1
- package/dist/centrifugo-client.js.map +1 -1
- package/dist/create-ftown-session.d.ts +8 -0
- package/dist/create-ftown-session.js +63 -16
- package/dist/create-ftown-session.js.map +1 -1
- package/dist/ftown-sessions-cli.js +4 -0
- package/dist/ftown-sessions-cli.js.map +1 -1
- package/dist/harness-installer.js +65 -21
- package/dist/harness-installer.js.map +1 -1
- package/dist/index.js +27 -3
- package/dist/index.js.map +1 -1
- package/dist/install-ftown-workflows-cli.js +6 -3
- package/dist/install-ftown-workflows-cli.js.map +1 -1
- package/dist/local-api-server.d.ts +10 -1
- package/dist/local-api-server.js +22 -1
- package/dist/local-api-server.js.map +1 -1
- package/dist/types.d.ts +1 -0
- package/dist/workflow-runner-cli.js +34 -0
- package/dist/workflow-runner-cli.js.map +1 -1
- package/package.json +1 -1
- package/skills/ftown-workflows/SKILL.md +247 -50
- package/skills/ftown-workflows/scripts/example.flow.mjs +130 -82
|
@@ -22,6 +22,147 @@ human-in-the-loop playbook). Use ftown-workflows when the work is scripted and
|
|
|
22
22
|
repeatable; use ftown-orchestrator when you need to improvise or keep a human in
|
|
23
23
|
the loop.
|
|
24
24
|
|
|
25
|
+
## Operating Contract
|
|
26
|
+
|
|
27
|
+
Use ftown-workflows to encode deterministic multi-session control flow: fan-out,
|
|
28
|
+
verification, synthesis, loops, retries, and resumable handoffs. The workflow
|
|
29
|
+
script is where the structure lives: which workers run independently, which
|
|
30
|
+
results are verified, where a barrier is necessary, and what gets returned.
|
|
31
|
+
|
|
32
|
+
Do not infer workflow permission just because a task might benefit from
|
|
33
|
+
parallelism. Run a workflow only when the user explicitly asks for one, asks for
|
|
34
|
+
multi-agent orchestration, asks to fan out workers, names `ftown-workflows`, asks
|
|
35
|
+
to run a specific workflow, or invokes a skill/command whose instructions require
|
|
36
|
+
this skill. Otherwise, answer inline or describe the workflow you would run and
|
|
37
|
+
ask before spending the user's tokens.
|
|
38
|
+
|
|
39
|
+
If the user explicitly says this task **must use ftown-workflows**, a manual
|
|
40
|
+
simulation is not enough. Create a `.flow.mjs` script and run it with
|
|
41
|
+
`~/.ftown/ftown-workflows`.
|
|
42
|
+
|
|
43
|
+
## Scout First
|
|
44
|
+
|
|
45
|
+
Start with a cheap inline scout before writing the workflow script: list relevant
|
|
46
|
+
files, search call sites, scope the diff, read key modules, and identify whether
|
|
47
|
+
the task is understanding, design, review, research, migration, or greenfield
|
|
48
|
+
build. You do not need the final DAG before starting the task; you need the
|
|
49
|
+
work-list and shape before orchestration.
|
|
50
|
+
|
|
51
|
+
Common single-phase shapes:
|
|
52
|
+
|
|
53
|
+
| Intent | Shape |
|
|
54
|
+
| --- | --- |
|
|
55
|
+
| Understand | readers over subsystems -> structured map |
|
|
56
|
+
| Design | independent approaches -> judge panel -> scored synthesis |
|
|
57
|
+
| Review or audit | dimensions -> find -> adversarial verify -> synthesis |
|
|
58
|
+
| Research | search/read sweep -> deep read -> verify -> cited synthesis |
|
|
59
|
+
| Migrate | discover sites -> transform isolated slices -> verify |
|
|
60
|
+
| Greenfield build | scout stack -> contract-first prep -> modules -> compile/review |
|
|
61
|
+
|
|
62
|
+
For large work, run several small workflows in sequence instead of one giant
|
|
63
|
+
script. Read each result before deciding the next phase. A practical split is:
|
|
64
|
+
|
|
65
|
+
```text
|
|
66
|
+
discovery-design.flow.mjs -> implementation-review.flow.mjs
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
The discovery/design workflow discovers constraints, compares approaches, writes
|
|
70
|
+
a durable handoff (`discovery-design.handoff.json`, `plan.json`, `specs/`, a
|
|
71
|
+
rubric), and stops. The implementation/review workflow consumes that handoff,
|
|
72
|
+
implements the work-list, integrates, verifies, and repairs against the rubric.
|
|
73
|
+
|
|
74
|
+
## Pipeline By Default
|
|
75
|
+
|
|
76
|
+
Default to `ctx.pipeline(...)` for multi-stage work. Each item should advance as
|
|
77
|
+
soon as its previous stage finishes; do not make fast items wait for the slowest
|
|
78
|
+
item unless the next stage genuinely needs all prior results at once.
|
|
79
|
+
|
|
80
|
+
A barrier with `ctx.parallel(...)` is correct when a later stage needs
|
|
81
|
+
cross-item context:
|
|
82
|
+
|
|
83
|
+
- deduping or merging all findings before expensive verification
|
|
84
|
+
- early exit when the full result set is empty
|
|
85
|
+
- comparing one finding against the other findings
|
|
86
|
+
|
|
87
|
+
A barrier is not justified by ordinary mapping/filtering, by conceptual phase
|
|
88
|
+
boundaries, or by code tidiness. Put per-item transforms inside a pipeline stage.
|
|
89
|
+
When unsure, pipeline.
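
To make the cost of an unnecessary barrier concrete, here is a toy timing model. It is only a sketch: it assumes unlimited concurrency and no engine overhead, and the function name `finishTimes` and the `stage1`/`stage2` duration fields are made up for illustration.

```js
// Toy timing model, assuming unlimited concurrency and no engine overhead.
// Each item has a duration for stage 1 and stage 2 (arbitrary time units).
function finishTimes(items) {
  const slowestStage1 = Math.max(...items.map((item) => item.stage1));
  return items.map((item) => ({
    name: item.name,
    // Pipeline: stage 2 starts as soon as this item's own stage 1 is done.
    pipeline: item.stage1 + item.stage2,
    // Barrier: stage 2 starts only after every item has finished stage 1.
    barrier: slowestStage1 + item.stage2,
  }));
}

const times = finishTimes([
  { name: 'fast.ts', stage1: 1, stage2: 2 },
  { name: 'slow.ts', stage1: 10, stage2: 2 },
]);
// fast.ts finishes at 3 with a pipeline but at 12 behind a barrier.
```

Under this model the slowest item finishes at the same time either way; the barrier only delays the fast items.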
|
|
90
|
+
|
|
91
|
+
Use explicit `phase` names in `ctx.agent(..., { phase })` inside pipelines and
|
|
92
|
+
parallel stages so progress groups are stable even when stages interleave.
|
|
93
|
+
|
|
94
|
+
## Quality Patterns
|
|
95
|
+
|
|
96
|
+
Pick and compose these patterns based on the user's request:
|
|
97
|
+
|
|
98
|
+
- **Adversarial verify:** for each claim/finding, spawn independent skeptics
|
|
99
|
+
asked to refute it. Keep a finding only if it survives the vote.
|
|
100
|
+
- **Perspective-diverse verify:** use distinct verifier lenses such as
|
|
101
|
+
correctness, security, performance, reproducibility, and UX instead of cloned
|
|
102
|
+
prompts.
|
|
103
|
+
- **Judge panel:** generate multiple independent solutions, have judges score
|
|
104
|
+
them, then synthesize from the winner while preserving useful ideas from
|
|
105
|
+
runners-up.
|
|
106
|
+
- **Loop-until-dry:** for unknown-size discovery, keep launching finder rounds
|
|
107
|
+
until a fixed number of consecutive rounds returns nothing new.
|
|
108
|
+
- **Multi-modal sweep:** search by different axes (file path, call graph,
|
|
109
|
+
content, timestamp, dependency, runtime behavior) and merge findings.
|
|
110
|
+
- **Completeness critic:** end with a worker that asks what was missed: unread
|
|
111
|
+
sources, unverified claims, uncovered modalities, or dropped work.
|
|
112
|
+
- **No silent caps:** if you cap coverage, sampling, retries, or result counts,
|
|
113
|
+
log what was skipped with `ctx.log()`.
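
The adversarial-verify pattern leaves the vote rule open. A minimal sketch, assuming a strict-majority rule and verdict objects shaped like `{ isReal, reason }`. The helper name `survivesVote` is hypothetical, and a missing verdict (`null`) is counted as a vote against the finding, which is a design choice rather than anything the engine prescribes.

```js
// Hypothetical helper: decide whether a finding survives its skeptics.
// Assumption: each verdict looks like { isReal: boolean, reason: string },
// and a failed skeptic shows up as null.
function survivesVote(verdicts) {
  if (verdicts.length === 0) return false; // no votes, nothing confirmed
  const yes = verdicts.filter((v) => v != null && v.isReal === true).length;
  return yes > verdicts.length / 2; // strict majority of all skeptics asked
}
```

Counting failed skeptics in the denominator keeps a finding from passing on a single surviving vote when most verifiers never reported.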
|
|
114
|
+
|
|
115
|
+
Scale to the wording. "Find any bugs" can be a small finder set and one verifier.
|
|
116
|
+
"Thoroughly audit" or a large explicit budget should increase finder diversity,
|
|
117
|
+
verification votes, and loop-until-dry depth.
|
|
118
|
+
|
|
119
|
+
## Dependent Phases
|
|
120
|
+
|
|
121
|
+
`ctx.agent()` returns `null` instead of throwing when a worker times out, exits
|
|
122
|
+
without a result, exhausts budget, or writes `{ "ok": false }`. That is useful
|
|
123
|
+
for optional fan-out, but dangerous for dependent phases. Fail fast before
|
|
124
|
+
implementation, verification, or synthesis depends on a missing result.
|
|
125
|
+
|
|
126
|
+
Use this guard in workflow scripts:
|
|
127
|
+
|
|
128
|
+
```js
|
|
129
|
+
function requireAgentResult(value, label) {
|
|
130
|
+
if (value == null) {
|
|
131
|
+
throw new Error(`${label} failed; aborting dependent workflow phases`);
|
|
132
|
+
}
|
|
133
|
+
return value;
|
|
134
|
+
}
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
Optional fan-out may filter failures with `.filter(Boolean)`. Required handoffs,
|
|
138
|
+
module implementations, verifiers, and final synthesis should use the guard.
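
The difference between the two cases can be sketched without the engine. The result shapes below are stand-ins invented for illustration; only `requireAgentResult` and the `.filter(Boolean)` idiom come from the text above.

```js
// Same guard as above, repeated so this sketch runs on its own.
function requireAgentResult(value, label) {
  if (value == null) {
    throw new Error(`${label} failed; aborting dependent workflow phases`);
  }
  return value;
}

// Stand-ins for what ctx.parallel(...) and ctx.agent(...) might resolve to;
// the shapes are made up for illustration.
const scoutResults = [{ findings: ['a'] }, null, { findings: ['b'] }];
const planResult = null; // the required planning worker failed

// Optional fan-out: drop failed workers and continue with what came back.
const usableScouts = scoutResults.filter(Boolean);

// Required handoff: stop before anything downstream runs on a missing plan.
let abortMessage = null;
try {
  requireAgentResult(planResult, 'plan');
} catch (err) {
  abortMessage = err.message;
}
```

In a real script the error would normally be left uncaught so the run stops; it is caught here only to show the message.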
|
|
139
|
+
|
|
140
|
+
## Greenfield Builds
|
|
141
|
+
|
|
142
|
+
When a workflow is building a whole app, game, service, library, or system from
|
|
143
|
+
scratch, the discovery/design phase should include contract-first prep before
|
|
144
|
+
implementation fan-out. Read and apply:
|
|
145
|
+
|
|
146
|
+
```text
|
|
147
|
+
~/.claude/skills/contract-first-prep/SKILL.md
|
|
148
|
+
~/.claude/skills/contract-first-prep/references/contract-guide.md
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
The prep worker should produce the parallel-safe handoff:
|
|
152
|
+
|
|
153
|
+
- minimal scaffold with a strict typecheck/build gate
|
|
154
|
+
- immutable type/interface contract for cross-module boundaries
|
|
155
|
+
- pure-data config plus shared low-level helpers
|
|
156
|
+
- disjoint module decomposition (`plan.json`) and per-module specs
|
|
157
|
+
|
|
158
|
+
The later implementation/review workflow treats that contract, config, and shared
|
|
159
|
+
helpers as frozen. Workers adapt their modules to the contract; only the
|
|
160
|
+
integrator performs broad wiring. If review finds a design flaw, launch a new
|
|
161
|
+
discovery/design workflow instead of letting implementers redesign in parallel.
|
|
162
|
+
|
|
163
|
+
Skip contract-first prep for small edits, single-file scripts, or established
|
|
164
|
+
codebases that already have their own architecture.
|
|
165
|
+
|
|
25
166
|
## Running a workflow
|
|
26
167
|
|
|
27
168
|
You must be **inside an ftown session** — `FTOWN_SESSION_ID` must be set.
|
|
@@ -35,6 +176,12 @@ You must be **inside an ftown session** — `FTOWN_SESSION_ID` must be set.
|
|
|
35
176
|
> workers become **siblings** of the orchestrator rather than its children. Results
|
|
36
177
|
> are file-based, so this does not affect correctness — only the dashboard topology.
|
|
37
178
|
|
|
179
|
+
Child/subagent sessions **can run workflows**. Do not refuse just because
|
|
180
|
+
`FTOWN_PARENT_SESSION_ID` is set or because the current session was spawned by
|
|
181
|
+
another agent. The only hard requirement is `FTOWN_SESSION_ID` plus a reachable
|
|
182
|
+
bridge. The parent/child caveat above is about dashboard topology, not
|
|
183
|
+
capability: results still flow through files under `~/.ftown/workflows/<run-id>/`.
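
As a rough illustration of that rule, a preflight check might look like the sketch below. The function name `canRunWorkflow` is hypothetical; only the two environment variable names come from the text, and bridge reachability is not checked.

```js
// Hypothetical preflight; only the environment variable names come from
// the text above. Bridge reachability is not checked here.
function canRunWorkflow(env) {
  const hasSession = typeof env.FTOWN_SESSION_ID === 'string'
    && env.FTOWN_SESSION_ID.length > 0;
  // FTOWN_PARENT_SESSION_ID is deliberately ignored: being a child session
  // changes dashboard topology, not capability.
  return hasSession;
}
```

In a Node script this would be called as `canRunWorkflow(process.env)`.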
|
|
184
|
+
|
|
38
185
|
Full options:
|
|
39
186
|
|
|
40
187
|
```bash
|
|
@@ -101,7 +248,7 @@ Key options:
|
|
|
101
248
|
|---|---|---|
|
|
102
249
|
| `label` | `step-<n>` | step key used for the result file and resume |
|
|
103
250
|
| `phase` | — | progress grouping shown in logs |
|
|
104
|
-
| `schema` | — | JSON Schema;
|
|
251
|
+
| `schema` | — | JSON Schema embedded in the worker prompt; requests JSON result |
|
|
105
252
|
| `shell` | run-level default | `claude` / `cursor` / `codex` / `opencode` / `shell` |
|
|
106
253
|
| `model` | — | model override passed to the session |
|
|
107
254
|
| `workdir` | run-level default | working directory for the child session |
|
|
@@ -173,83 +320,133 @@ once the result is read, or on timeout/exit.
|
|
|
173
320
|
**You do not write this file yourself** — the child agent is instructed to do it.
|
|
174
321
|
The prompt injected by the engine tells the child agent exactly what to write.
|
|
175
322
|
|
|
176
|
-
## Patterns
|
|
323
|
+
## Script Patterns
|
|
177
324
|
|
|
178
|
-
|
|
325
|
+
Scripts are plain JavaScript modules. Avoid nondeterministic labels or control
|
|
326
|
+
flow (`Date.now()`, `Math.random()`, timestamp-derived labels) when you care
|
|
327
|
+
about resume, because cached results are matched by step order and label.
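
One way to keep labels stable is to derive them only from the work item. A minimal sketch, assuming items are file paths or similar strings; `labelFor` is a hypothetical helper and not part of the engine.

```js
// Hypothetical helper: build a resume-stable label from the item itself.
// Same input always gives the same label, so a rerun with the same items
// produces the same labels.
function labelFor(prefix, item) {
  const slug = String(item)
    .replace(/[^a-z0-9._-]+/gi, '-') // collapse unsafe runs into one dash
    .replace(/^-+|-+$/g, '');        // trim leading and trailing dashes
  return `${prefix}-${slug || 'item'}`;
}

const stable = labelFor('review', 'src/auth.ts'); // 'review-src-auth.ts'
// Anti-pattern: changes on every run, so a resumed run would not be
// expected to find a cached result under it.
// const unstable = `review-${Date.now()}`;
```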
|
|
179
328
|
|
|
180
|
-
|
|
181
|
-
export default async function (ctx) {
|
|
182
|
-
const items = ctx.args.items; // e.g. ["auth.ts", "api.ts", "db.ts"]
|
|
329
|
+
### Canonical Pipeline
|
|
183
330
|
|
|
184
|
-
|
|
185
|
-
|
|
186
|
-
items.map(f => () => ctx.agent(`Review ${f} for security issues`, {
|
|
187
|
-
label: `review-${f}`,
|
|
188
|
-
}))
|
|
189
|
-
);
|
|
331
|
+
Each item moves through review and verification independently. One file can be
|
|
332
|
+
verifying while another is still reviewing.
|
|
190
333
|
|
|
191
|
-
|
|
192
|
-
|
|
193
|
-
|
|
194
|
-
|
|
334
|
+
```js
|
|
335
|
+
export default async function (ctx) {
|
|
336
|
+
const results = await ctx.pipeline(
|
|
337
|
+
ctx.args.files,
|
|
338
|
+
(file) => ctx.agent(`Find bugs in ${file}`, {
|
|
339
|
+
label: `find-${file}`,
|
|
340
|
+
phase: 'Find',
|
|
341
|
+
schema: FINDINGS_SCHEMA,
|
|
342
|
+
}),
|
|
343
|
+
(review, file) => ctx.parallel(
|
|
344
|
+
(review?.findings ?? []).map((finding, i) => () => ctx.agent(
|
|
345
|
+
`Try to refute this finding:\n${JSON.stringify(finding)}`,
|
|
346
|
+
{ label: `verify-${file}-${i}`, phase: 'Verify', schema: VERDICT_SCHEMA },
|
|
347
|
+
).then((verdict) => ({ file, finding, verdict })))
|
|
348
|
+
),
|
|
195
349
|
);
|
|
196
350
|
|
|
197
|
-
return
|
|
351
|
+
return results.flat().filter(Boolean).filter((r) => r.verdict?.isReal);
|
|
198
352
|
}
|
|
199
353
|
```
|
|
200
354
|
|
|
201
|
-
###
|
|
355
|
+
### Correct Barrier
|
|
356
|
+
|
|
357
|
+
Use a barrier when deduplication or comparison needs every prior result.
|
|
202
358
|
|
|
203
359
|
```js
|
|
204
360
|
export default async function (ctx) {
|
|
205
|
-
|
|
206
|
-
|
|
207
|
-
(
|
|
208
|
-
|
|
209
|
-
|
|
361
|
+
ctx.phase('Find');
|
|
362
|
+
const reviews = await ctx.parallel(
|
|
363
|
+
ctx.args.dimensions.map((d) => () => ctx.agent(d.prompt, {
|
|
364
|
+
label: `find-${d.key}`,
|
|
365
|
+
schema: FINDINGS_SCHEMA,
|
|
366
|
+
})),
|
|
210
367
|
);
|
|
368
|
+
|
|
369
|
+
const allFindings = reviews.filter(Boolean).flatMap((r) => r.findings ?? []);
|
|
370
|
+
const deduped = dedupeByFileAndTitle(allFindings);
|
|
371
|
+
if (deduped.length === 0) return { confirmed: [] };
|
|
372
|
+
|
|
373
|
+
ctx.phase('Verify');
|
|
374
|
+
const verified = await ctx.parallel(
|
|
375
|
+
deduped.map((finding, i) => () => ctx.agent(
|
|
376
|
+
`Verify this deduped finding:\n${JSON.stringify(finding)}`,
|
|
377
|
+
{ label: `verify-${i}`, schema: VERDICT_SCHEMA },
|
|
378
|
+
).then((verdict) => ({ finding, verdict }))),
|
|
379
|
+
);
|
|
380
|
+
|
|
381
|
+
return { confirmed: verified.filter(Boolean).filter((r) => r.verdict?.isReal) };
|
|
211
382
|
}
|
|
212
383
|
```
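
The barrier example calls `dedupeByFileAndTitle` without defining it. A minimal sketch of what such a helper could look like, assuming each finding has `file` and `title` strings; the keep-first policy is an assumption, not something the skill specifies.

```js
// Hypothetical implementation of the helper the barrier example calls.
// Assumption: findings carry `file` and `title` strings; the first finding
// seen for a given pair wins and later duplicates are dropped.
function dedupeByFileAndTitle(findings) {
  const seen = new Set();
  const unique = [];
  for (const finding of findings) {
    // JSON.stringify of the pair avoids collisions such as
    // file "a:b" + title "c" versus file "a" + title "b:c".
    const key = JSON.stringify([finding.file, finding.title]);
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(finding);
  }
  return unique;
}
```

Exact string matching only catches literal duplicates; findings that describe the same bug in different words would still need a human or a merge worker.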
|
|
213
384
|
|
|
214
|
-
###
|
|
385
|
+
### Loop Until Dry
|
|
386
|
+
|
|
387
|
+
Use this for unknown-size searches. Dedup against everything seen, including
|
|
388
|
+
rejected findings, so the loop converges.
|
|
215
389
|
|
|
216
390
|
```js
|
|
217
391
|
export default async function (ctx) {
|
|
218
|
-
const
|
|
219
|
-
const
|
|
220
|
-
|
|
221
|
-
|
|
222
|
-
|
|
223
|
-
|
|
224
|
-
|
|
225
|
-
|
|
226
|
-
|
|
227
|
-
|
|
228
|
-
|
|
229
|
-
|
|
392
|
+
const seen = new Set();
|
|
393
|
+
const confirmed = [];
|
|
394
|
+
let dryRounds = 0;
|
|
395
|
+
|
|
396
|
+
while (dryRounds < 2 && ctx.budget.remaining() > 0) {
|
|
397
|
+
ctx.phase(`Find round ${dryRounds + 1}`);
|
|
398
|
+
const found = await ctx.agent('Find more bugs not already covered.', {
|
|
399
|
+
label: `find-round-${confirmed.length}-${dryRounds}`,
|
|
400
|
+
schema: FINDINGS_SCHEMA,
|
|
401
|
+
});
|
|
402
|
+
|
|
403
|
+
const fresh = (found?.findings ?? []).filter((finding) => {
|
|
404
|
+
const key = `${finding.file}:${finding.title}`;
|
|
405
|
+
if (seen.has(key)) return false;
|
|
406
|
+
seen.add(key);
|
|
407
|
+
return true;
|
|
408
|
+
});
|
|
409
|
+
|
|
410
|
+
if (fresh.length === 0) {
|
|
411
|
+
dryRounds += 1;
|
|
412
|
+
ctx.log(`dry round ${dryRounds}/2`);
|
|
413
|
+
continue;
|
|
414
|
+
}
|
|
415
|
+
dryRounds = 0;
|
|
416
|
+
|
|
417
|
+
const judged = await ctx.parallel(
|
|
418
|
+
fresh.map((finding, i) => () => ctx.agent(
|
|
419
|
+
`Try to refute this finding:\n${JSON.stringify(finding)}`,
|
|
420
|
+
{ label: `judge-${seen.size}-${i}`, phase: 'Verify', schema: VERDICT_SCHEMA },
|
|
421
|
+
).then((verdict) => ({ finding, verdict }))),
|
|
422
|
+
);
|
|
423
|
+
confirmed.push(...judged.filter(Boolean).filter((r) => r.verdict?.isReal));
|
|
424
|
+
}
|
|
230
425
|
|
|
231
|
-
|
|
232
|
-
return { claim, verdict: yes > REVIEWERS / 2 ? 'accepted' : 'rejected', votes: verdicts };
|
|
426
|
+
return { confirmed };
|
|
233
427
|
}
|
|
234
428
|
```
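
Two details of the loop above are worth watching. The finder label is built from `confirmed.length` and `dryRounds`, and both stay unchanged when a round returns fresh findings that are all rejected, so two rounds can end up with the same label. Also, with no agent budget the loop ends only when rounds run dry. The sketch below is a synchronous simulation of the control flow, not the engine API: `findRound` and `isReal` are stand-ins for the finder and verifier calls, and `maxRounds` is an assumed hard cap.

```js
// Synchronous simulation of the loop's control flow, not the engine API.
// Assumptions: `findRound(label)` stands in for the finder ctx.agent() call
// and returns an array of findings; `isReal(finding)` stands in for the
// verifier; `maxRounds` is a hard cap for runs with no agent budget.
function loopUntilDry(findRound, isReal, { dryLimit = 2, maxRounds = 10 } = {}) {
  const seen = new Set();
  const confirmed = [];
  const labels = [];
  let dryRounds = 0;

  for (let round = 1; round <= maxRounds && dryRounds < dryLimit; round += 1) {
    const label = `find-round-${round}`; // unique per round, same on every rerun
    labels.push(label);

    const fresh = findRound(label).filter((finding) => {
      const key = `${finding.file}:${finding.title}`;
      if (seen.has(key)) return false;
      seen.add(key); // rejected findings stay in `seen`, so the loop converges
      return true;
    });

    if (fresh.length === 0) {
      dryRounds += 1;
      continue;
    }
    dryRounds = 0;
    confirmed.push(...fresh.filter(isReal));
  }
  return { confirmed, labels };
}
```

A plain round counter gives every finder call a distinct label that is still identical across reruns, which is the property resume depends on.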
|
|
235
429
|
|
|
236
|
-
###
|
|
430
|
+
### Budget-Bounded Depth
|
|
431
|
+
|
|
432
|
+
Guard loops with a real cap. With no `--max-agents`, `ctx.budget.remaining()` is
|
|
433
|
+
`Infinity`, so add an explicit round limit or require a max-agent budget.
|
|
237
434
|
|
|
238
435
|
```js
|
|
239
436
|
export default async function (ctx) {
|
|
240
|
-
|
|
241
|
-
const
|
|
242
|
-
|
|
243
|
-
|
|
244
|
-
ctx.
|
|
245
|
-
|
|
246
|
-
|
|
247
|
-
|
|
248
|
-
);
|
|
249
|
-
|
|
437
|
+
const rounds = ctx.budget.maxAgents == null ? 3 : ctx.budget.maxAgents;
|
|
438
|
+
const results = [];
|
|
439
|
+
|
|
440
|
+
for (let i = 0; i < rounds && ctx.budget.remaining() > 0; i += 1) {
|
|
441
|
+
const result = await ctx.agent(`Research angle ${i + 1}`, {
|
|
442
|
+
label: `research-${i + 1}`,
|
|
443
|
+
schema: RESEARCH_SCHEMA,
|
|
444
|
+
});
|
|
445
|
+
if (result) results.push(result);
|
|
446
|
+
ctx.log(`${i + 1}/${rounds} research rounds complete`);
|
|
250
447
|
}
|
|
251
448
|
|
|
252
|
-
return
|
|
449
|
+
return results;
|
|
253
450
|
}
|
|
254
451
|
```
|
|
255
452
|
|
|
@@ -1,5 +1,5 @@
|
|
|
1
1
|
/**
|
|
2
|
-
* example.flow.mjs
|
|
2
|
+
* example.flow.mjs - template workflow: review files, verify each finding, synthesize.
|
|
3
3
|
*
|
|
4
4
|
* Run it inside an ftown session:
|
|
5
5
|
*
|
|
@@ -9,114 +9,162 @@
|
|
|
9
9
|
*
|
|
10
10
|
* Add --run-id <previous-id> to resume a partial run without re-running
|
|
11
11
|
* steps whose result files already exist.
|
|
12
|
-
*
|
|
13
|
-
* The script exports a default async function that receives a WorkflowContext.
|
|
14
|
-
* The engine wires FTOWN_SESSION_ID from the calling session so children are
|
|
15
|
-
* registered as its children and are cleaned up on completion.
|
|
16
12
|
*/
|
|
17
13
|
|
|
14
|
+
const FINDINGS_SCHEMA = {
|
|
15
|
+
type: 'object',
|
|
16
|
+
required: ['findings'],
|
|
17
|
+
properties: {
|
|
18
|
+
findings: {
|
|
19
|
+
type: 'array',
|
|
20
|
+
items: {
|
|
21
|
+
type: 'object',
|
|
22
|
+
required: ['title', 'file', 'severity', 'evidence'],
|
|
23
|
+
properties: {
|
|
24
|
+
title: { type: 'string' },
|
|
25
|
+
file: { type: 'string' },
|
|
26
|
+
severity: { type: 'string' },
|
|
27
|
+
evidence: { type: 'string' },
|
|
28
|
+
recommendation: { type: 'string' },
|
|
29
|
+
},
|
|
30
|
+
},
|
|
31
|
+
},
|
|
32
|
+
},
|
|
33
|
+
};
|
|
34
|
+
|
|
35
|
+
const VERDICT_SCHEMA = {
|
|
36
|
+
type: 'object',
|
|
37
|
+
required: ['isReal', 'reason'],
|
|
38
|
+
properties: {
|
|
39
|
+
isReal: { type: 'boolean' },
|
|
40
|
+
reason: { type: 'string' },
|
|
41
|
+
},
|
|
42
|
+
};
|
|
43
|
+
|
|
44
|
+
const REPORT_SCHEMA = {
|
|
45
|
+
type: 'object',
|
|
46
|
+
required: ['summary', 'confirmed', 'rejected'],
|
|
47
|
+
properties: {
|
|
48
|
+
summary: { type: 'string' },
|
|
49
|
+
confirmed: { type: 'array', items: { type: 'string' } },
|
|
50
|
+
rejected: { type: 'array', items: { type: 'string' } },
|
|
51
|
+
},
|
|
52
|
+
};
|
|
53
|
+
|
|
54
|
+
function requireAgentResult(value, label) {
|
|
55
|
+
if (value == null) {
|
|
56
|
+
throw new Error(`${label} failed; aborting dependent workflow phases`);
|
|
57
|
+
}
|
|
58
|
+
return value;
|
|
59
|
+
}
|
|
60
|
+
|
|
61
|
+
function stepKey(value) {
|
|
62
|
+
return String(value).replace(/[^a-z0-9._-]+/gi, '-').replace(/^-+|-+$/g, '') || 'item';
|
|
63
|
+
}
|
|
64
|
+
|
|
65
|
+
function asFindings(value) {
|
|
66
|
+
return value && Array.isArray(value.findings) ? value.findings : [];
|
|
67
|
+
}
|
|
68
|
+
|
|
18
69
|
/**
|
|
19
70
|
* @param {import('../../../src/workflow-runner.js').WorkflowContext} ctx
|
|
20
71
|
*/
|
|
21
72
|
export default async function (ctx) {
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
const files = /** @type {string[]} */ (
|
|
26
|
-
Array.isArray(ctx.args?.files)
|
|
27
|
-
? ctx.args.files
|
|
28
|
-
: ['src/auth.ts', 'src/api.ts', 'src/db.ts']
|
|
29
|
-
);
|
|
73
|
+
const files = Array.isArray(ctx.args?.files)
|
|
74
|
+
? ctx.args.files
|
|
75
|
+
: ['src/auth.ts', 'src/api.ts', 'src/db.ts'];
|
|
30
76
|
|
|
31
77
|
ctx.phase('Setup');
|
|
32
78
|
ctx.log(`Reviewing ${files.length} file(s): ${files.join(', ')}`);
|
|
33
79
|
ctx.log(`Budget: ${ctx.budget.maxAgents ?? 'unlimited'} agents`);
|
|
34
80
|
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
ctx.phase('Review');
|
|
42
|
-
|
|
43
|
-
const reviews = await ctx.parallel(
|
|
44
|
-
files.map((file) => async () => {
|
|
45
|
-
// Each thunk is an async function returning a string (or null on failure).
|
|
46
|
-
const result = await ctx.agent(
|
|
47
|
-
// The prompt is the full task description for this child session.
|
|
48
|
-
// Keep it self-contained — the child has no other context.
|
|
49
|
-
`You are a code reviewer. Review the file \`${file}\` for:
|
|
50
|
-
- Security vulnerabilities (auth bypass, injection, secret leakage)
|
|
51
|
-
- Correctness bugs (off-by-one, null dereference, missing error handling)
|
|
52
|
-
- Style issues that reduce readability
|
|
53
|
-
|
|
54
|
-
Reply with a concise bullet-point list. Start with "## ${file}".`,
|
|
81
|
+
const reviewed = await ctx.pipeline(
|
|
82
|
+
files,
|
|
83
|
+
async (file) => {
|
|
84
|
+
const review = await ctx.agent(
|
|
85
|
+
`Review ${file} for correctness, security, and maintainability bugs.
|
|
86
|
+
Return only concrete findings with evidence. Do not include style preferences.`,
|
|
55
87
|
{
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
// phase groups events in the log output.
|
|
60
|
-
phase: 'review',
|
|
61
|
-
// shell defaults to 'claude'; override here if needed.
|
|
62
|
-
// shell: 'claude',
|
|
88
|
+
label: `review-${stepKey(file)}`,
|
|
89
|
+
phase: 'Review',
|
|
90
|
+
schema: FINDINGS_SCHEMA,
|
|
63
91
|
},
|
|
64
92
|
);
|
|
65
93
|
|
|
66
|
-
|
|
67
|
-
|
|
94
|
+
return {
|
|
95
|
+
file,
|
|
96
|
+
findings: asFindings(requireAgentResult(review, `review ${file}`)),
|
|
97
|
+
};
|
|
98
|
+
},
|
|
99
|
+
async (review) => {
|
|
100
|
+
if (review.findings.length === 0) {
|
|
101
|
+
ctx.log(`No findings reported for ${review.file}`);
|
|
102
|
+
return { file: review.file, confirmed: [], rejected: [] };
|
|
103
|
+
}
|
|
104
|
+
|
|
105
|
+
const verified = await ctx.parallel(
|
|
106
|
+
review.findings.map((finding, index) => async () => {
|
|
107
|
+
const verdict = await ctx.agent(
|
|
108
|
+
`Try to refute this finding. Default to isReal=false if the evidence is weak,
|
|
109
|
+
not reproducible, or not actually caused by the code.
|
|
110
|
+
|
|
111
|
+
Finding:
|
|
112
|
+
${JSON.stringify(finding, null, 2)}`,
|
|
113
|
+
{
|
|
114
|
+
label: `verify-${stepKey(review.file)}-${index}`,
|
|
115
|
+
phase: 'Verify',
|
|
116
|
+
schema: VERDICT_SCHEMA,
|
|
117
|
+
},
|
|
118
|
+
);
|
|
119
|
+
|
|
120
|
+
return {
|
|
121
|
+
finding,
|
|
122
|
+
verdict: requireAgentResult(verdict, `verify ${review.file} #${index + 1}`),
|
|
123
|
+
};
|
|
124
|
+
}),
|
|
125
|
+
);
|
|
126
|
+
|
|
127
|
+
const kept = [];
|
|
128
|
+
const rejected = [];
|
|
129
|
+
for (const item of verified.filter(Boolean)) {
|
|
130
|
+
if (item.verdict.isReal === true) kept.push(item.finding);
|
|
131
|
+
else rejected.push({ finding: item.finding, reason: item.verdict.reason });
|
|
68
132
|
}
|
|
69
|
-
return result;
|
|
70
|
-
}),
|
|
71
|
-
);
|
|
72
133
|
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
/** @param {string | null} r */ (r) => r != null,
|
|
134
|
+
return { file: review.file, confirmed: kept, rejected };
|
|
135
|
+
},
|
|
76
136
|
);
|
|
77
137
|
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
81
|
-
}
|
|
138
|
+
const completed = reviewed.filter(Boolean);
|
|
139
|
+
const confirmed = completed.flatMap((entry) => entry.confirmed);
|
|
140
|
+
const rejected = completed.flatMap((entry) => entry.rejected);
|
|
82
141
|
|
|
83
|
-
ctx.
|
|
142
|
+
ctx.phase('Synthesis');
|
|
143
|
+
if (confirmed.length === 0) {
|
|
144
|
+
ctx.log('No confirmed findings survived verification');
|
|
145
|
+
return {
|
|
146
|
+
summary: 'No confirmed findings survived adversarial verification.',
|
|
147
|
+
confirmed: [],
|
|
148
|
+
rejected: rejected.map((entry) => entry.finding.title),
|
|
149
|
+
};
|
|
150
|
+
}
|
|
84
151
|
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
152
|
+
const report = await ctx.agent(
|
|
153
|
+
`Write a concise final code-review report from these verified findings.
|
|
154
|
+
Group by severity and include evidence. Mention rejected findings only if useful.
|
|
88
155
|
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
Below are ${successfulReviews.length} individual file reviews.
|
|
92
|
-
Consolidate them into a single report with:
|
|
93
|
-
1. An executive summary (2-3 sentences).
|
|
94
|
-
2. Critical issues (must fix before merge).
|
|
95
|
-
3. Minor issues (nice to fix).
|
|
96
|
-
4. Positive observations.
|
|
156
|
+
Confirmed:
|
|
157
|
+
${JSON.stringify(confirmed, null, 2)}
|
|
97
158
|
|
|
98
|
-
|
|
99
|
-
${
|
|
159
|
+
Rejected:
|
|
160
|
+
${JSON.stringify(rejected, null, 2)}`,
|
|
100
161
|
{
|
|
101
162
|
label: 'synthesis',
|
|
102
|
-
phase: '
|
|
103
|
-
|
|
104
|
-
// When schema is set, agent() returns the parsed object (or null).
|
|
105
|
-
// Comment it out to get a plain string instead.
|
|
106
|
-
schema: {
|
|
107
|
-
type: 'object',
|
|
108
|
-
required: ['summary', 'critical', 'minor', 'positives'],
|
|
109
|
-
properties: {
|
|
110
|
-
summary: { type: 'string' },
|
|
111
|
-
critical: { type: 'array', items: { type: 'string' } },
|
|
112
|
-
minor: { type: 'array', items: { type: 'string' } },
|
|
113
|
-
positives: { type: 'array', items: { type: 'string' } },
|
|
114
|
-
},
|
|
115
|
-
},
|
|
163
|
+
phase: 'Synthesis',
|
|
164
|
+
schema: REPORT_SCHEMA,
|
|
116
165
|
},
|
|
117
166
|
);
|
|
118
167
|
|
|
119
|
-
// ── 5. Return value is printed by the CLI (pretty by default, --json for raw) ─
|
|
120
168
|
ctx.log(`Done. Budget used: ${ctx.budget.spent()} agent spawn(s).`);
|
|
121
|
-
return synthesis;
|
|
169
|
+
return requireAgentResult(report, 'synthesis');
|
|
122
170
|
}
|