executant 1.19.0 → 1.21.0
- package/README.md +54 -8
- package/dist/index.js +203 -436
- package/package.json +1 -1
- package/dist/prompts/retrospective-analysis.txt +0 -304
package/README.md
CHANGED
|
@@ -115,11 +115,43 @@ steps:
|
|
|
115
115
|
This is pass {{item}} of 5. Review src/runner.ts for untested edge cases.
|
|
116
116
|
```
|
|
117
117
|
|
|
118
|
+
## Variables at Runtime
|
|
119
|
+
|
|
120
|
+
Pass `--var KEY=VALUE` on the command line to override or supply workflow vars without editing the YAML:
|
|
121
|
+
|
|
122
|
+
```bash
|
|
123
|
+
executant --var env=staging --var region=eu-west-1 deploy.yaml
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
CLI vars override any same-named vars in the workflow's `vars:` section. Multiple `--var` flags are accepted.
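
The precedence rule above can be sketched as follows. This is a minimal illustration, not the package's actual CLI code: `parseVarFlags` and `mergeVars` are hypothetical helper names, and the real flag parsing is not shown in this diff. Only the spread-based merge mirrors what `loadWorkflow` does in `dist/index.js`.

```javascript
// Hypothetical helper: collects every `--var KEY=VALUE` pair from an argv-style array.
// The real executant argument parser is not part of this diff.
function parseVarFlags(argv) {
  const vars = {};
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] !== "--var") continue;
    const pair = argv[i + 1] ?? "";
    const eq = pair.indexOf("=");
    if (eq <= 0) throw new Error(`--var expects KEY=VALUE, got "${pair}"`);
    // Split on the first "=" only, so values may themselves contain "=".
    vars[pair.slice(0, eq)] = pair.slice(eq + 1);
    i++; // skip the KEY=VALUE token that was just consumed
  }
  return vars;
}

// Same precedence as the merge in loadWorkflow: the later spread wins,
// so CLI values replace same-named workflow values.
function mergeVars(workflowVars, cliVars) {
  return { ...(workflowVars ?? {}), ...cliVars };
}

const cliVars = parseVarFlags([
  "--var", "env=staging",
  "--var", "region=eu-west-1",
  "deploy.yaml",
]);
const merged = mergeVars({ env: "dev", service: "api" }, cliVars);
console.log(merged);
```

Here `env` comes from the command line, `service` survives from the YAML, and `region` is supplied only at runtime.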
|
|
127
|
+
|
|
118
128
|
## Quality Controls
|
|
119
129
|
|
|
120
130
|
- **`llm_as_judge: true`** — after a step completes, Claude evaluates the output; retries with feedback on FAIL, up to 5×
|
|
121
131
|
- **`self_healing: true`** — on script failure, Claude diagnoses and repairs the command, then re-runs it, up to 5×
|
|
122
|
-
- **`
|
|
132
|
+
- **`timeout_seconds: N`** — kill the step after N seconds and fail with exit code 3. Works for both script and prompt steps.
|
|
133
|
+
|
|
134
|
+
```yaml
|
|
135
|
+
steps:
|
|
136
|
+
- name: install
|
|
137
|
+
command: npm ci
|
|
138
|
+
timeout_seconds: 120 # fail if install takes longer than 2 min
|
|
139
|
+
|
|
140
|
+
- name: implement
|
|
141
|
+
prompt: Implement the feature described above.
|
|
142
|
+
timeout_seconds: 1800 # 30 min ceiling for the Claude step
|
|
143
|
+
```
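
A minimal sketch of how such a timeout guard can work, modeled on the `startTimeout` and `TimeoutError` additions in `dist/index.js` from this diff. The `fakeProc` object is a stand-in assumed for illustration; the real code passes a spawned child process.

```javascript
// Error type carrying the exit code that the README documents for timeouts.
class TimeoutError extends Error {
  exitCode = 3;
  constructor(stepName, seconds) {
    super(`Step "${stepName}" timed out after ${seconds}s`);
    this.name = "TimeoutError";
  }
}

// Arms a timer that kills `proc` once `timeoutSeconds` elapse.
// check() throws if the timer fired; cancel() disarms the timer.
function startTimeout(proc, taskName, timeoutSeconds) {
  if (timeoutSeconds == null) return { check() {}, cancel() {} };
  let timedOut = false;
  const timer = setTimeout(() => {
    timedOut = true;
    try {
      proc.kill();
    } catch {
      // the process may already have exited
    }
  }, timeoutSeconds * 1000);
  return {
    check() {
      if (timedOut) throw new TimeoutError(taskName, timeoutSeconds);
    },
    cancel() {
      clearTimeout(timer);
    },
  };
}

// Hypothetical stand-in for a child process; it only records that kill() was called.
const fakeProc = { killed: false, kill() { this.killed = true; } };

const guard = startTimeout(fakeProc, "install", 0.05);
setTimeout(() => {
  try {
    guard.check(); // throws, because the 50 ms limit has already passed
  } catch (err) {
    console.log(err.name, err.exitCode, fakeProc.killed);
  } finally {
    guard.cancel();
  }
}, 100);
```

Calling `check()` only after the process has exited, as the diff does, means a step that finishes just before the deadline is reported normally rather than as a timeout.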
|
|
144
|
+
|
|
145
|
+
## Cancellation
|
|
146
|
+
|
|
147
|
+
Write a `.executant-cancel` file in the **same directory as the workflow YAML** to stop the workflow cleanly **between steps**:
|
|
148
|
+
|
|
149
|
+
```bash
|
|
150
|
+
executant long-workflow.yaml &
|
|
151
|
+
touch .executant-cancel # workflow stops at the next step boundary; exits 4
|
|
152
|
+
```
|
|
153
|
+
|
|
154
|
+
Executant deletes the file automatically once it has been detected. This is a cooperative, process-safe alternative to SIGTERM: the current step is allowed to finish, which avoids corrupting git state mid-step. The cancel file is always resolved relative to the workflow file, so its location is predictable regardless of the directory you invoked executant from.
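
The step-boundary check can be sketched like this. It is a simplified, synchronous model of the loop added to `runWorkflow` in this diff (the real one is an async generator). The `fsLike` parameter and the in-memory `memFs` object are assumptions made so the example runs without touching disk; in practice Node's `node:fs` module provides `existsSync` and `unlinkSync` with the same shape.

```javascript
// Runs steps in order, checking for the cancel file before each one.
// fsLike: any object exposing existsSync(path) and unlinkSync(path).
function* runSteps(steps, cancelFile, fsLike) {
  for (const [i, step] of steps.entries()) {
    if (fsLike.existsSync(cancelFile)) {
      try {
        fsLike.unlinkSync(cancelFile); // consume the request so the next run starts clean
      } catch {
        // best effort: a failed delete must not mask the cancellation
      }
      yield { type: "workflow:cancelled", completedSteps: i };
      return;
    }
    yield { type: "step:start", index: i, name: step.name };
    step.run(); // a step is never interrupted once it has started
    yield { type: "step:complete", index: i, name: step.name };
  }
  yield { type: "workflow:complete" };
}

// In-memory stand-in for the filesystem (hypothetical, for illustration only).
const files = new Set();
const memFs = {
  existsSync: (p) => files.has(p),
  unlinkSync: (p) => { files.delete(p); },
};

const steps = [
  // Simulates someone running `touch .executant-cancel` while "build" is in progress.
  { name: "build", run() { files.add(".executant-cancel"); } },
  { name: "deploy", run() { throw new Error("should never run"); } },
];

const events = [...runSteps(steps, ".executant-cancel", memFs)].map((e) => e.type);
console.log(events);
```

The "build" step completes even though cancellation was requested while it ran; only "deploy" is skipped.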
|
|
123
155
|
|
|
124
156
|
## Interjection
|
|
125
157
|
|
|
@@ -148,21 +180,35 @@ press i → ▷ don't delete that file, use git revert▌ esc to cancel
|
|
|
148
180
|
| `logging-demo.yaml` | Log steps, self-healing, judge |
|
|
149
181
|
| `git-status-summary.yaml` | Real-world git workflow |
|
|
150
182
|
| `repeat-demo.yaml` | Running a step N times with `repeat` |
|
|
183
|
+
| `file-demo.yaml` | File operations |
|
|
184
|
+
| `from-step-test.yaml` | Using `--from-step` to resume mid-workflow |
|
|
151
185
|
|
|
152
186
|
See the [`examples/`](examples/) directory.
|
|
153
187
|
|
|
154
188
|
## CLI
|
|
155
189
|
|
|
156
190
|
```bash
|
|
157
|
-
executant plan "description"
|
|
158
|
-
executant plan -q "description"
|
|
159
|
-
executant workflow.yaml
|
|
160
|
-
executant
|
|
161
|
-
executant --
|
|
162
|
-
executant --
|
|
163
|
-
executant
|
|
191
|
+
executant plan "description" # generate a workflow YAML (auto-detects fast path)
|
|
192
|
+
executant plan -q "description" # skip research pass (fast path)
|
|
193
|
+
executant refine workflow.yaml "instructions" # refine an existing workflow YAML
|
|
194
|
+
executant workflow.yaml # run a workflow
|
|
195
|
+
executant --ci workflow.yaml # headless, NDJSON to stdout
|
|
196
|
+
executant --step <name|n> wf.yaml # run one step by name or index
|
|
197
|
+
executant --from-step <n> wf.yaml # resume from step n
|
|
198
|
+
executant --var KEY=VALUE wf.yaml # override a workflow var at runtime
|
|
199
|
+
executant update # upgrade to latest version
|
|
164
200
|
```
|
|
165
201
|
|
|
202
|
+
### Exit codes
|
|
203
|
+
|
|
204
|
+
| Code | Meaning |
|
|
205
|
+
|------|---------|
|
|
206
|
+
| `0` | All steps completed successfully |
|
|
207
|
+
| `1` | A step failed at runtime |
|
|
208
|
+
| `2` | YAML or variable validation error |
|
|
209
|
+
| `3` | A step timed out (`timeout_seconds` exceeded) |
|
|
210
|
+
| `4` | Cancelled via `.executant-cancel` file |
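
One way to picture how a run outcome maps onto these codes is the sketch below. The `exitCodeFor` function, the `EXIT` table, and `ValidationError` are hypothetical names introduced for illustration. The diff itself only shows exit code 4 being set on `workflow:cancelled` and `err instanceof TimeoutError ? 3 : 1` on failure; how code 2 is produced is not visible here.

```javascript
// Exit codes as listed in the README table above.
const EXIT = Object.freeze({
  OK: 0,
  STEP_FAILED: 1,
  VALIDATION: 2,
  TIMEOUT: 3,
  CANCELLED: 4,
});

class TimeoutError extends Error {}
class ValidationError extends Error {} // hypothetical: stands in for YAML/variable validation failures

// outcome: { cancelled?: boolean, error?: Error }
function exitCodeFor(outcome) {
  if (outcome.cancelled) return EXIT.CANCELLED;
  if (!outcome.error) return EXIT.OK;
  if (outcome.error instanceof TimeoutError) return EXIT.TIMEOUT;
  if (outcome.error instanceof ValidationError) return EXIT.VALIDATION;
  return EXIT.STEP_FAILED; // any other runtime failure
}

console.log(exitCodeFor({}));
console.log(exitCodeFor({ error: new TimeoutError("too slow") }));
console.log(exitCodeFor({ cancelled: true }));
```

Distinct codes let a CI wrapper retry on a timeout (3) while treating a validation error (2) as a configuration problem that a retry will not fix.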
|
|
211
|
+
|
|
166
212
|
## Development
|
|
167
213
|
|
|
168
214
|
```bash
|
package/dist/index.js
CHANGED
|
@@ -52,8 +52,8 @@ var init_update = __esm({
|
|
|
52
52
|
// src/index.ts
|
|
53
53
|
import React3 from "react";
|
|
54
54
|
import { render } from "ink";
|
|
55
|
-
import { readFileSync as
|
|
56
|
-
import { dirname as
|
|
55
|
+
import { mkdirSync as mkdirSync4, readFileSync as readFileSync6 } from "node:fs";
|
|
56
|
+
import { dirname as dirname4, join as join5, resolve as resolve3 } from "node:path";
|
|
57
57
|
import { fileURLToPath as fileURLToPath2 } from "node:url";
|
|
58
58
|
|
|
59
59
|
// src/load-workflow.ts
|
|
@@ -154,16 +154,16 @@ var RawStepSchema = z.lazy(
|
|
|
154
154
|
forEach: z.union([z.array(z.string()), z.string()]).optional(),
|
|
155
155
|
repeat: z.number().int().positive().optional(),
|
|
156
156
|
context: z.array(z.string()).optional(),
|
|
157
|
-
steps: z.array(RawStepSchema).min(1).optional()
|
|
157
|
+
steps: z.array(RawStepSchema).min(1).optional(),
|
|
158
|
+
timeout_seconds: z.number().positive().optional()
|
|
158
159
|
})
|
|
159
160
|
);
|
|
160
161
|
var RawWorkflowSchema = z.object({
|
|
161
162
|
goal: z.string(),
|
|
162
163
|
steps: z.array(RawStepSchema),
|
|
163
|
-
vars: z.record(z.string(), z.string()).optional()
|
|
164
|
-
self_improve: z.boolean().optional()
|
|
164
|
+
vars: z.record(z.string(), z.string()).optional()
|
|
165
165
|
});
|
|
166
|
-
function loadWorkflow(filePath2) {
|
|
166
|
+
function loadWorkflow(filePath2, cliVars2 = {}) {
|
|
167
167
|
let raw;
|
|
168
168
|
try {
|
|
169
169
|
raw = readFileSync2(filePath2, "utf8");
|
|
@@ -180,7 +180,7 @@ function loadWorkflow(filePath2) {
|
|
|
180
180
|
throw new Error(`Invalid workflow file "${filePath2}":
|
|
181
181
|
${detail}`);
|
|
182
182
|
}
|
|
183
|
-
const vars = doc.vars ?? {};
|
|
183
|
+
const vars = { ...doc.vars ?? {}, ...cliVars2 };
|
|
184
184
|
const seen = /* @__PURE__ */ new Set();
|
|
185
185
|
for (const step of doc.steps) {
|
|
186
186
|
if (seen.has(step.name)) {
|
|
@@ -193,7 +193,6 @@ ${detail}`);
|
|
|
193
193
|
return {
|
|
194
194
|
goal: doc.goal,
|
|
195
195
|
vars,
|
|
196
|
-
selfImprove: doc.self_improve,
|
|
197
196
|
tasks: doc.steps.map((step) => convertStep(step, vars))
|
|
198
197
|
};
|
|
199
198
|
}
|
|
@@ -245,6 +244,9 @@ function convertInnerStep(step, vars, name, continueOnError) {
|
|
|
245
244
|
maxHealingAttempts: step.max_healing_attempts,
|
|
246
245
|
...step.output && {
|
|
247
246
|
output: resolveOutputFile(step.output, vars, name)
|
|
247
|
+
},
|
|
248
|
+
...step.timeout_seconds !== void 0 && {
|
|
249
|
+
timeoutSeconds: step.timeout_seconds
|
|
248
250
|
}
|
|
249
251
|
};
|
|
250
252
|
}
|
|
@@ -269,7 +271,10 @@ function convertInnerStep(step, vars, name, continueOnError) {
|
|
|
269
271
|
llmAsJudge: step.llm_as_judge,
|
|
270
272
|
allowedTools: step.allowed_tools,
|
|
271
273
|
model: "sonnet",
|
|
272
|
-
...contextFiles.length > 0 && { contextFiles }
|
|
274
|
+
...contextFiles.length > 0 && { contextFiles },
|
|
275
|
+
...step.timeout_seconds !== void 0 && {
|
|
276
|
+
timeoutSeconds: step.timeout_seconds
|
|
277
|
+
}
|
|
273
278
|
};
|
|
274
279
|
}
|
|
275
280
|
default:
|
|
@@ -311,14 +316,42 @@ function substituteVars(text, vars, stepName, field) {
|
|
|
311
316
|
|
|
312
317
|
// src/runner.ts
|
|
313
318
|
import { exec } from "node:child_process";
|
|
314
|
-
import {
|
|
315
|
-
|
|
319
|
+
import {
|
|
320
|
+
existsSync,
|
|
321
|
+
mkdirSync,
|
|
322
|
+
readFileSync as readFileSync3,
|
|
323
|
+
unlinkSync,
|
|
324
|
+
writeFileSync
|
|
325
|
+
} from "node:fs";
|
|
326
|
+
import { dirname as dirname2, join as join2 } from "node:path";
|
|
316
327
|
import { promisify } from "node:util";
|
|
317
328
|
import { z as z2 } from "zod";
|
|
318
329
|
|
|
319
330
|
// src/tasks/command.ts
|
|
320
331
|
import { spawn } from "node:child_process";
|
|
321
332
|
|
|
333
|
+
// src/types.ts
|
|
334
|
+
var InterjectChannel = class {
|
|
335
|
+
_queue = [];
|
|
336
|
+
/** Called by the TUI when the user submits an interjection message. */
|
|
337
|
+
interject(message) {
|
|
338
|
+
this._queue.push(message);
|
|
339
|
+
}
|
|
340
|
+
/** Drains and returns any queued messages (for non-Claude steps to consume). */
|
|
341
|
+
consumeQueue() {
|
|
342
|
+
const q = this._queue.slice();
|
|
343
|
+
this._queue = [];
|
|
344
|
+
return q;
|
|
345
|
+
}
|
|
346
|
+
};
|
|
347
|
+
var TimeoutError = class extends Error {
|
|
348
|
+
exitCode = 3;
|
|
349
|
+
constructor(stepName, seconds) {
|
|
350
|
+
super(`Step "${stepName}" timed out after ${seconds}s`);
|
|
351
|
+
this.name = "TimeoutError";
|
|
352
|
+
}
|
|
353
|
+
};
|
|
354
|
+
|
|
322
355
|
// src/tasks/stream.ts
|
|
323
356
|
var AsyncQueue = class {
|
|
324
357
|
buf = [];
|
|
@@ -376,6 +409,25 @@ function waitForExit(proc) {
|
|
|
376
409
|
proc.on("error", reject);
|
|
377
410
|
});
|
|
378
411
|
}
|
|
412
|
+
function startTimeout(proc, taskName, timeoutSeconds) {
|
|
413
|
+
if (timeoutSeconds == null) return { check: () => {
|
|
414
|
+
}, cancel: () => {
|
|
415
|
+
} };
|
|
416
|
+
let timedOut = false;
|
|
417
|
+
const timer = setTimeout(() => {
|
|
418
|
+
timedOut = true;
|
|
419
|
+
try {
|
|
420
|
+
proc.kill();
|
|
421
|
+
} catch {
|
|
422
|
+
}
|
|
423
|
+
}, timeoutSeconds * 1e3);
|
|
424
|
+
return {
|
|
425
|
+
check: () => {
|
|
426
|
+
if (timedOut) throw new TimeoutError(taskName, timeoutSeconds);
|
|
427
|
+
},
|
|
428
|
+
cancel: () => clearTimeout(timer)
|
|
429
|
+
};
|
|
430
|
+
}
|
|
379
431
|
|
|
380
432
|
// src/tasks/command.ts
|
|
381
433
|
var CommandError = class extends Error {
|
|
@@ -393,12 +445,22 @@ async function* runCommand(task) {
|
|
|
393
445
|
const proc = spawn("bash", ["-c", task.command], {
|
|
394
446
|
stdio: ["ignore", "pipe", "pipe"]
|
|
395
447
|
});
|
|
396
|
-
|
|
397
|
-
|
|
398
|
-
|
|
399
|
-
|
|
400
|
-
|
|
401
|
-
|
|
448
|
+
const timeout = startTimeout(proc, task.name, task.timeoutSeconds);
|
|
449
|
+
try {
|
|
450
|
+
for await (const line of mergeStreamsToLines(proc.stdout, proc.stderr)) {
|
|
451
|
+
yield { type: "output:text", index: -1, text: line };
|
|
452
|
+
}
|
|
453
|
+
const code = await waitForExit(proc);
|
|
454
|
+
timeout.check();
|
|
455
|
+
if (code !== 0) {
|
|
456
|
+
throw new CommandError(
|
|
457
|
+
code,
|
|
458
|
+
task.command,
|
|
459
|
+
`Command "${task.name}" exited with code ${code}`
|
|
460
|
+
);
|
|
461
|
+
}
|
|
462
|
+
} finally {
|
|
463
|
+
timeout.cancel();
|
|
402
464
|
}
|
|
403
465
|
}
|
|
404
466
|
|
|
@@ -460,6 +522,7 @@ async function* runClaude(task) {
|
|
|
460
522
|
};
|
|
461
523
|
process.once("SIGTERM", cleanup);
|
|
462
524
|
process.once("SIGHUP", cleanup);
|
|
525
|
+
const timeout = startTimeout(proc, task.name, task.timeoutSeconds);
|
|
463
526
|
const plainLines = [];
|
|
464
527
|
try {
|
|
465
528
|
for await (const line of mergeStreamsToLines(proc.stdout, proc.stderr)) {
|
|
@@ -476,8 +539,10 @@ async function* runClaude(task) {
|
|
|
476
539
|
}
|
|
477
540
|
}
|
|
478
541
|
const code = await waitForExit(proc);
|
|
542
|
+
timeout.check();
|
|
479
543
|
if (code !== 0) throw buildExitError(code, plainLines);
|
|
480
544
|
} finally {
|
|
545
|
+
timeout.cancel();
|
|
481
546
|
process.off("SIGTERM", cleanup);
|
|
482
547
|
process.off("SIGHUP", cleanup);
|
|
483
548
|
}
|
|
@@ -562,10 +627,28 @@ function shouldSkipStep(stepNumber, name, options2) {
|
|
|
562
627
|
}
|
|
563
628
|
return options2.fromStep !== void 0 && stepNumber < options2.fromStep[0];
|
|
564
629
|
}
|
|
630
|
+
var LAST_OUTPUT_MAX_LINES = 100;
|
|
565
631
|
async function* runWorkflow(workflow2, options2 = {}, channel2) {
|
|
566
632
|
const workflowStart = Date.now();
|
|
633
|
+
const cancelFile = join2(
|
|
634
|
+
options2.workDir ?? process.cwd(),
|
|
635
|
+
".executant-cancel"
|
|
636
|
+
);
|
|
567
637
|
yield { type: "workflow:start", workflow: workflow2 };
|
|
638
|
+
let lastStepOutput;
|
|
568
639
|
for (const [i, task] of workflow2.tasks.entries()) {
|
|
640
|
+
if (existsSync(cancelFile)) {
|
|
641
|
+
try {
|
|
642
|
+
unlinkSync(cancelFile);
|
|
643
|
+
} catch {
|
|
644
|
+
}
|
|
645
|
+
yield {
|
|
646
|
+
type: "workflow:cancelled",
|
|
647
|
+
workflow: workflow2,
|
|
648
|
+
durationMs: Date.now() - workflowStart
|
|
649
|
+
};
|
|
650
|
+
return;
|
|
651
|
+
}
|
|
569
652
|
const stepNumber = i + 1;
|
|
570
653
|
if (shouldSkipStep(stepNumber, task.name, options2)) {
|
|
571
654
|
yield { type: "step:skip", index: i, name: task.name };
|
|
@@ -574,14 +657,20 @@ async function* runWorkflow(workflow2, options2 = {}, channel2) {
|
|
|
574
657
|
const stepStart = Date.now();
|
|
575
658
|
yield { type: "step:start", index: i, name: task.name };
|
|
576
659
|
const from = options2.fromStep && options2.fromStep[0] === stepNumber ? options2.fromStep.slice(1) : void 0;
|
|
660
|
+
const lines = [];
|
|
577
661
|
try {
|
|
578
662
|
for await (const event of runStep(task, from, channel2)) {
|
|
579
663
|
if (event.type === "step:iteration" || event.type === "step:inner" || event.type === "output:text" || event.type === "output:tool") {
|
|
664
|
+
if (event.type === "output:text") {
|
|
665
|
+
if (lines.length >= LAST_OUTPUT_MAX_LINES) lines.shift();
|
|
666
|
+
lines.push(event.text);
|
|
667
|
+
}
|
|
580
668
|
yield { ...event, index: i };
|
|
581
669
|
} else {
|
|
582
670
|
yield event;
|
|
583
671
|
}
|
|
584
672
|
}
|
|
673
|
+
lastStepOutput = lines.join("\n") || void 0;
|
|
585
674
|
yield {
|
|
586
675
|
type: "step:complete",
|
|
587
676
|
index: i,
|
|
@@ -590,14 +679,23 @@ async function* runWorkflow(workflow2, options2 = {}, channel2) {
|
|
|
590
679
|
};
|
|
591
680
|
} catch (err) {
|
|
592
681
|
const error = normalizeError(err);
|
|
593
|
-
|
|
682
|
+
const lastOutput = lines.join("\n") || void 0;
|
|
683
|
+
lastStepOutput = lastOutput;
|
|
684
|
+
yield {
|
|
685
|
+
type: "step:error",
|
|
686
|
+
index: i,
|
|
687
|
+
name: task.name,
|
|
688
|
+
error,
|
|
689
|
+
lastOutput
|
|
690
|
+
};
|
|
594
691
|
if (!task.continueOnError) throw error;
|
|
595
692
|
}
|
|
596
693
|
}
|
|
597
694
|
yield {
|
|
598
695
|
type: "workflow:complete",
|
|
599
696
|
workflow: workflow2,
|
|
600
|
-
durationMs: Date.now() - workflowStart
|
|
697
|
+
durationMs: Date.now() - workflowStart,
|
|
698
|
+
lastOutput: lastStepOutput
|
|
601
699
|
};
|
|
602
700
|
}
|
|
603
701
|
async function* runStep(task, from, channel2) {
|
|
@@ -1060,6 +1158,7 @@ function reducer(state, event) {
|
|
|
1060
1158
|
case "workflow:start":
|
|
1061
1159
|
return { ...state, startTime: Date.now() };
|
|
1062
1160
|
case "workflow:complete":
|
|
1161
|
+
case "workflow:cancelled":
|
|
1063
1162
|
return { ...state, endTime: Date.now() };
|
|
1064
1163
|
case "step:start":
|
|
1065
1164
|
return updateTask(state, event.index, {
|
|
@@ -1456,10 +1555,15 @@ function App({
|
|
|
1456
1555
|
if (event.type === "workflow:complete") {
|
|
1457
1556
|
setTimeout(() => exit(), EXIT_DELAY_MS);
|
|
1458
1557
|
}
|
|
1558
|
+
if (event.type === "workflow:cancelled") {
|
|
1559
|
+
process.exitCode = 4;
|
|
1560
|
+
setTimeout(() => exit(), EXIT_DELAY_MS);
|
|
1561
|
+
}
|
|
1459
1562
|
}
|
|
1460
1563
|
} catch (err) {
|
|
1461
1564
|
if (!active) return;
|
|
1462
1565
|
dispatch({ type: "log", level: "error", text: getErrorMessage(err) });
|
|
1566
|
+
process.exitCode = err instanceof TimeoutError ? 3 : 1;
|
|
1463
1567
|
setTimeout(
|
|
1464
1568
|
() => exit(err instanceof Error ? err : new Error(getErrorMessage(err))),
|
|
1465
1569
|
EXIT_DELAY_MS
|
|
@@ -1603,8 +1707,8 @@ function App({
|
|
|
1603
1707
|
}
|
|
1604
1708
|
|
|
1605
1709
|
// src/plan.ts
|
|
1606
|
-
import { existsSync, mkdirSync as mkdirSync2, readFileSync as readFileSync4, writeFileSync as writeFileSync2 } from "node:fs";
|
|
1607
|
-
import { join as
|
|
1710
|
+
import { existsSync as existsSync2, mkdirSync as mkdirSync2, readFileSync as readFileSync4, writeFileSync as writeFileSync2 } from "node:fs";
|
|
1711
|
+
import { join as join3, resolve } from "node:path";
|
|
1608
1712
|
import { dump as dumpYaml } from "js-yaml";
|
|
1609
1713
|
import { z as z3 } from "zod";
|
|
1610
1714
|
import { zodToJsonSchema as zodToJsonSchema2 } from "zod-to-json-schema";
|
|
@@ -1620,8 +1724,7 @@ var TOTAL_PLAN_STAGES = 3;
|
|
|
1620
1724
|
var WorkflowSchema = z3.object({
|
|
1621
1725
|
goal: z3.string(),
|
|
1622
1726
|
steps: z3.array(RawStepSchema).min(1),
|
|
1623
|
-
vars: z3.record(z3.string()).optional()
|
|
1624
|
-
self_improve: z3.boolean().optional()
|
|
1727
|
+
vars: z3.record(z3.string()).optional()
|
|
1625
1728
|
});
|
|
1626
1729
|
var PlanJudgeOutputSchema = z3.object({
|
|
1627
1730
|
pass: z3.boolean(),
|
|
@@ -1633,7 +1736,7 @@ function walkUp(startDir, check) {
|
|
|
1633
1736
|
while (true) {
|
|
1634
1737
|
const found = check(dir);
|
|
1635
1738
|
if (found !== null) return found;
|
|
1636
|
-
const parent =
|
|
1739
|
+
const parent = join3(dir, "..");
|
|
1637
1740
|
if (resolve(parent) === resolve(dir)) return null;
|
|
1638
1741
|
dir = parent;
|
|
1639
1742
|
}
|
|
@@ -1641,13 +1744,13 @@ function walkUp(startDir, check) {
|
|
|
1641
1744
|
function findGitRoot(startDir) {
|
|
1642
1745
|
return walkUp(
|
|
1643
1746
|
startDir,
|
|
1644
|
-
(dir) =>
|
|
1747
|
+
(dir) => existsSync2(join3(dir, ".git")) ? dir : null
|
|
1645
1748
|
);
|
|
1646
1749
|
}
|
|
1647
1750
|
function findProjectRoot(startDir) {
|
|
1648
1751
|
return walkUp(startDir, (dir) => {
|
|
1649
|
-
const candidate =
|
|
1650
|
-
return
|
|
1752
|
+
const candidate = join3(dir, ".claude", "executant.local", "tasks");
|
|
1753
|
+
return existsSync2(candidate) ? candidate : null;
|
|
1651
1754
|
});
|
|
1652
1755
|
}
|
|
1653
1756
|
function isSimpleRequest(description) {
|
|
@@ -1687,7 +1790,7 @@ Examples:
|
|
|
1687
1790
|
console.error("Error: -f/--file requires a file path argument");
|
|
1688
1791
|
process.exit(1);
|
|
1689
1792
|
}
|
|
1690
|
-
if (!
|
|
1793
|
+
if (!existsSync2(filePath2)) {
|
|
1691
1794
|
console.error(`Error: File not found: ${filePath2}`);
|
|
1692
1795
|
process.exit(1);
|
|
1693
1796
|
}
|
|
@@ -1715,14 +1818,14 @@ Examples:
|
|
|
1715
1818
|
let taskDir = findProjectRoot(process.cwd());
|
|
1716
1819
|
if (!taskDir) {
|
|
1717
1820
|
const base = findGitRoot(process.cwd()) ?? process.cwd();
|
|
1718
|
-
taskDir =
|
|
1821
|
+
taskDir = join3(base, ".claude", "executant.local", "tasks");
|
|
1719
1822
|
mkdirSync2(taskDir, { recursive: true });
|
|
1720
1823
|
}
|
|
1721
|
-
const todoDir =
|
|
1824
|
+
const todoDir = join3(taskDir, "todo");
|
|
1722
1825
|
mkdirSync2(todoDir, { recursive: true });
|
|
1723
1826
|
const slug = slugify(description);
|
|
1724
1827
|
const ts = timestamp();
|
|
1725
|
-
const taskFile =
|
|
1828
|
+
const taskFile = join3(todoDir, `${ts}-${slug}.yaml`);
|
|
1726
1829
|
return { description, taskFile, todoDir, fast };
|
|
1727
1830
|
}
|
|
1728
1831
|
async function runPass3Judge(description, workflow2) {
|
|
@@ -2040,7 +2143,7 @@ ${PLAN_SYSTEM_RULES}`,
|
|
|
2040
2143
|
}
|
|
2041
2144
|
|
|
2042
2145
|
// src/refine.ts
|
|
2043
|
-
import { existsSync as
|
|
2146
|
+
import { existsSync as existsSync3, readFileSync as readFileSync5 } from "node:fs";
|
|
2044
2147
|
import { load as loadYaml } from "js-yaml";
|
|
2045
2148
|
var PLAN_REFINE_PROMPT = loadPrompt("plan-refine");
|
|
2046
2149
|
var PLAN_SYSTEM_RULES2 = loadPrompt("plan-system-rules");
|
|
@@ -2067,7 +2170,7 @@ Examples:
|
|
|
2067
2170
|
console.error("Usage: executant refine <task-file> [INSTRUCTIONS]");
|
|
2068
2171
|
process.exit(1);
|
|
2069
2172
|
}
|
|
2070
|
-
if (!
|
|
2173
|
+
if (!existsSync3(taskFile)) {
|
|
2071
2174
|
console.error(`Error: File not found: ${taskFile}`);
|
|
2072
2175
|
process.exit(1);
|
|
2073
2176
|
}
|
|
@@ -2092,7 +2195,7 @@ Examples:
|
|
|
2092
2195
|
console.error("Error: -f/--file requires a file path argument");
|
|
2093
2196
|
process.exit(1);
|
|
2094
2197
|
}
|
|
2095
|
-
if (!
|
|
2198
|
+
if (!existsSync3(filePath2)) {
|
|
2096
2199
|
console.error(`Error: File not found: ${filePath2}`);
|
|
2097
2200
|
process.exit(1);
|
|
2098
2201
|
}
|
|
@@ -2328,19 +2431,13 @@ function PlanApp({ description, events: events2 }) {
|
|
|
2328
2431
|
}
|
|
2329
2432
|
|
|
2330
2433
|
// src/logger.ts
|
|
2331
|
-
import {
|
|
2332
|
-
|
|
2333
|
-
existsSync as existsSync3,
|
|
2334
|
-
mkdirSync as mkdirSync3,
|
|
2335
|
-
readdirSync,
|
|
2336
|
-
writeFileSync as writeFileSync3
|
|
2337
|
-
} from "node:fs";
|
|
2338
|
-
import { dirname as dirname3, join as join3, resolve as resolve2 } from "node:path";
|
|
2434
|
+
import { appendFileSync, existsSync as existsSync4, mkdirSync as mkdirSync3, writeFileSync as writeFileSync3 } from "node:fs";
|
|
2435
|
+
import { dirname as dirname3, join as join4, resolve as resolve2 } from "node:path";
|
|
2339
2436
|
function findExecutantLocalDir(startDir) {
|
|
2340
2437
|
let dir = resolve2(startDir);
|
|
2341
2438
|
while (true) {
|
|
2342
|
-
const candidate =
|
|
2343
|
-
if (
|
|
2439
|
+
const candidate = join4(dir, ".claude", "executant.local");
|
|
2440
|
+
if (existsSync4(candidate)) return candidate;
|
|
2344
2441
|
const parent = dirname3(dir);
|
|
2345
2442
|
if (parent === dir) return null;
|
|
2346
2443
|
dir = parent;
|
|
@@ -2349,29 +2446,20 @@ function findExecutantLocalDir(startDir) {
|
|
|
2349
2446
|
function resolveLogDir(workflowFilePath) {
|
|
2350
2447
|
const startDir = dirname3(resolve2(workflowFilePath));
|
|
2351
2448
|
const executantLocal = findExecutantLocalDir(startDir);
|
|
2352
|
-
return executantLocal ?
|
|
2449
|
+
return executantLocal ? join4(executantLocal, "logs") : join4(startDir, "logs");
|
|
2353
2450
|
}
|
|
2354
2451
|
var INIT_STATE = {
|
|
2355
2452
|
logFile: "",
|
|
2356
2453
|
stepIndex: -1,
|
|
2357
2454
|
stepName: "",
|
|
2358
|
-
stepStartMs: 0
|
|
2359
|
-
toolCount: 0,
|
|
2360
|
-
complexSequenceFile: "",
|
|
2361
|
-
selfHealingFile: "",
|
|
2362
|
-
judgeAttempt: 0,
|
|
2363
|
-
recentOutput: []
|
|
2455
|
+
stepStartMs: 0
|
|
2364
2456
|
};
|
|
2365
2457
|
function appendLog(logFile, text) {
|
|
2366
2458
|
if (logFile) appendFileSync(logFile, text + "\n");
|
|
2367
2459
|
}
|
|
2368
|
-
function highlightPath(ctx, stepIndex, suffix) {
|
|
2369
|
-
return join3(ctx.highlightsDir, `${ctx.ts}_step${stepIndex + 1}_${suffix}.md`);
|
|
2370
|
-
}
|
|
2371
2460
|
function onWorkflowStart(ctx, s) {
|
|
2372
2461
|
mkdirSync3(ctx.logDir, { recursive: true });
|
|
2373
|
-
|
|
2374
|
-
const logFile = join3(ctx.logDir, `${ctx.ts}_${ctx.slug}.log`);
|
|
2462
|
+
const logFile = join4(ctx.logDir, `${ctx.ts}_${ctx.slug}.log`);
|
|
2375
2463
|
writeFileSync3(
|
|
2376
2464
|
logFile,
|
|
2377
2465
|
`# Execution Log
|
|
@@ -2402,20 +2490,6 @@ ${"\u2501".repeat(51)}
|
|
|
2402
2490
|
);
|
|
2403
2491
|
return next;
|
|
2404
2492
|
}
|
|
2405
|
-
function finalizeComplexSequence(s) {
|
|
2406
|
-
if (s.toolCount >= 3 && s.complexSequenceFile) {
|
|
2407
|
-
appendFileSync(
|
|
2408
|
-
s.complexSequenceFile,
|
|
2409
|
-
`
|
|
2410
|
-
---
|
|
2411
|
-
|
|
2412
|
-
*Total tools used: ${s.toolCount}*
|
|
2413
|
-
|
|
2414
|
-
*Captured by Executant Logger*
|
|
2415
|
-
`
|
|
2416
|
-
);
|
|
2417
|
-
}
|
|
2418
|
-
}
|
|
2419
2493
|
function onStepComplete(s) {
|
|
2420
2494
|
appendLog(
|
|
2421
2495
|
s.logFile,
|
|
@@ -2423,131 +2497,21 @@ function onStepComplete(s) {
|
|
|
2423
2497
|
Step completed in ${((Date.now() - s.stepStartMs) / 1e3).toFixed(1)}s
|
|
2424
2498
|
`
|
|
2425
2499
|
);
|
|
2426
|
-
finalizeComplexSequence(s);
|
|
2427
2500
|
return s;
|
|
2428
2501
|
}
|
|
2429
2502
|
function onStepError(s, error) {
|
|
2430
2503
|
appendLog(s.logFile, `
|
|
2431
2504
|
Step failed: ${error.message}
|
|
2432
2505
|
`);
|
|
2433
|
-
finalizeComplexSequence(s);
|
|
2434
2506
|
return s;
|
|
2435
2507
|
}
|
|
2436
|
-
function
|
|
2437
|
-
|
|
2438
|
-
|
|
2439
|
-
"",
|
|
2440
|
-
`**Task:** ${ctx.slug}`,
|
|
2441
|
-
`**Step:** ${s.stepName}`,
|
|
2442
|
-
...extra,
|
|
2443
|
-
`**Timestamp:** ${(/* @__PURE__ */ new Date()).toISOString()}`,
|
|
2444
|
-
"",
|
|
2445
|
-
"---",
|
|
2446
|
-
""
|
|
2447
|
-
].join("\n") + "\n";
|
|
2448
|
-
}
|
|
2449
|
-
function complexSequenceHeader(ctx, s) {
|
|
2450
|
-
return buildHighlightHeader(ctx, s, "Complex Tool Sequence") + "## Claude's Tool Orchestration\n\nClaude used multiple tools to complete this step:\n\n";
|
|
2451
|
-
}
|
|
2452
|
-
function createComplexSequenceFile(ctx, s) {
|
|
2453
|
-
const path = highlightPath(ctx, s.stepIndex, "complex_sequence");
|
|
2454
|
-
writeFileSync3(path, complexSequenceHeader(ctx, s));
|
|
2455
|
-
return path;
|
|
2456
|
-
}
|
|
2457
|
-
function onTool(ctx, s, tool, input) {
|
|
2458
|
-
const desc = getToolArg(tool, input);
|
|
2459
|
-
appendLog(s.logFile, ` [${tool}] ${desc}`);
|
|
2460
|
-
const toolCount = s.toolCount + 1;
|
|
2461
|
-
const complexSequenceFile = toolCount === 3 ? createComplexSequenceFile(ctx, s) : s.complexSequenceFile;
|
|
2462
|
-
if (toolCount >= 3 && complexSequenceFile) {
|
|
2463
|
-
appendFileSync(
|
|
2464
|
-
complexSequenceFile,
|
|
2465
|
-
`${toolCount}. **${tool}** - ${desc}
|
|
2466
|
-
`
|
|
2467
|
-
);
|
|
2468
|
-
}
|
|
2469
|
-
return { ...s, toolCount, complexSequenceFile };
|
|
2470
|
-
}
|
|
2471
|
-
function saveJudgeHighlight(ctx, s, verdict, text) {
|
|
2472
|
-
writeFileSync3(
|
|
2473
|
-
highlightPath(ctx, s.stepIndex, `judge_${verdict}`),
|
|
2474
|
-
buildHighlightHeader(ctx, s, `Judge Verdict: ${verdict}`, [
|
|
2475
|
-
`**Attempt:** ${s.judgeAttempt}`
|
|
2476
|
-
]) + [text, "", "---", "", "*Auto-captured*", ""].join("\n")
|
|
2477
|
-
);
|
|
2508
|
+
function onTool(s, tool, input) {
|
|
2509
|
+
appendLog(s.logFile, ` [${tool}] ${getToolArg(tool, input)}`);
|
|
2510
|
+
return s;
|
|
2478
2511
|
}
|
|
2479
|
-
|
|
2480
|
-
{
|
|
2481
|
-
pattern: /\[judge\]\s+(PASS|FAIL)/i,
|
|
2482
|
-
apply: (ctx, s, text, match) => {
|
|
2483
|
-
const verdict = match[1].toUpperCase();
|
|
2484
|
-
const judgeAttempt = s.judgeAttempt + 1;
|
|
2485
|
-
saveJudgeHighlight(ctx, { ...s, judgeAttempt }, verdict, text);
|
|
2486
|
-
return { ...s, judgeAttempt };
|
|
2487
|
-
}
|
|
2488
|
-
},
|
|
2489
|
-
{
|
|
2490
|
-
pattern: /\[self-healing\].*failed.*exit\s+(\d+)/i,
|
|
2491
|
-
apply: (ctx, s, _text, match) => {
|
|
2492
|
-
const selfHealingFile = highlightPath(ctx, s.stepIndex, "self_healing");
|
|
2493
|
-
writeFileSync3(
|
|
2494
|
-
selfHealingFile,
|
|
2495
|
-
buildHighlightHeader(ctx, s, "Self-Healing Activation") + [
|
|
2496
|
-
"## \u274C Failure Detected",
|
|
2497
|
-
"",
|
|
2498
|
-
`**Exit Code:** ${match[1]}`,
|
|
2499
|
-
"",
|
|
2500
|
-
"**Recent Output:**",
|
|
2501
|
-
"```",
|
|
2502
|
-
s.recentOutput.join("\n"),
|
|
2503
|
-
"```",
|
|
2504
|
-
"",
|
|
2505
|
-
"---",
|
|
2506
|
-
"",
|
|
2507
|
-
"## \u{1F527} Claude's Healing Process",
|
|
2508
|
-
""
|
|
2509
|
-
].join("\n")
|
|
2510
|
-
);
|
|
2511
|
-
return { ...s, selfHealingFile, recentOutput: [] };
|
|
2512
|
-
}
|
|
2513
|
-
},
|
|
2514
|
-
{
|
|
2515
|
-
pattern: /\[self-healing\].*Re-running/i,
|
|
2516
|
-
apply: (_ctx, s) => {
|
|
2517
|
-
if (!s.selfHealingFile) return s;
|
|
2518
|
-
appendFileSync(
|
|
2519
|
-
s.selfHealingFile,
|
|
2520
|
-
[
|
|
2521
|
-
"",
|
|
2522
|
-
"*(See full log for Claude's diagnostic process)*",
|
|
2523
|
-
"",
|
|
2524
|
-
"---",
|
|
2525
|
-
"",
|
|
2526
|
-
"## \u2705 Resolution Applied",
|
|
2527
|
-
"",
|
|
2528
|
-
"The self-healing process completed. Check the full execution log to see Claude's analysis and fix.",
|
|
2529
|
-
"",
|
|
2530
|
-
"---",
|
|
2531
|
-
"",
|
|
2532
|
-
"*Auto-captured*",
|
|
2533
|
-
""
|
|
2534
|
-
].join("\n")
|
|
2535
|
-
);
|
|
2536
|
-
return { ...s, selfHealingFile: "" };
|
|
2537
|
-
}
|
|
2538
|
-
}
|
|
2539
|
-
];
|
|
2540
|
-
function onLogMessage(ctx, s, level, text) {
|
|
2512
|
+
function onLogMessage(s, level, text) {
|
|
2541
2513
|
appendLog(s.logFile, `[${level}] ${text}`);
|
|
2542
|
-
|
|
2543
|
-
for (const { pattern, apply } of LOG_MATCHERS) {
|
|
2544
|
-
const m = pattern.exec(text);
|
|
2545
|
-
if (m) {
|
|
2546
|
-
state = apply(ctx, state, text, m);
|
|
2547
|
-
break;
|
|
2548
|
-
}
|
|
2549
|
-
}
|
|
2550
|
-
return state;
|
|
2514
|
+
return s;
|
|
2551
2515
|
}
|
|
2552
2516
|
function onWorkflowComplete(ctx, s) {
|
|
2553
2517
|
appendLog(
|
|
@@ -2559,37 +2523,8 @@ Finished: ${(/* @__PURE__ */ new Date()).toISOString()}
|
|
|
2559
2523
|
${"\u2501".repeat(51)}
|
|
2560
2524
|
`
|
|
2561
2525
|
);
|
|
2562
|
-
const indexFile = join3(ctx.highlightsDir, "README.md");
|
|
2563
|
-
if (!existsSync3(indexFile)) {
|
|
2564
|
-
writeFileSync3(
|
|
2565
|
-
indexFile,
|
|
2566
|
-
[
|
|
2567
|
-
"# Execution Highlights",
|
|
2568
|
-
"",
|
|
2569
|
-
"This directory contains automatically extracted highlight moments from task executions.",
|
|
2570
|
-
"",
|
|
2571
|
-
"## Latest Highlights",
|
|
2572
|
-
""
|
|
2573
|
-
].join("\n")
|
|
2574
|
-
);
|
|
2575
|
-
}
|
|
2576
|
-
const highlights = readdirSync(ctx.highlightsDir).filter((f) => f.startsWith(ctx.ts) && f.endsWith(".md")).sort();
|
|
2577
|
-
if (highlights.length > 0) {
|
|
2578
|
-
const entries = highlights.map((f) => `- [${f.replace(/\.md$/, "")}](./${f})`).join("\n");
|
|
2579
|
-
appendFileSync(
|
|
2580
|
-
indexFile,
|
|
2581
|
-
`
|
|
2582
|
-
### ${ctx.slug} (${(/* @__PURE__ */ new Date()).toISOString()})
|
|
2583
|
-
${entries}
|
|
2584
|
-
`
|
|
2585
|
-
);
|
|
2586
|
-
}
|
|
2587
2526
|
return s;
|
|
2588
2527
|
}
|
|
2589
|
-
function onOutputText(s, text) {
|
|
2590
|
-
appendLog(s.logFile, text);
|
|
2591
|
-
return { ...s, recentOutput: [...s.recentOutput, text] };
|
|
2592
|
-
}
|
|
2593
2528
|
function reduce(ctx, s, event) {
|
|
2594
2529
|
switch (event.type) {
|
|
2595
2530
|
case "workflow:start":
|
|
@@ -2614,12 +2549,14 @@ function reduce(ctx, s, event) {
|
|
|
2614
2549
|
);
|
|
2615
2550
|
return s;
|
|
2616
2551
|
case "output:text":
|
|
2617
|
-
|
|
2552
|
+
appendLog(s.logFile, event.text);
|
|
2553
|
+
return s;
|
|
2618
2554
|
case "output:tool":
|
|
2619
|
-
return onTool(
|
|
2555
|
+
return onTool(s, event.tool, event.input);
|
|
2620
2556
|
case "log":
|
|
2621
|
-
return onLogMessage(
|
|
2557
|
+
return onLogMessage(s, event.level, event.text);
|
|
2622
2558
|
case "workflow:complete":
|
|
2559
|
+
case "workflow:cancelled":
|
|
2623
2560
|
return onWorkflowComplete(ctx, s);
|
|
2624
2561
|
default:
|
|
2625
2562
|
return s;
|
|
@@ -2628,15 +2565,12 @@ function reduce(ctx, s, event) {
|
|
|
2628
2565
|
function createLogger(logDir, taskName) {
|
|
2629
2566
|
const ctx = {
|
|
2630
2567
|
logDir,
|
|
2631
|
-
highlightsDir: join3(logDir, "highlights"),
|
|
2632
2568
|
ts: formatTimestamp(/* @__PURE__ */ new Date()),
|
|
2633
2569
|
slug: slugify(taskName, 40) || "task"
|
|
2634
2570
|
};
|
|
2635
2571
|
const enabled = process.env["EXECUTANT_LOG"] !== "0";
|
|
2636
2572
|
let state = INIT_STATE;
|
|
2637
2573
|
return {
|
|
2638
|
-
getHighlightsDir: () => ctx.highlightsDir,
|
|
2639
|
-
getTimestamp: () => ctx.ts,
|
|
2640
2574
|
observe(event) {
|
|
2641
2575
|
if (!enabled) return;
|
|
2642
2576
|
try {
|
|
@@ -2654,195 +2588,10 @@ async function* withLogger(gen, logger2) {
|
|
|
2654
2588
|
}
|
|
2655
2589
|
}
|
|
2656
2590
|
|
|
2657
|
-
// src/retrospective.ts
|
|
2658
|
-
import {
|
|
2659
|
-
existsSync as existsSync4,
|
|
2660
|
-
mkdirSync as mkdirSync4,
|
|
2661
|
-
readdirSync as readdirSync2,
|
|
2662
|
-
readFileSync as readFileSync6,
|
|
2663
|
-
writeFileSync as writeFileSync4
|
|
2664
|
-
} from "node:fs";
|
|
2665
|
-
import { basename as basename2, dirname as dirname4, join as join4, resolve as resolve3 } from "node:path";
|
|
2666
|
-
import { spawnSync } from "node:child_process";
|
|
2667
|
-
import { load as parseYaml2 } from "js-yaml";
|
|
2668
|
-
import { z as z4 } from "zod";
|
|
2669
|
-
var RetrospectiveOutputSchema = z4.object({
|
|
2670
|
-
improved_yaml: z4.string(),
|
|
2671
|
-
changelog: z4.string()
|
|
2672
|
-
});
|
|
2673
|
-
var RETROSPECTIVE_PROMPT = loadPrompt("retrospective-analysis");
|
|
2674
|
-
async function runRetrospective(workflowFilePath, workflow2, highlightsDir, runTimestamp) {
|
|
2675
|
-
try {
|
|
2676
|
-
await doRetrospective(
|
|
2677
|
-
workflowFilePath,
|
|
2678
|
-
workflow2,
|
|
2679
|
-
highlightsDir,
|
|
2680
|
-
runTimestamp
|
|
2681
|
-
);
|
|
2682
|
-
} catch (err) {
|
|
2683
|
-
console.warn(
|
|
2684
|
-
`
|
|
2685
|
-
Self-improvement: retrospective failed: ${getErrorMessage(err)}`
|
|
2686
|
-
);
|
|
2687
|
-
}
|
|
2688
|
-
}
|
|
2689
|
-
async function doRetrospective(workflowFilePath, workflow2, highlightsDir, runTimestamp) {
|
|
2690
|
-
if (!existsSync4(highlightsDir)) {
|
|
2691
|
-
console.log("\nSelf-improvement: no highlights directory found, skipping.");
|
|
2692
|
-
return;
|
|
2693
|
-
}
|
|
2694
|
-
const allFiles = readdirSync2(highlightsDir);
|
|
2695
|
-
const runHighlights = allFiles.filter((f) => f.startsWith(runTimestamp) && f.endsWith(".md")).sort();
|
|
2696
|
-
if (runHighlights.length === 0) {
|
|
2697
|
-
console.log(
|
|
2698
|
-
"\nSelf-improvement: no highlights for this run \u2014 task completed without issues, skipping."
|
|
2699
|
-
);
|
|
2700
|
-
return;
|
|
2701
|
-
}
|
|
2702
|
-
const divider = "\u2501".repeat(51);
|
|
2703
|
-
console.log(`
|
|
2704
|
-
${divider}`);
|
|
2705
|
-
console.log(
|
|
2706
|
-
"Self-Improvement: Analyzing execution and generating improvements..."
|
|
2707
|
-
);
|
|
2708
|
-
console.log(`${divider}
|
|
2709
|
-
`);
|
|
2710
|
-
console.log(`Found ${runHighlights.length} highlight(s) to analyze`);
|
|
2711
|
-
const countByPattern = (pat) => runHighlights.filter((f) => f.includes(pat)).length;
|
|
2712
|
-
const judgeFailures = countByPattern("_judge_FAIL");
|
|
2713
|
-
const selfHealingCount = countByPattern("_self_healing");
|
|
2714
|
-
const complexSequences = countByPattern("_complex_sequence");
|
|
2715
|
-
const metrics = [
|
|
2716
|
-
`- Judge Failures: ${judgeFailures}`,
|
|
2717
|
-
`- Self-Healing Activations: ${selfHealingCount}`,
|
|
2718
|
-
`- Complex Tool Sequences: ${complexSequences}`,
|
|
2719
|
-
`- Total Highlights: ${runHighlights.length}`
|
|
2720
|
-
].join("\n");
|
|
2721
|
-
console.log(`
|
|
2722
|
-
Execution Metrics:
|
|
2723
|
-
${metrics}
|
|
2724
|
-
`);
|
|
2725
|
-
console.log("Analyzing execution and generating improvements...\n");
|
|
2726
|
-
const highlightContents = runHighlights.map((f) => {
|
|
2727
|
-
const content = readFileSync6(join4(highlightsDir, f), "utf8");
|
|
2728
|
-
return `### ${f}
|
|
2729
|
-
|
|
2730
|
-
${content}`;
|
|
2731
|
-
}).join("\n\n---\n\n");
|
|
2732
|
-
const originalYaml = readFileSync6(workflowFilePath, "utf8");
|
|
2733
|
-
const taskName = basename2(workflowFilePath, ".yaml");
|
|
2734
|
-
const prompt = fillTemplate(RETROSPECTIVE_PROMPT, {
|
|
2735
|
-
TASK_NAME: taskName,
|
|
2736
|
-
ORIGINAL_GOAL: workflow2.goal,
|
|
2737
|
-
ORIGINAL_YAML: originalYaml,
|
|
2738
|
-
HIGHLIGHTS: highlightContents,
|
|
2739
|
-
METRICS: metrics
|
|
2740
|
-
});
|
|
2741
|
-
const result = spawnSync(
|
|
2742
|
-
"claude",
|
|
2743
|
-
[
|
|
2744
|
-
"-p",
|
|
2745
|
-
prompt,
|
|
2746
|
-
"--allowedTools",
|
|
2747
|
-
"Read",
|
|
2748
|
-
"--permission-mode",
|
|
2749
|
-
"bypassPermissions",
|
|
2750
|
-
"--output-format",
|
|
2751
|
-
"text"
|
|
2752
|
-
],
|
|
2753
|
-
{
|
|
2754
|
-
encoding: "utf8",
|
|
2755
|
-
maxBuffer: 10 * 1024 * 1024,
|
|
2756
|
-
stdio: ["ignore", "pipe", "pipe"]
|
|
2757
|
-
}
|
|
2758
|
-
);
|
|
2759
|
-
if (result.error) {
|
|
2760
|
-
console.warn(
|
|
2761
|
-
`Self-improvement: failed to run claude: ${result.error.message}`
|
|
2762
|
-
);
|
|
2763
|
-
return;
|
|
2764
|
-
}
|
|
2765
|
-
if (result.status !== 0) {
|
|
2766
|
-
const stderr = result.stderr ?? "";
|
|
2767
|
-
console.warn(
|
|
2768
|
-
`Self-improvement: claude exited with code ${result.status}${stderr ? ": " + stderr : ""}`
|
|
2769
|
-
);
|
|
2770
|
-
return;
|
|
2771
|
-
}
|
|
2772
|
-
const response = result.stdout ?? "";
|
|
2773
|
-
let parsed;
|
|
2774
|
-
try {
|
|
2775
|
-
parsed = JSON.parse(extractJson(response));
|
|
2776
|
-
} catch {
|
|
2777
|
-
console.warn(
|
|
2778
|
-
`Self-improvement: could not parse Claude response as JSON.
|
|
2779
|
-
Response: ${response.trim()}`
|
|
2780
|
-
);
|
|
2781
|
-
return;
|
|
2782
|
-
}
|
|
2783
|
-
const zodResult = RetrospectiveOutputSchema.safeParse(parsed);
|
|
2784
|
-
if (!zodResult.success) {
|
|
2785
|
-
console.warn(
|
|
2786
|
-
"Self-improvement: response schema mismatch \u2014 improved YAML not saved."
|
|
2787
|
-
);
|
|
2788
|
-
return;
|
|
2789
|
-
}
|
|
2790
|
-
const improvedYaml = zodResult.data.improved_yaml.trim();
|
|
2791
|
-
const changelog = zodResult.data.changelog.trim() || "No changelog generated.";
|
|
2792
|
-
try {
|
|
2793
|
-
parseYaml2(improvedYaml);
|
|
2794
|
-
} catch (err) {
|
|
2795
|
-
console.warn(
|
|
2796
|
-
`Self-improvement: generated YAML is invalid (${getErrorMessage(err)}), skipping save.`
|
|
2797
|
-
);
|
|
2798
|
-
return;
|
|
2799
|
-
}
|
|
2800
|
-
const startDir = dirname4(resolve3(workflowFilePath));
|
|
2801
|
-
const executantLocal = findExecutantLocalDir(startDir);
|
|
2802
|
-
const backlogDir = executantLocal ? join4(executantLocal, "tasks", "backlog") : join4(startDir, "..", "backlog");
|
|
2803
|
-
mkdirSync4(backlogDir, { recursive: true });
|
|
2804
|
-
const ts = formatTimestamp(/* @__PURE__ */ new Date());
|
|
2805
|
-
const slug = slugify(taskName, 40);
|
|
2806
|
-
const improvedFile = join4(backlogDir, `${ts}-${slug}-improved.yaml`);
|
|
2807
|
-
const changelogFile = join4(backlogDir, `${ts}-${slug}-changelog.md`);
|
|
2808
|
-
writeFileSync4(improvedFile, improvedYaml + "\n", "utf8");
|
|
2809
|
-
writeFileSync4(changelogFile, changelog + "\n", "utf8");
|
|
2810
|
-
console.log(`\u2705 Improved task saved: ${improvedFile}`);
|
|
2811
|
-
console.log(`\u2705 Changelog saved: ${changelogFile}`);
|
|
2812
|
-
console.log(`
|
|
2813
|
-
${divider}`);
|
|
2814
|
-
console.log("Improvement Summary");
|
|
2815
|
-
console.log(`${divider}
|
|
2816
|
-
`);
|
|
2817
|
-
console.log(changelog);
|
|
2818
|
-
}
|
|
2819
|
-
function extractJson(text) {
|
|
2820
|
-
const start = text.indexOf("{");
|
|
2821
|
-
const end = text.lastIndexOf("}");
|
|
2822
|
-
if (start === -1 || end === -1 || end <= start)
|
|
2823
|
-
throw new Error("no JSON object found in response");
|
|
2824
|
-
return text.slice(start, end + 1);
|
|
2825
|
-
}
|
|
2826
|
-
|
|
2827
|
-
// src/types.ts
|
|
2828
|
-
var InterjectChannel = class {
|
|
2829
|
-
_queue = [];
|
|
2830
|
-
/** Called by the TUI when the user submits an interjection message. */
|
|
2831
|
-
interject(message) {
|
|
2832
|
-
this._queue.push(message);
|
|
2833
|
-
}
|
|
2834
|
-
/** Drains and returns any queued messages (for non-Claude steps to consume). */
|
|
2835
|
-
consumeQueue() {
|
|
2836
|
-
const q = this._queue.slice();
|
|
2837
|
-
this._queue = [];
|
|
2838
|
-
return q;
|
|
2839
|
-
}
|
|
2840
|
-
};
|
|
2841
|
-
|
|
2842
2591
|
// src/index.ts
|
|
2843
2592
|
var CURRENT_VERSION = JSON.parse(
|
|
2844
|
-
|
|
2845
|
-
join5(
|
|
2593
|
+
readFileSync6(
|
|
2594
|
+
join5(dirname4(fileURLToPath2(import.meta.url)), "../package.json"),
|
|
2846
2595
|
"utf-8"
|
|
2847
2596
|
)
|
|
2848
2597
|
).version;
|
|
@@ -2901,6 +2650,7 @@ Options:
|
|
|
2901
2650
|
--ci Headless mode \u2014 print events as NDJSON, no TUI
|
|
2902
2651
|
--step <name|index> Run only the named step or step at 1-based index
|
|
2903
2652
|
--from-step <n> Resume from step n (e.g. 3, 3.2, 2.5.4.3 \u2014 1-based path)
|
|
2653
|
+
--var KEY=VALUE Override or supply a workflow var at runtime (repeatable)
|
|
2904
2654
|
--help, -h Show this help
|
|
2905
2655
|
|
|
2906
2656
|
Commands:
|
|
@@ -2942,6 +2692,18 @@ YAML \u2014 script step fields (type: script | command, or inferred when command
|
|
|
2942
2692
|
self_healing bool On failure, Claude diagnoses and fixes iteratively
|
|
2943
2693
|
up to 5 attempts with accumulated context (default: false)
|
|
2944
2694
|
max_healing_attempts int Override max self-healing retries (default: 5)
|
|
2695
|
+
timeout_seconds number Kill the step and fail with exit code 3 after N seconds
|
|
2696
|
+
|
|
2697
|
+
Cancellation:
|
|
2698
|
+
Write a .executant-cancel file in the working directory to stop execution
|
|
2699
|
+
cleanly between steps (exit code 4). The file is deleted automatically.
|
|
2700
|
+
|
|
2701
|
+
Exit codes:
|
|
2702
|
+
0 All steps completed successfully
|
|
2703
|
+
1 A step failed at runtime
|
|
2704
|
+
2 YAML or variable validation error
|
|
2705
|
+
3 A step timed out (timeout_seconds exceeded)
|
|
2706
|
+
4 Cancelled via .executant-cancel file
|
|
2945
2707
|
|
|
2946
2708
|
YAML \u2014 log step fields (type: log, or inferred when message is present and prompt is absent):
|
|
2947
2709
|
message string Text to emit as a progress marker
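Note on the hunk above: the exit codes added to the help text can be consumed by a wrapper script as a simple lookup. This is a sketch under stated assumptions: `describeExit` and `EXIT_CODES` are hypothetical names that do not appear in the package; only the code-to-meaning pairs are taken from the help text.

```javascript
// Meanings copied from the "Exit codes" help text; the helper itself is hypothetical.
const EXIT_CODES = new Map([
  [0, "All steps completed successfully"],
  [1, "A step failed at runtime"],
  [2, "YAML or variable validation error"],
  [3, "A step timed out (timeout_seconds exceeded)"],
  [4, "Cancelled via .executant-cancel file"],
]);

// Returns the documented meaning, or a fallback for codes the help text does not list.
function describeExit(code) {
  return EXIT_CODES.get(code) ?? `Unknown exit code ${code}`;
}
```

A CI wrapper could, for example, treat 2 as a configuration problem rather than a flaky step, since the help text reserves it for YAML and variable validation errors.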
|
|
@@ -2967,6 +2729,7 @@ Example:
|
|
|
2967
2729
|
var ciMode = false;
|
|
2968
2730
|
var stepFilter;
|
|
2969
2731
|
var fromStep;
|
|
2732
|
+
var cliVars = {};
|
|
2970
2733
|
var positional = [];
|
|
2971
2734
|
for (let i = 0; i < rawArgs.length; i++) {
|
|
2972
2735
|
const a = rawArgs[i];
|
|
@@ -2992,6 +2755,18 @@ for (let i = 0; i < rawArgs.length; i++) {
|
|
|
2992
2755
|
process.exit(1);
|
|
2993
2756
|
}
|
|
2994
2757
|
fromStep = parts;
|
|
2758
|
+
} else if (a === "--var") {
|
|
2759
|
+
if (!rawArgs[i + 1]) {
|
|
2760
|
+
console.error("--var requires a KEY=VALUE argument");
|
|
2761
|
+
process.exit(1);
|
|
2762
|
+
}
|
|
2763
|
+
const pair = rawArgs[++i];
|
|
2764
|
+
const eq = pair.indexOf("=");
|
|
2765
|
+
if (eq <= 0) {
|
|
2766
|
+
console.error(`--var value must be KEY=VALUE, got: ${pair}`);
|
|
2767
|
+
process.exit(1);
|
|
2768
|
+
}
|
|
2769
|
+
cliVars[pair.slice(0, eq)] = pair.slice(eq + 1);
|
|
2995
2770
|
} else {
|
|
2996
2771
|
positional.push(a);
|
|
2997
2772
|
}
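Note on the hunk above: the `--var` rules can be restated as a standalone function. This is a minimal sketch, not the package's code: `parseCliVars` is a hypothetical name, it throws where the bundle prints an error and calls `process.exit(1)`, and it treats every other argument as positional, whereas the real loop also handles flags such as `--ci` and `--from-step`.

```javascript
// Hypothetical helper restating the --var parsing rules from the hunk above.
// Throws instead of exiting so that it can be exercised in isolation.
function parseCliVars(rawArgs) {
  const cliVars = {};
  const positional = [];
  for (let i = 0; i < rawArgs.length; i++) {
    const a = rawArgs[i];
    if (a !== "--var") {
      positional.push(a);
      continue;
    }
    const pair = rawArgs[++i];
    if (!pair) throw new Error("--var requires a KEY=VALUE argument");
    const eq = pair.indexOf("=");
    // eq === -1 means no "=" at all; eq === 0 means an empty key. Both are rejected.
    if (eq <= 0) throw new Error(`--var value must be KEY=VALUE, got: ${pair}`);
    // Split on the first "=" only, so a value may itself contain "=".
    cliVars[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return { cliVars, positional };
}
```

Because the split happens at the first `=`, an argument such as `--var query=a=b` yields the key `query` with the value `a=b`, and a repeated key keeps the last value given.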
|
|
@@ -3003,12 +2778,21 @@ if (!filePath) {
|
|
|
3003
2778
|
}
|
|
3004
2779
|
var workflow;
|
|
3005
2780
|
try {
|
|
3006
|
-
workflow = loadWorkflow(filePath);
|
|
2781
|
+
workflow = loadWorkflow(filePath, cliVars);
|
|
3007
2782
|
} catch (err) {
|
|
3008
2783
|
console.error(getErrorMessage(err));
|
|
3009
|
-
process.exit(
|
|
2784
|
+
process.exit(2);
|
|
3010
2785
|
}
|
|
3011
|
-
var
|
|
2786
|
+
var localDir = findExecutantLocalDir(dirname4(resolve3(filePath)));
|
|
2787
|
+
if (localDir) {
|
|
2788
|
+
mkdirSync4(join5(localDir, "tasks", "todo"), { recursive: true });
|
|
2789
|
+
mkdirSync4(join5(localDir, "tasks", "done"), { recursive: true });
|
|
2790
|
+
}
|
|
2791
|
+
var options = {
|
|
2792
|
+
stepFilter,
|
|
2793
|
+
fromStep,
|
|
2794
|
+
workDir: dirname4(resolve3(filePath))
|
|
2795
|
+
};
|
|
3012
2796
|
var channel = new InterjectChannel();
|
|
3013
2797
|
var rawEvents = runWorkflow(workflow, options, channel);
|
|
3014
2798
|
var logger = createLogger(resolveLogDir(filePath), workflow.goal);
|
|
@@ -3020,36 +2804,23 @@ function errorReplacer(_key, value) {
|
|
|
3020
2804
|
}
|
|
3021
2805
|
return value;
|
|
3022
2806
|
}
|
|
3023
|
-
async function maybeRunRetrospective(filePath2, workflow2, logger2) {
|
|
3024
|
-
if (!logger2) return;
|
|
3025
|
-
try {
|
|
3026
|
-
await runRetrospective(
|
|
3027
|
-
filePath2,
|
|
3028
|
-
workflow2,
|
|
3029
|
-
logger2.getHighlightsDir(),
|
|
3030
|
-
logger2.getTimestamp()
|
|
3031
|
-
);
|
|
3032
|
-
} catch (err) {
|
|
3033
|
-
console.warn(
|
|
3034
|
-
"[executant] retrospective failed (non-fatal):",
|
|
3035
|
-
getErrorMessage(err)
|
|
3036
|
-
);
|
|
3037
|
-
}
|
|
3038
|
-
}
|
|
3039
2807
|
if (ciMode) {
|
|
3040
2808
|
(async () => {
|
|
3041
2809
|
for await (const event of events) {
|
|
3042
|
-
|
|
3043
|
-
|
|
3044
|
-
|
|
3045
|
-
|
|
2810
|
+
const line = JSON.stringify(event, errorReplacer) + "\n";
|
|
2811
|
+
if (event.type === "workflow:cancelled") {
|
|
2812
|
+
process.stdout.write(line, () => process.exit(4));
|
|
2813
|
+
return;
|
|
2814
|
+
}
|
|
2815
|
+
process.stdout.write(line);
|
|
3046
2816
|
}
|
|
3047
2817
|
})().catch((err) => {
|
|
2818
|
+
const code = err instanceof TimeoutError ? 3 : 1;
|
|
3048
2819
|
console.error(err);
|
|
3049
|
-
process.exit(
|
|
2820
|
+
process.exit(code);
|
|
3050
2821
|
});
|
|
3051
2822
|
} else {
|
|
3052
|
-
|
|
2823
|
+
render(
|
|
3053
2824
|
React3.createElement(App, {
|
|
3054
2825
|
workflow,
|
|
3055
2826
|
events,
|
|
@@ -3058,8 +2829,4 @@ if (ciMode) {
|
|
|
3058
2829
|
interjectChannel: channel
|
|
3059
2830
|
})
|
|
3060
2831
|
);
|
|
3061
|
-
if (workflow.selfImprove) {
|
|
3062
|
-
inkApp.waitUntilExit().then(() => maybeRunRetrospective(filePath, workflow, logger)).catch(() => {
|
|
3063
|
-
});
|
|
3064
|
-
}
|
|
3065
2832
|
}
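Note on the `--ci` branch above: the new code writes the `workflow:cancelled` event and exits with code 4 only from the write callback, so the final NDJSON line is flushed first, and a thrown `TimeoutError` maps to 3 while any other error maps to 1. The sketch below isolates that decision logic. It makes several assumptions: `drainEvents` is a hypothetical name, `write` is injected in place of `process.stdout.write`, the local `TimeoutError` class stands in for the one defined elsewhere in the bundle, the `errorReplacer` argument is omitted, and returning 0 after a normal run is assumed rather than shown in the diff.

```javascript
// Stand-in for the bundle's TimeoutError, which is defined outside this hunk.
class TimeoutError extends Error {}

// Emits each event as one NDJSON line and resolves to the exit code the run should use.
async function drainEvents(events, write) {
  try {
    for await (const event of events) {
      write(JSON.stringify(event) + "\n");
      // Stop at cancellation; the real code exits with 4 from the write callback.
      if (event.type === "workflow:cancelled") return 4;
    }
    return 0; // assumed: no cancellation and no thrown error means success
  } catch (err) {
    return err instanceof TimeoutError ? 3 : 1;
  }
}
```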
|
package/package.json
CHANGED
package/dist/prompts/retrospective-analysis.txt
DELETED
|
@@ -1,304 +0,0 @@
|
|
|
1
|
-
# ============================================================================
|
|
2
|
-
# RETROSPECTIVE ANALYSIS PROMPT
|
|
3
|
-
# ============================================================================
|
|
4
|
-
# Purpose: Analyzes task execution highlights and generates improved task YAML
|
|
5
|
-
# Used by: src/retrospective.ts runRetrospective()
|
|
6
|
-
# Triggered when: A task completes with self_improve: true and has highlights
|
|
7
|
-
#
|
|
8
|
-
# Placeholders:
|
|
9
|
-
# {{TASK_NAME}} - Name of the task that was executed
|
|
10
|
-
# {{ORIGINAL_GOAL}} - The original goal statement (must be preserved)
|
|
11
|
-
# {{ORIGINAL_YAML}} - Complete original task YAML for reference
|
|
12
|
-
# {{HIGHLIGHTS}} - Aggregated highlight markdown files from execution
|
|
13
|
-
# {{METRICS}} - Execution metrics summary (failures, retries, etc.)
|
|
14
|
-
# ============================================================================
|
|
15
|
-
|
|
16
|
-
You are analyzing the execution of an Executant task to identify improvement opportunities.
|
|
17
|
-
|
|
18
|
-
# Task Information
|
|
19
|
-
|
|
20
|
-
**Task Name:** {{TASK_NAME}}
|
|
21
|
-
|
|
22
|
-
**Original Goal:** {{ORIGINAL_GOAL}}
|
|
23
|
-
|
|
24
|
-
# Execution Metrics
|
|
25
|
-
|
|
26
|
-
{{METRICS}}
|
|
27
|
-
|
|
28
|
-
# Execution Highlights
|
|
29
|
-
|
|
30
|
-
The following highlights were captured during execution. Each highlight represents a moment where the system encountered challenges:
|
|
31
|
-
|
|
32
|
-
{{HIGHLIGHTS}}
|
|
33
|
-
|
|
34
|
-
# Original Task YAML
|
|
35
|
-
|
|
36
|
-
```yaml
|
|
37
|
-
{{ORIGINAL_YAML}}
|
|
38
|
-
```
|
|
39
|
-
|
|
40
|
-
# Your Task
|
|
41
|
-
|
|
42
|
-
Analyze the execution highlights and generate an improved version of the task YAML that addresses the problems encountered during execution.
|
|
43
|
-
|
|
44
|
-
## Analysis Guidelines
|
|
45
|
-
|
|
46
|
-
### Interpreting Judge Failures (llm_as_judge: true)
|
|
47
|
-
|
|
48
|
-
Judge failures indicate that Claude's output didn't meet quality standards. Common causes:
|
|
49
|
-
|
|
50
|
-
**Unclear prompts** - The step instructions were too vague
|
|
51
|
-
- Fix: Add specific numbered sub-steps
|
|
52
|
-
- Fix: Define clear success criteria
|
|
53
|
-
- Fix: Specify what to check and how to verify it
|
|
54
|
-
|
|
55
|
-
**Missing criteria** - The prompt didn't explain what "good" looks like
|
|
56
|
-
- Fix: Add examples of expected output
|
|
57
|
-
- Fix: Specify quality thresholds (test coverage %, file count, etc.)
|
|
58
|
-
- Fix: Include validation steps
|
|
59
|
-
|
|
60
|
-
**Steps too large** - One step tried to do too much
|
|
61
|
-
- Fix: Break into smaller, focused steps
|
|
62
|
-
- Fix: Each step should have one clear objective
|
|
63
|
-
|
|
64
|
-
**Example Fix:**
|
|
65
|
-
```
|
|
66
|
-
BEFORE:
|
|
67
|
-
- name: "validate results"
|
|
68
|
-
llm_as_judge: true
|
|
69
|
-
prompt: "Validate the conversion results"
|
|
70
|
-
|
|
71
|
-
AFTER:
|
|
72
|
-
- name: "validate results"
|
|
73
|
-
llm_as_judge: true
|
|
74
|
-
prompt: |
|
|
75
|
-
Validate the TypeScript conversion by checking:
|
|
76
|
-
1. Read the generated .ts file
|
|
77
|
-
2. Verify all functions have type annotations
|
|
78
|
-
3. Check that tests pass (npm test)
|
|
79
|
-
4. Confirm no compilation errors (tsc --noEmit)
|
|
80
|
-
|
|
81
|
-
Success criteria: All 4 checks pass without errors.
|
|
82
|
-
```
|
|
83
|
-
|
|
84
|
-
### Interpreting Self-Healing Events (self_healing: true)
|
|
85
|
-
|
|
86
|
-
Self-healing activations indicate brittle script steps that failed during execution. Common causes:
|
|
87
|
-
|
|
88
|
-
**Missing dependencies** - Command not found, package not installed
|
|
89
|
-
- Fix: Add a script step to install/check dependencies first
|
|
90
|
-
- Fix: Use explicit paths instead of assuming commands are in PATH
|
|
91
|
-
|
|
92
|
-
**Wrong assumptions** - Script assumed files/directories exist
|
|
93
|
-
- Fix: Add checks or create directories in the script
|
|
94
|
-
- Fix: Use `mkdir -p` instead of `mkdir`
|
|
95
|
-
- Fix: Check file existence before operating on it
|
|
96
|
-
|
|
97
|
-
**Environment issues** - PWD, env vars, or paths incorrect
|
|
98
|
-
- Fix: Use absolute paths instead of relative
|
|
99
|
-
- Fix: cd to correct directory in the script
|
|
100
|
-
- Fix: Set required environment variables
|
|
101
|
-
|
|
102
|
-
**Race conditions** - Script ran before previous step completed
|
|
103
|
-
- Fix: Add wait/check logic
|
|
104
|
-
- Fix: Combine dependent commands with && in one script step
|
|
105
|
-
|
|
106
|
-
**Example Fix:**
|
|
107
|
-
```
|
|
108
|
-
BEFORE:
|
|
109
|
-
- name: "run tests"
|
|
110
|
-
type: script
|
|
111
|
-
self_healing: true
|
|
112
|
-
command: npm test
|
|
113
|
-
|
|
114
|
-
AFTER:
|
|
115
|
-
- name: "install dependencies"
|
|
116
|
-
type: script
|
|
117
|
-
command: npm install
|
|
118
|
-
|
|
119
|
-
- name: "run tests"
|
|
120
|
-
type: script
|
|
121
|
-
self_healing: true
|
|
122
|
-
command: npm test
|
|
123
|
-
```
|
|
124
|
-
|
|
125
|
-
### Interpreting Complex Tool Sequences
|
|
126
|
-
|
|
127
|
-
Complex tool sequences (3+ tools) indicate that Claude had to work hard to complete a step. Common causes:
|
|
128
|
-
|
|
129
|
-
**Vague instructions** - Step didn't specify what files to operate on
|
|
130
|
-
- Fix: List specific file paths to read/edit
|
|
131
|
-
- Fix: Specify glob patterns for file discovery
|
|
132
|
-
- Fix: Break discovery and operation into separate steps
|
|
133
|
-
|
|
134
|
-
**Exploratory work needed** - Claude had to search to understand the codebase
|
|
135
|
-
- Fix: Add a separate discovery/analysis step first
|
|
136
|
-
- Fix: Provide file paths in the prompt
|
|
137
|
-
- Fix: Include relevant code snippets in the prompt
|
|
138
|
-
|
|
139
|
-
**Multi-phase operations** - One step tried to do research + implementation
|
|
140
|
-
- Fix: Split into "research" step and "implementation" step
|
|
141
|
-
- Fix: First step outputs findings, second step acts on them
|
|
142
|
-
|
|
143
|
-
**Example Fix:**
|
|
144
|
-
```
|
|
145
|
-
BEFORE:
|
|
146
|
-
- name: "update imports"
|
|
147
|
-
prompt: "Update all imports to use the new module structure"
|
|
148
|
-
|
|
149
|
-
AFTER:
|
|
150
|
-
- name: "analyze imports"
|
|
151
|
-
prompt: |
|
|
152
|
-
Search the codebase for all import statements:
|
|
153
|
-
1. Use grep to find all imports in src/
|
|
154
|
-
2. List files that import from old modules
|
|
155
|
-
3. Create a plan for updating each file
|
|
156
|
-
|
|
157
|
-
- name: "update imports"
|
|
158
|
-
prompt: |
|
|
159
|
-
Update imports in the following files based on the analysis:
|
|
160
|
-
- src/components/Button.tsx
|
|
161
|
-
- src/utils/helpers.ts
|
|
162
|
-
- src/services/api.ts
|
|
163
|
-
|
|
164
|
-
Change: import from './old/' to import from '@/new/'
|
|
165
|
-
```
|
|
166
|
-
|
|
167
|
-
## Improvement Principles
|
|
168
|
-
|
|
169
|
-
1. **Preserve the original goal** - The task succeeded, so the goal is correct
|
|
170
|
-
2. **Fix problems shown in highlights** - Only address issues that actually occurred
|
|
171
|
-
3. **Be specific** - Add numbered steps, file paths, and clear criteria
|
|
172
|
-
4. **Break down large steps** - If a step caused many retries or complex tool sequences
|
|
173
|
-
5. **Add prerequisite steps** - If self-healing had to install deps or create files
|
|
174
|
-
6. **Keep self_improve: true** - Allow recursive improvement in future runs
|
|
175
|
-
7. **Document changes** - Explain what you changed and why in the changelog
|
|
176
|
-
|
|
177
|
-
## Improvement Patterns
|
|
178
|
-
|
|
179
|
-
### Pattern: Split Vague Prompt into Specific Sub-Steps
|
|
180
|
-
|
|
181
|
-
When a judge fails or complex tools are needed, make the prompt more specific:
|
|
182
|
-
|
|
183
|
-
```yaml
|
|
184
|
-
# BEFORE: Vague, requires exploration
|
|
185
|
-
- name: "refactor authentication"
|
|
186
|
-
llm_as_judge: true
|
|
187
|
-
prompt: "Refactor the authentication code"
|
|
188
|
-
|
|
189
|
-
# AFTER: Specific numbered steps
|
|
190
|
-
- name: "refactor authentication"
|
|
191
|
-
llm_as_judge: true
|
|
192
|
-
prompt: |
|
|
193
|
-
Refactor authentication by:
|
|
194
|
-
1. Reading src/auth/login.ts and src/auth/session.ts
|
|
195
|
-
2. Extracting common logic into src/auth/helpers.ts
|
|
196
|
-
3. Updating imports in both files
|
|
197
|
-
4. Running tests to verify: npm test src/auth/
|
|
198
|
-
|
|
199
|
-
Success: Tests pass, no code duplication between login.ts and session.ts
|
|
200
|
-
```
|
|
201
|
-
|
|
202
|
-
### Pattern: Add Prerequisite Step
|
|
203
|
-
|
|
204
|
-
When self-healing installs deps or fixes environment:
|
|
205
|
-
|
|
206
|
-
```yaml
|
|
207
|
-
# BEFORE: Brittle, assumes deps installed
|
|
208
|
-
steps:
|
|
209
|
-
- name: "build"
|
|
210
|
-
type: script
|
|
211
|
-
self_healing: true
|
|
212
|
-
command: npm run build
|
|
213
|
-
|
|
214
|
-
# AFTER: Explicit dependency step
|
|
215
|
-
steps:
|
|
216
|
-
- name: "install dependencies"
|
|
217
|
-
type: script
|
|
218
|
-
command: npm install
|
|
219
|
-
|
|
220
|
-
- name: "build"
|
|
221
|
-
type: script
|
|
222
|
-
command: npm run build
|
|
223
|
-
```
|
|
224
|
-
|
|
225
|
-
### Pattern: Split Research from Implementation
|
|
226
|
-
|
|
227
|
-
When complex tool sequences suggest exploratory work:
|
|
228
|
-
|
|
229
|
-
```yaml
|
|
230
|
-
# BEFORE: Combined research + work
|
|
231
|
-
- name: "fix bugs"
|
|
232
|
-
prompt: "Find and fix all bugs in the payment flow"
|
|
233
|
-
|
|
234
|
-
# AFTER: Separated discovery and fixing
|
|
235
|
-
- name: "identify payment bugs"
|
|
236
|
-
prompt: |
|
|
237
|
-
Analyze the payment flow for bugs:
|
|
238
|
-
1. Read src/payment/*.ts files
|
|
239
|
-
2. Check for error handling gaps
|
|
240
|
-
3. List files that need fixes
|
|
241
|
-
|
|
242
|
-
- name: "fix payment bugs"
|
|
243
|
-
llm_as_judge: true
|
|
244
|
-
prompt: |
|
|
245
|
-
Fix bugs identified in previous step:
|
|
246
|
-
- Add error handling in src/payment/checkout.ts
|
|
247
|
-
- Validate input in src/payment/process.ts
|
|
248
|
-
- Update tests in src/payment/__tests__/
|
|
249
|
-
|
|
250
|
-
Success: All payment tests pass
|
|
251
|
-
```
|
|
252
|
-
|
|
253
|
-
### Pattern: Add Explicit Success Criteria
|
|
254
|
-
|
|
255
|
-
When judge fails due to unclear expectations:
|
|
256
|
-
|
|
257
|
-
```yaml
|
|
258
|
-
# BEFORE: No clear success criteria
|
|
259
|
-
- name: "improve test coverage"
|
|
260
|
-
llm_as_judge: true
|
|
261
|
-
prompt: "Improve test coverage for the API module"
|
|
262
|
-
|
|
263
|
-
# AFTER: Explicit threshold and verification
|
|
264
|
-
- name: "improve test coverage"
|
|
265
|
-
llm_as_judge: true
|
|
266
|
-
prompt: |
|
|
267
|
-
Improve test coverage for src/api/ to at least 80%:
|
|
268
|
-
1. Run: npm test -- --coverage src/api/
|
|
269
|
-
2. Identify files with <80% coverage
|
|
270
|
-
3. Write tests for uncovered code paths
|
|
271
|
-
4. Re-run coverage and verify ≥80%
|
|
272
|
-
|
|
273
|
-
Success criteria: Coverage report shows ≥80% for all files in src/api/
|
|
274
|
-
```
|
|
275
|
-
|
|
276
|
-
# Output Format
|
|
277
|
-
|
|
278
|
-
Respond with a single JSON object:
|
|
279
|
-
{
|
|
280
|
-
"improved_yaml": "<complete improved task YAML — no markdown fences, raw YAML only>",
|
|
281
|
-
"changelog": "<markdown: Problems Identified / Changes Applied / Expected Impact>"
|
|
282
|
-
}
|
|
283
|
-
|
|
284
|
-
Output only the JSON object — no prose before or after.
|
|
285
|
-
|
|
286
|
-
# Important Requirements
|
|
287
|
-
|
|
288
|
-
1. **Always preserve the original goal** - Do not change the goal statement
|
|
289
|
-
2. **Keep self_improve: true** - This enables recursive improvement
|
|
290
|
-
3. **Only fix problems shown in highlights** - Don't add unnecessary changes
|
|
291
|
-
4. **Be specific in improvements** - Vague fixes won't help
|
|
292
|
-
5. **Generate valid YAML** - The improved task must be parseable
|
|
293
|
-
6. **Explain all changes** - The changelog should justify each modification
|
|
294
|
-
|
|
295
|
-
# Example Response
|
|
296
|
-
|
|
297
|
-
```json
|
|
298
|
-
{
|
|
299
|
-
"improved_yaml": "goal: \"Convert CoffeeScript to TypeScript with validation\"\nself_improve: true\n\nsteps:\n - name: \"install dependencies\"\n type: script\n command: npm install\n\n - name: \"convert to TypeScript\"\n type: script\n command: coffee2ts convert app.coffee\n\n - name: \"validate conversion\"\n llm_as_judge: true\n prompt: |\n Validate the TypeScript conversion by:\n 1. Reading app.ts and checking all functions have type annotations\n 2. Running: tsc --noEmit to check for type errors\n 3. Running: npm test to verify functionality\n\n Success criteria: No type errors, all tests pass",
|
|
300
|
-
"changelog": "## Problems Identified\n- Judge failure in \"validate conversion\": Instructions were too vague\n- Self-healing activation: npm dependencies were missing\n\n## Changes Applied\n\n### Step 1: install dependencies (NEW)\n- Before: Not present\n- After: Added explicit npm install step\n- Rationale: Self-healing had to install deps, do it upfront\n\n### Step 3: validate conversion (MODIFIED)\n- Before: \"Validate the results\"\n- After: Specific 3-step validation with success criteria\n- Rationale: Judge failed because unclear what to validate and how\n\n## Expected Impact\n- Judge retries: 1 → 0 (clearer validation steps)\n- Self-healing activations: 1 → 0 (deps installed first)"
|
|
301
|
-
}
|
|
302
|
-
```
|
|
303
|
-
|
|
304
|
-
Now analyze the highlights and generate the improved task YAML with detailed changelog.
|