executant 1.19.0 → 1.21.0

package/README.md CHANGED
@@ -115,11 +115,43 @@ steps:
  This is pass {{item}} of 5. Review src/runner.ts for untested edge cases.
  ```
 
+ ## Variables at Runtime
+
+ Pass `--var KEY=VALUE` on the command line to override or supply workflow vars without editing the YAML:
+
+ ```bash
+ executant --var env=staging --var region=eu-west-1 deploy.yaml
+ ```
+
+ CLI vars override any same-named vars in the workflow's `vars:` section. Multiple `--var` flags are accepted.
+
  ## Quality Controls
 
  - **`llm_as_judge: true`** — after a step completes, Claude evaluates the output; retries with feedback on FAIL, up to 5×
  - **`self_healing: true`** — on script failure, Claude diagnoses and repairs the command, then re-runs it, up to 5×
- - **`self_improve: true`** — after the workflow finishes, Claude analyzes execution highlights and saves an improved YAML to `tasks/backlog/`
+ - **`timeout_seconds: N`** — kill the step after N seconds and fail with exit code 3. Works for both script and prompt steps.
+
+ ```yaml
+ steps:
+   - name: install
+     command: npm ci
+     timeout_seconds: 120 # fail if install takes longer than 2 min
+
+   - name: implement
+     prompt: Implement the feature described above.
+     timeout_seconds: 1800 # 30 min ceiling for the Claude step
+ ```
+
+ ## Cancellation
+
+ Write a `.executant-cancel` file in the **same directory as the workflow YAML** to stop the workflow cleanly **between steps**:
+
+ ```bash
+ executant long-workflow.yaml &
+ touch .executant-cancel # workflow stops at the next step boundary; exits 4
+ ```
+
+ The file is deleted automatically. This is a cooperative, process-safe alternative to SIGTERM — no mid-step git state corruption. The cancel file is always resolved relative to the workflow file, so the location is predictable regardless of which directory you invoked executant from.
 
  ## Interjection
 
@@ -148,21 +180,35 @@ press i → ▷ don't delete that file, use git revert▌ esc to cancel
  | `logging-demo.yaml` | Log steps, self-healing, judge |
  | `git-status-summary.yaml` | Real-world git workflow |
  | `repeat-demo.yaml` | Running a step N times with `repeat` |
+ | `file-demo.yaml` | File operations |
+ | `from-step-test.yaml` | Using `--from-step` to resume mid-workflow |
 
  See the [`examples/`](examples/) directory.
 
  ## CLI
 
  ```bash
- executant plan "description" # generate a workflow YAML (auto-detects fast path)
- executant plan -q "description" # skip research pass (fast path)
- executant workflow.yaml # run a workflow
- executant --ci workflow.yaml # headless, NDJSON to stdout
- executant --step <name|n> wf.yaml # run one step by name or index
- executant --from-step <n> wf.yaml # resume from step n
- executant update # upgrade to latest version
+ executant plan "description" # generate a workflow YAML (auto-detects fast path)
+ executant plan -q "description" # skip research pass (fast path)
+ executant refine workflow.yaml "instructions" # refine an existing workflow YAML
+ executant workflow.yaml # run a workflow
+ executant --ci workflow.yaml # headless, NDJSON to stdout
+ executant --step <name|n> wf.yaml # run one step by name or index
+ executant --from-step <n> wf.yaml # resume from step n
+ executant --var KEY=VALUE wf.yaml # override a workflow var at runtime
+ executant update # upgrade to latest version
  ```
 
+ ### Exit codes
+
+ | Code | Meaning |
+ |------|---------|
+ | `0` | All steps completed successfully |
+ | `1` | A step failed at runtime |
+ | `2` | YAML or variable validation error |
+ | `3` | A step timed out (`timeout_seconds` exceeded) |
+ | `4` | Cancelled via `.executant-cancel` file |
+
  ## Development
 
  ```bash
package/dist/index.js CHANGED
@@ -52,8 +52,8 @@ var init_update = __esm({
  // src/index.ts
  import React3 from "react";
  import { render } from "ink";
- import { readFileSync as readFileSync7 } from "node:fs";
- import { dirname as dirname5, join as join5 } from "node:path";
+ import { mkdirSync as mkdirSync4, readFileSync as readFileSync6 } from "node:fs";
+ import { dirname as dirname4, join as join5, resolve as resolve3 } from "node:path";
  import { fileURLToPath as fileURLToPath2 } from "node:url";
 
  // src/load-workflow.ts
@@ -154,16 +154,16 @@ var RawStepSchema = z.lazy(
  forEach: z.union([z.array(z.string()), z.string()]).optional(),
  repeat: z.number().int().positive().optional(),
  context: z.array(z.string()).optional(),
- steps: z.array(RawStepSchema).min(1).optional()
+ steps: z.array(RawStepSchema).min(1).optional(),
+ timeout_seconds: z.number().positive().optional()
  })
  );
  var RawWorkflowSchema = z.object({
  goal: z.string(),
  steps: z.array(RawStepSchema),
- vars: z.record(z.string(), z.string()).optional(),
- self_improve: z.boolean().optional()
+ vars: z.record(z.string(), z.string()).optional()
  });
- function loadWorkflow(filePath2) {
+ function loadWorkflow(filePath2, cliVars2 = {}) {
  let raw;
  try {
  raw = readFileSync2(filePath2, "utf8");
@@ -180,7 +180,7 @@ function loadWorkflow(filePath2) {
  throw new Error(`Invalid workflow file "${filePath2}":
  ${detail}`);
  }
- const vars = doc.vars ?? {};
+ const vars = { ...doc.vars ?? {}, ...cliVars2 };
  const seen = /* @__PURE__ */ new Set();
  for (const step of doc.steps) {
  if (seen.has(step.name)) {
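The merged `vars` object in this hunk gives CLI values precedence because the later spread wins on key collisions. A minimal sketch of that precedence follows. `parseCliVars` and `mergeVars` are hypothetical helpers written for illustration; the package's actual argument parsing is not shown in this diff.

```javascript
// Hypothetical helper: turn ["env=staging", "region=eu-west-1"] into an object.
// Splits on the first "=" only, so a value may itself contain "=".
function parseCliVars(pairs) {
  const vars = {};
  for (const pair of pairs) {
    const eq = pair.indexOf("=");
    if (eq <= 0) throw new Error(`Invalid --var "${pair}": expected KEY=VALUE`);
    vars[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return vars;
}

// Same merge shape as the diff: workflow vars first, CLI vars spread last.
function mergeVars(workflowVars, cliVars = {}) {
  return { ...(workflowVars ?? {}), ...cliVars };
}

const merged = mergeVars(
  { env: "dev", region: "us-east-1" },
  parseCliVars(["env=staging", "token=a=b"])
);
console.log(merged);
```

Here `env` comes from the command line, `region` falls through from the workflow file, and `token` is supplied only at runtime.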
@@ -193,7 +193,6 @@ ${detail}`);
  return {
  goal: doc.goal,
  vars,
- selfImprove: doc.self_improve,
  tasks: doc.steps.map((step) => convertStep(step, vars))
  };
  }
@@ -245,6 +244,9 @@ function convertInnerStep(step, vars, name, continueOnError) {
  maxHealingAttempts: step.max_healing_attempts,
  ...step.output && {
  output: resolveOutputFile(step.output, vars, name)
+ },
+ ...step.timeout_seconds !== void 0 && {
+ timeoutSeconds: step.timeout_seconds
  }
  };
  }
@@ -269,7 +271,10 @@ function convertInnerStep(step, vars, name, continueOnError) {
  llmAsJudge: step.llm_as_judge,
  allowedTools: step.allowed_tools,
  model: "sonnet",
- ...contextFiles.length > 0 && { contextFiles }
+ ...contextFiles.length > 0 && { contextFiles },
+ ...step.timeout_seconds !== void 0 && {
+ timeoutSeconds: step.timeout_seconds
+ }
  };
  }
  default:
@@ -311,14 +316,42 @@ function substituteVars(text, vars, stepName, field) {
 
  // src/runner.ts
  import { exec } from "node:child_process";
- import { mkdirSync, readFileSync as readFileSync3, writeFileSync } from "node:fs";
- import { dirname as dirname2 } from "node:path";
+ import {
+ existsSync,
+ mkdirSync,
+ readFileSync as readFileSync3,
+ unlinkSync,
+ writeFileSync
+ } from "node:fs";
+ import { dirname as dirname2, join as join2 } from "node:path";
  import { promisify } from "node:util";
  import { z as z2 } from "zod";
 
  // src/tasks/command.ts
  import { spawn } from "node:child_process";
 
+ // src/types.ts
+ var InterjectChannel = class {
+ _queue = [];
+ /** Called by the TUI when the user submits an interjection message. */
+ interject(message) {
+ this._queue.push(message);
+ }
+ /** Drains and returns any queued messages (for non-Claude steps to consume). */
+ consumeQueue() {
+ const q = this._queue.slice();
+ this._queue = [];
+ return q;
+ }
+ };
+ var TimeoutError = class extends Error {
+ exitCode = 3;
+ constructor(stepName, seconds) {
+ super(`Step "${stepName}" timed out after ${seconds}s`);
+ this.name = "TimeoutError";
+ }
+ };
+
  // src/tasks/stream.ts
  var AsyncQueue = class {
  buf = [];
@@ -376,6 +409,25 @@ function waitForExit(proc) {
  proc.on("error", reject);
  });
  }
+ function startTimeout(proc, taskName, timeoutSeconds) {
+ if (timeoutSeconds == null) return { check: () => {
+ }, cancel: () => {
+ } };
+ let timedOut = false;
+ const timer = setTimeout(() => {
+ timedOut = true;
+ try {
+ proc.kill();
+ } catch {
+ }
+ }, timeoutSeconds * 1e3);
+ return {
+ check: () => {
+ if (timedOut) throw new TimeoutError(taskName, timeoutSeconds);
+ },
+ cancel: () => clearTimeout(timer)
+ };
+ }
 
  // src/tasks/command.ts
  var CommandError = class extends Error {
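The `{ check, cancel }` pair returned by `startTimeout` can be exercised with a stand-in process object, as sketched below. `fakeProc` is a hypothetical name for illustration, and `startTimeout` and `TimeoutError` are simplified copies of the code added in this diff rather than imports from the package.

```javascript
class TimeoutError extends Error {
  exitCode = 3;
  constructor(stepName, seconds) {
    super(`Step "${stepName}" timed out after ${seconds}s`);
    this.name = "TimeoutError";
  }
}

function startTimeout(proc, taskName, timeoutSeconds) {
  // No timeout configured: hand back no-op functions so callers need no branching.
  if (timeoutSeconds == null) return { check: () => {}, cancel: () => {} };
  let timedOut = false;
  const timer = setTimeout(() => {
    timedOut = true;
    try { proc.kill(); } catch {} // the process may already have exited
  }, timeoutSeconds * 1000);
  return {
    check: () => {
      if (timedOut) throw new TimeoutError(taskName, timeoutSeconds);
    },
    cancel: () => clearTimeout(timer),
  };
}

// Stand-in for a child process (assumption): only kill() matters here.
const fakeProc = { killed: false, kill() { this.killed = true; } };

const timeout = startTimeout(fakeProc, "install", 120);
try {
  // ... stream output and await process exit here ...
  timeout.check(); // throws TimeoutError only if the timer has already fired
} finally {
  timeout.cancel(); // always clear the timer so Node can exit promptly
}
```

The ordering matters: killing the process ends its output streams, the exit wait then resolves, and `check()` converts that exit into a `TimeoutError` before the exit code is inspected.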
@@ -393,12 +445,22 @@ async function* runCommand(task) {
  const proc = spawn("bash", ["-c", task.command], {
  stdio: ["ignore", "pipe", "pipe"]
  });
- for await (const line of mergeStreamsToLines(proc.stdout, proc.stderr)) {
- yield { type: "output:text", index: -1, text: line };
- }
- const code = await waitForExit(proc);
- if (code !== 0) {
- throw new CommandError(code, task.command, `Command "${task.name}" exited with code ${code}`);
+ const timeout = startTimeout(proc, task.name, task.timeoutSeconds);
+ try {
+ for await (const line of mergeStreamsToLines(proc.stdout, proc.stderr)) {
+ yield { type: "output:text", index: -1, text: line };
+ }
+ const code = await waitForExit(proc);
+ timeout.check();
+ if (code !== 0) {
+ throw new CommandError(
+ code,
+ task.command,
+ `Command "${task.name}" exited with code ${code}`
+ );
+ }
+ } finally {
+ timeout.cancel();
  }
  }
 
@@ -460,6 +522,7 @@ async function* runClaude(task) {
  };
  process.once("SIGTERM", cleanup);
  process.once("SIGHUP", cleanup);
+ const timeout = startTimeout(proc, task.name, task.timeoutSeconds);
  const plainLines = [];
  try {
  for await (const line of mergeStreamsToLines(proc.stdout, proc.stderr)) {
@@ -476,8 +539,10 @@ async function* runClaude(task) {
  }
  }
  const code = await waitForExit(proc);
+ timeout.check();
  if (code !== 0) throw buildExitError(code, plainLines);
  } finally {
+ timeout.cancel();
  process.off("SIGTERM", cleanup);
  process.off("SIGHUP", cleanup);
  }
@@ -562,10 +627,28 @@ function shouldSkipStep(stepNumber, name, options2) {
  }
  return options2.fromStep !== void 0 && stepNumber < options2.fromStep[0];
  }
+ var LAST_OUTPUT_MAX_LINES = 100;
  async function* runWorkflow(workflow2, options2 = {}, channel2) {
  const workflowStart = Date.now();
+ const cancelFile = join2(
+ options2.workDir ?? process.cwd(),
+ ".executant-cancel"
+ );
  yield { type: "workflow:start", workflow: workflow2 };
+ let lastStepOutput;
  for (const [i, task] of workflow2.tasks.entries()) {
+ if (existsSync(cancelFile)) {
+ try {
+ unlinkSync(cancelFile);
+ } catch {
+ }
+ yield {
+ type: "workflow:cancelled",
+ workflow: workflow2,
+ durationMs: Date.now() - workflowStart
+ };
+ return;
+ }
  const stepNumber = i + 1;
  if (shouldSkipStep(stepNumber, task.name, options2)) {
  yield { type: "step:skip", index: i, name: task.name };
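The cancel check in this hunk runs once per loop iteration, before each step starts, which is why cancellation only takes effect at step boundaries. A minimal sketch of that pattern follows, with the file-system calls injected so it runs without touching disk. The `exists` and `remove` parameters stand in for `existsSync` and `unlinkSync`, and `runSteps` and the step objects are assumptions made for illustration.

```javascript
// Generator sketch: yields events like the runner does, but steps are plain functions.
function* runSteps(steps, { exists, remove, cancelFile = ".executant-cancel" }) {
  for (const step of steps) {
    if (exists(cancelFile)) {
      try { remove(cancelFile); } catch {} // best-effort cleanup of the marker
      yield { type: "workflow:cancelled" };
      return;
    }
    step.run();
    yield { type: "step:complete", name: step.name };
  }
  yield { type: "workflow:complete" };
}

// In-memory stand-in for the file system (assumption for the sketch).
const files = new Set();
const fakeFs = {
  exists: (path) => files.has(path),
  remove: (path) => files.delete(path),
};

const steps = [
  // Simulates `touch .executant-cancel` happening while step 1 is running.
  { name: "build", run: () => files.add(".executant-cancel") },
  { name: "deploy", run: () => { throw new Error("should not run"); } },
];

const events = [...runSteps(steps, fakeFs)].map((e) => e.type);
console.log(events);
```

The running step is never interrupted: `build` finishes, and the marker file is only noticed, and removed, before `deploy` would start.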
@@ -574,14 +657,20 @@ async function* runWorkflow(workflow2, options2 = {}, channel2) {
  const stepStart = Date.now();
  yield { type: "step:start", index: i, name: task.name };
  const from = options2.fromStep && options2.fromStep[0] === stepNumber ? options2.fromStep.slice(1) : void 0;
+ const lines = [];
  try {
  for await (const event of runStep(task, from, channel2)) {
  if (event.type === "step:iteration" || event.type === "step:inner" || event.type === "output:text" || event.type === "output:tool") {
+ if (event.type === "output:text") {
+ if (lines.length >= LAST_OUTPUT_MAX_LINES) lines.shift();
+ lines.push(event.text);
+ }
  yield { ...event, index: i };
  } else {
  yield event;
  }
  }
+ lastStepOutput = lines.join("\n") || void 0;
  yield {
  type: "step:complete",
  index: i,
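The `lines` array in this hunk behaves as a bounded tail buffer: once it holds `LAST_OUTPUT_MAX_LINES` entries, the oldest line is dropped before a new one is pushed. A small sketch of that behaviour, using a hypothetical `pushBounded` helper and a limit of 3 instead of 100 for readability:

```javascript
// Hypothetical helper mirroring the shift-then-push logic in the hunk above.
function pushBounded(lines, text, max) {
  if (lines.length >= max) lines.shift(); // drop the oldest line first
  lines.push(text);
  return lines;
}

const lines = [];
for (const text of ["a", "b", "c", "d", "e"]) {
  pushBounded(lines, text, 3);
}

// An empty buffer joins to "", which the `|| undefined` turns into undefined,
// matching the `|| void 0` in the diff.
const lastOutput = lines.join("\n") || undefined;
const emptyOutput = [].join("\n") || undefined;
console.log(lastOutput);
```

`Array.prototype.shift` is linear in the array length, which is unlikely to matter at a cap of 100 lines; a ring buffer would be the alternative if the cap were much larger.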
@@ -590,14 +679,23 @@ async function* runWorkflow(workflow2, options2 = {}, channel2) {
  };
  } catch (err) {
  const error = normalizeError(err);
- yield { type: "step:error", index: i, name: task.name, error };
+ const lastOutput = lines.join("\n") || void 0;
+ lastStepOutput = lastOutput;
+ yield {
+ type: "step:error",
+ index: i,
+ name: task.name,
+ error,
+ lastOutput
+ };
  if (!task.continueOnError) throw error;
  }
  }
  yield {
  type: "workflow:complete",
  workflow: workflow2,
- durationMs: Date.now() - workflowStart
+ durationMs: Date.now() - workflowStart,
+ lastOutput: lastStepOutput
  };
  }
  async function* runStep(task, from, channel2) {
@@ -1060,6 +1158,7 @@ function reducer(state, event) {
  case "workflow:start":
  return { ...state, startTime: Date.now() };
  case "workflow:complete":
+ case "workflow:cancelled":
  return { ...state, endTime: Date.now() };
  case "step:start":
  return updateTask(state, event.index, {
@@ -1456,10 +1555,15 @@ function App({
  if (event.type === "workflow:complete") {
  setTimeout(() => exit(), EXIT_DELAY_MS);
  }
+ if (event.type === "workflow:cancelled") {
+ process.exitCode = 4;
+ setTimeout(() => exit(), EXIT_DELAY_MS);
+ }
  }
  } catch (err) {
  if (!active) return;
  dispatch({ type: "log", level: "error", text: getErrorMessage(err) });
+ process.exitCode = err instanceof TimeoutError ? 3 : 1;
  setTimeout(
  () => exit(err instanceof Error ? err : new Error(getErrorMessage(err))),
  EXIT_DELAY_MS
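Read together with the README exit-code table, the logic in this hunk amounts to the mapping sketched below. `exitCodeFor` is a hypothetical helper, not a function in the package, and the `outcome` object shape is an assumption. Code `2` (validation errors) is not set in this hunk and is left out.

```javascript
// Simplified stand-in for the TimeoutError class added in this diff.
class TimeoutError extends Error {
  exitCode = 3;
}

// Hypothetical helper: map a final workflow outcome to a process exit code.
function exitCodeFor(outcome) {
  if (outcome.type === "workflow:complete") return 0;
  if (outcome.type === "workflow:cancelled") return 4;
  // Any thrown error: timeouts map to 3, everything else to 1.
  return outcome.error instanceof TimeoutError ? 3 : 1;
}

console.log(exitCodeFor({ type: "workflow:complete" }));
console.log(exitCodeFor({ type: "error", error: new TimeoutError("slow step") }));
```

In the diff itself the value is assigned to `process.exitCode` rather than passed to `process.exit()`, so the process can finish its delayed exit before terminating with that code.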
@@ -1603,8 +1707,8 @@ function App({
  }
 
  // src/plan.ts
- import { existsSync, mkdirSync as mkdirSync2, readFileSync as readFileSync4, writeFileSync as writeFileSync2 } from "node:fs";
- import { join as join2, resolve } from "node:path";
+ import { existsSync as existsSync2, mkdirSync as mkdirSync2, readFileSync as readFileSync4, writeFileSync as writeFileSync2 } from "node:fs";
+ import { join as join3, resolve } from "node:path";
  import { dump as dumpYaml } from "js-yaml";
  import { z as z3 } from "zod";
  import { zodToJsonSchema as zodToJsonSchema2 } from "zod-to-json-schema";
@@ -1620,8 +1724,7 @@ var TOTAL_PLAN_STAGES = 3;
  var WorkflowSchema = z3.object({
  goal: z3.string(),
  steps: z3.array(RawStepSchema).min(1),
- vars: z3.record(z3.string()).optional(),
- self_improve: z3.boolean().optional()
+ vars: z3.record(z3.string()).optional()
  });
  var PlanJudgeOutputSchema = z3.object({
  pass: z3.boolean(),
@@ -1633,7 +1736,7 @@ function walkUp(startDir, check) {
  while (true) {
  const found = check(dir);
  if (found !== null) return found;
- const parent = join2(dir, "..");
+ const parent = join3(dir, "..");
  if (resolve(parent) === resolve(dir)) return null;
  dir = parent;
  }
@@ -1641,13 +1744,13 @@ function walkUp(startDir, check) {
  function findGitRoot(startDir) {
  return walkUp(
  startDir,
- (dir) => existsSync(join2(dir, ".git")) ? dir : null
+ (dir) => existsSync2(join3(dir, ".git")) ? dir : null
  );
  }
  function findProjectRoot(startDir) {
  return walkUp(startDir, (dir) => {
- const candidate = join2(dir, ".claude", "executant.local", "tasks");
- return existsSync(candidate) ? candidate : null;
+ const candidate = join3(dir, ".claude", "executant.local", "tasks");
+ return existsSync2(candidate) ? candidate : null;
  });
  }
  function isSimpleRequest(description) {
@@ -1687,7 +1790,7 @@ Examples:
  console.error("Error: -f/--file requires a file path argument");
  process.exit(1);
  }
- if (!existsSync(filePath2)) {
+ if (!existsSync2(filePath2)) {
  console.error(`Error: File not found: ${filePath2}`);
  process.exit(1);
  }
@@ -1715,14 +1818,14 @@ Examples:
  let taskDir = findProjectRoot(process.cwd());
  if (!taskDir) {
  const base = findGitRoot(process.cwd()) ?? process.cwd();
- taskDir = join2(base, ".claude", "executant.local", "tasks");
+ taskDir = join3(base, ".claude", "executant.local", "tasks");
  mkdirSync2(taskDir, { recursive: true });
  }
- const todoDir = join2(taskDir, "todo");
+ const todoDir = join3(taskDir, "todo");
  mkdirSync2(todoDir, { recursive: true });
  const slug = slugify(description);
  const ts = timestamp();
- const taskFile = join2(todoDir, `${ts}-${slug}.yaml`);
+ const taskFile = join3(todoDir, `${ts}-${slug}.yaml`);
  return { description, taskFile, todoDir, fast };
  }
  async function runPass3Judge(description, workflow2) {
@@ -2040,7 +2143,7 @@ ${PLAN_SYSTEM_RULES}`,
  }
 
  // src/refine.ts
- import { existsSync as existsSync2, readFileSync as readFileSync5 } from "node:fs";
+ import { existsSync as existsSync3, readFileSync as readFileSync5 } from "node:fs";
  import { load as loadYaml } from "js-yaml";
  var PLAN_REFINE_PROMPT = loadPrompt("plan-refine");
  var PLAN_SYSTEM_RULES2 = loadPrompt("plan-system-rules");
@@ -2067,7 +2170,7 @@ Examples:
  console.error("Usage: executant refine <task-file> [INSTRUCTIONS]");
  process.exit(1);
  }
- if (!existsSync2(taskFile)) {
+ if (!existsSync3(taskFile)) {
  console.error(`Error: File not found: ${taskFile}`);
  process.exit(1);
  }
@@ -2092,7 +2195,7 @@ Examples:
  console.error("Error: -f/--file requires a file path argument");
  process.exit(1);
  }
- if (!existsSync2(filePath2)) {
+ if (!existsSync3(filePath2)) {
  console.error(`Error: File not found: ${filePath2}`);
  process.exit(1);
  }
@@ -2328,19 +2431,13 @@ function PlanApp({ description, events: events2 }) {
  }
 
  // src/logger.ts
- import {
- appendFileSync,
- existsSync as existsSync3,
- mkdirSync as mkdirSync3,
- readdirSync,
- writeFileSync as writeFileSync3
- } from "node:fs";
- import { dirname as dirname3, join as join3, resolve as resolve2 } from "node:path";
+ import { appendFileSync, existsSync as existsSync4, mkdirSync as mkdirSync3, writeFileSync as writeFileSync3 } from "node:fs";
+ import { dirname as dirname3, join as join4, resolve as resolve2 } from "node:path";
  function findExecutantLocalDir(startDir) {
  let dir = resolve2(startDir);
  while (true) {
- const candidate = join3(dir, ".claude", "executant.local");
- if (existsSync3(candidate)) return candidate;
+ const candidate = join4(dir, ".claude", "executant.local");
+ if (existsSync4(candidate)) return candidate;
  const parent = dirname3(dir);
  if (parent === dir) return null;
  dir = parent;
@@ -2349,29 +2446,20 @@ function findExecutantLocalDir(startDir) {
  function resolveLogDir(workflowFilePath) {
  const startDir = dirname3(resolve2(workflowFilePath));
  const executantLocal = findExecutantLocalDir(startDir);
- return executantLocal ? join3(executantLocal, "logs") : join3(startDir, "logs");
+ return executantLocal ? join4(executantLocal, "logs") : join4(startDir, "logs");
  }
  var INIT_STATE = {
  logFile: "",
  stepIndex: -1,
  stepName: "",
- stepStartMs: 0,
- toolCount: 0,
- complexSequenceFile: "",
- selfHealingFile: "",
- judgeAttempt: 0,
- recentOutput: []
+ stepStartMs: 0
  };
  function appendLog(logFile, text) {
  if (logFile) appendFileSync(logFile, text + "\n");
  }
- function highlightPath(ctx, stepIndex, suffix) {
- return join3(ctx.highlightsDir, `${ctx.ts}_step${stepIndex + 1}_${suffix}.md`);
- }
  function onWorkflowStart(ctx, s) {
  mkdirSync3(ctx.logDir, { recursive: true });
- mkdirSync3(ctx.highlightsDir, { recursive: true });
- const logFile = join3(ctx.logDir, `${ctx.ts}_${ctx.slug}.log`);
+ const logFile = join4(ctx.logDir, `${ctx.ts}_${ctx.slug}.log`);
  writeFileSync3(
  logFile,
  `# Execution Log
@@ -2402,20 +2490,6 @@ ${"\u2501".repeat(51)}
  );
  return next;
  }
- function finalizeComplexSequence(s) {
- if (s.toolCount >= 3 && s.complexSequenceFile) {
- appendFileSync(
- s.complexSequenceFile,
- `
- ---
-
- *Total tools used: ${s.toolCount}*
-
- *Captured by Executant Logger*
- `
- );
- }
- }
  function onStepComplete(s) {
  appendLog(
  s.logFile,
@@ -2423,131 +2497,21 @@ function onStepComplete(s) {
  Step completed in ${((Date.now() - s.stepStartMs) / 1e3).toFixed(1)}s
  `
  );
- finalizeComplexSequence(s);
  return s;
  }
  function onStepError(s, error) {
  appendLog(s.logFile, `
  Step failed: ${error.message}
  `);
- finalizeComplexSequence(s);
  return s;
  }
- function buildHighlightHeader(ctx, s, title, extra = []) {
- return [
- `# ${title}`,
- "",
- `**Task:** ${ctx.slug}`,
- `**Step:** ${s.stepName}`,
- ...extra,
- `**Timestamp:** ${(/* @__PURE__ */ new Date()).toISOString()}`,
- "",
- "---",
- ""
- ].join("\n") + "\n";
- }
- function complexSequenceHeader(ctx, s) {
- return buildHighlightHeader(ctx, s, "Complex Tool Sequence") + "## Claude's Tool Orchestration\n\nClaude used multiple tools to complete this step:\n\n";
- }
- function createComplexSequenceFile(ctx, s) {
- const path = highlightPath(ctx, s.stepIndex, "complex_sequence");
- writeFileSync3(path, complexSequenceHeader(ctx, s));
- return path;
- }
- function onTool(ctx, s, tool, input) {
- const desc = getToolArg(tool, input);
- appendLog(s.logFile, ` [${tool}] ${desc}`);
- const toolCount = s.toolCount + 1;
- const complexSequenceFile = toolCount === 3 ? createComplexSequenceFile(ctx, s) : s.complexSequenceFile;
- if (toolCount >= 3 && complexSequenceFile) {
- appendFileSync(
- complexSequenceFile,
- `${toolCount}. **${tool}** - ${desc}
- `
- );
- }
- return { ...s, toolCount, complexSequenceFile };
- }
- function saveJudgeHighlight(ctx, s, verdict, text) {
- writeFileSync3(
- highlightPath(ctx, s.stepIndex, `judge_${verdict}`),
- buildHighlightHeader(ctx, s, `Judge Verdict: ${verdict}`, [
- `**Attempt:** ${s.judgeAttempt}`
- ]) + [text, "", "---", "", "*Auto-captured*", ""].join("\n")
- );
+ function onTool(s, tool, input) {
+ appendLog(s.logFile, ` [${tool}] ${getToolArg(tool, input)}`);
+ return s;
  }
- var LOG_MATCHERS = [
- {
- pattern: /\[judge\]\s+(PASS|FAIL)/i,
- apply: (ctx, s, text, match) => {
- const verdict = match[1].toUpperCase();
- const judgeAttempt = s.judgeAttempt + 1;
- saveJudgeHighlight(ctx, { ...s, judgeAttempt }, verdict, text);
- return { ...s, judgeAttempt };
- }
- },
- {
- pattern: /\[self-healing\].*failed.*exit\s+(\d+)/i,
- apply: (ctx, s, _text, match) => {
- const selfHealingFile = highlightPath(ctx, s.stepIndex, "self_healing");
- writeFileSync3(
- selfHealingFile,
- buildHighlightHeader(ctx, s, "Self-Healing Activation") + [
- "## \u274C Failure Detected",
- "",
- `**Exit Code:** ${match[1]}`,
- "",
- "**Recent Output:**",
- "```",
- s.recentOutput.join("\n"),
- "```",
- "",
- "---",
- "",
- "## \u{1F527} Claude's Healing Process",
- ""
- ].join("\n")
- );
- return { ...s, selfHealingFile, recentOutput: [] };
- }
- },
- {
- pattern: /\[self-healing\].*Re-running/i,
- apply: (_ctx, s) => {
- if (!s.selfHealingFile) return s;
- appendFileSync(
- s.selfHealingFile,
- [
- "",
- "*(See full log for Claude's diagnostic process)*",
- "",
- "---",
- "",
- "## \u2705 Resolution Applied",
- "",
- "The self-healing process completed. Check the full execution log to see Claude's analysis and fix.",
- "",
- "---",
- "",
- "*Auto-captured*",
- ""
- ].join("\n")
- );
- return { ...s, selfHealingFile: "" };
- }
- }
- ];
- function onLogMessage(ctx, s, level, text) {
+ function onLogMessage(s, level, text) {
  appendLog(s.logFile, `[${level}] ${text}`);
- let state = s;
- for (const { pattern, apply } of LOG_MATCHERS) {
- const m = pattern.exec(text);
- if (m) {
- state = apply(ctx, state, text, m);
- break;
- }
- }
- return state;
+ return s;
  }
  function onWorkflowComplete(ctx, s) {
  appendLog(
@@ -2559,37 +2523,8 @@ Finished: ${(/* @__PURE__ */ new Date()).toISOString()}
  ${"\u2501".repeat(51)}
  `
  );
- const indexFile = join3(ctx.highlightsDir, "README.md");
- if (!existsSync3(indexFile)) {
- writeFileSync3(
- indexFile,
- [
- "# Execution Highlights",
- "",
- "This directory contains automatically extracted highlight moments from task executions.",
- "",
- "## Latest Highlights",
- ""
- ].join("\n")
- );
- }
- const highlights = readdirSync(ctx.highlightsDir).filter((f) => f.startsWith(ctx.ts) && f.endsWith(".md")).sort();
- if (highlights.length > 0) {
- const entries = highlights.map((f) => `- [${f.replace(/\.md$/, "")}](./${f})`).join("\n");
- appendFileSync(
- indexFile,
- `
- ### ${ctx.slug} (${(/* @__PURE__ */ new Date()).toISOString()})
- ${entries}
- `
- );
- }
  return s;
  }
- function onOutputText(s, text) {
- appendLog(s.logFile, text);
- return { ...s, recentOutput: [...s.recentOutput, text] };
- }
  function reduce(ctx, s, event) {
  switch (event.type) {
  case "workflow:start":
@@ -2614,12 +2549,14 @@ function reduce(ctx, s, event) {
  );
  return s;
  case "output:text":
- return onOutputText(s, event.text);
+ appendLog(s.logFile, event.text);
+ return s;
  case "output:tool":
- return onTool(ctx, s, event.tool, event.input);
+ return onTool(s, event.tool, event.input);
  case "log":
- return onLogMessage(ctx, s, event.level, event.text);
+ return onLogMessage(s, event.level, event.text);
  case "workflow:complete":
+ case "workflow:cancelled":
  return onWorkflowComplete(ctx, s);
  default:
  return s;
@@ -2628,15 +2565,12 @@ function reduce(ctx, s, event) {
  function createLogger(logDir, taskName) {
  const ctx = {
  logDir,
- highlightsDir: join3(logDir, "highlights"),
  ts: formatTimestamp(/* @__PURE__ */ new Date()),
  slug: slugify(taskName, 40) || "task"
  };
  const enabled = process.env["EXECUTANT_LOG"] !== "0";
  let state = INIT_STATE;
  return {
- getHighlightsDir: () => ctx.highlightsDir,
- getTimestamp: () => ctx.ts,
  observe(event) {
  if (!enabled) return;
  try {
@@ -2654,195 +2588,10 @@ async function* withLogger(gen, logger2) {
2654
2588
  }
2655
2589
  }
2656
2590
 
2657
- // src/retrospective.ts
2658
- import {
2659
- existsSync as existsSync4,
2660
- mkdirSync as mkdirSync4,
2661
- readdirSync as readdirSync2,
2662
- readFileSync as readFileSync6,
2663
- writeFileSync as writeFileSync4
2664
- } from "node:fs";
2665
- import { basename as basename2, dirname as dirname4, join as join4, resolve as resolve3 } from "node:path";
2666
- import { spawnSync } from "node:child_process";
2667
- import { load as parseYaml2 } from "js-yaml";
2668
- import { z as z4 } from "zod";
2669
- var RetrospectiveOutputSchema = z4.object({
2670
- improved_yaml: z4.string(),
2671
- changelog: z4.string()
2672
- });
2673
- var RETROSPECTIVE_PROMPT = loadPrompt("retrospective-analysis");
2674
- async function runRetrospective(workflowFilePath, workflow2, highlightsDir, runTimestamp) {
2675
- try {
2676
- await doRetrospective(
2677
- workflowFilePath,
2678
- workflow2,
2679
- highlightsDir,
2680
- runTimestamp
2681
- );
2682
- } catch (err) {
2683
- console.warn(
2684
- `
2685
- Self-improvement: retrospective failed: ${getErrorMessage(err)}`
2686
- );
2687
- }
2688
- }
- async function doRetrospective(workflowFilePath, workflow2, highlightsDir, runTimestamp) {
- if (!existsSync4(highlightsDir)) {
- console.log("\nSelf-improvement: no highlights directory found, skipping.");
- return;
- }
- const allFiles = readdirSync2(highlightsDir);
- const runHighlights = allFiles.filter((f) => f.startsWith(runTimestamp) && f.endsWith(".md")).sort();
- if (runHighlights.length === 0) {
- console.log(
- "\nSelf-improvement: no highlights for this run \u2014 task completed without issues, skipping."
- );
- return;
- }
- const divider = "\u2501".repeat(51);
- console.log(`
- ${divider}`);
- console.log(
- "Self-Improvement: Analyzing execution and generating improvements..."
- );
- console.log(`${divider}
- `);
- console.log(`Found ${runHighlights.length} highlight(s) to analyze`);
- const countByPattern = (pat) => runHighlights.filter((f) => f.includes(pat)).length;
- const judgeFailures = countByPattern("_judge_FAIL");
- const selfHealingCount = countByPattern("_self_healing");
- const complexSequences = countByPattern("_complex_sequence");
- const metrics = [
- `- Judge Failures: ${judgeFailures}`,
- `- Self-Healing Activations: ${selfHealingCount}`,
- `- Complex Tool Sequences: ${complexSequences}`,
- `- Total Highlights: ${runHighlights.length}`
- ].join("\n");
- console.log(`
- Execution Metrics:
- ${metrics}
- `);
- console.log("Analyzing execution and generating improvements...\n");
- const highlightContents = runHighlights.map((f) => {
- const content = readFileSync6(join4(highlightsDir, f), "utf8");
- return `### ${f}
-
- ${content}`;
- }).join("\n\n---\n\n");
- const originalYaml = readFileSync6(workflowFilePath, "utf8");
- const taskName = basename2(workflowFilePath, ".yaml");
- const prompt = fillTemplate(RETROSPECTIVE_PROMPT, {
- TASK_NAME: taskName,
- ORIGINAL_GOAL: workflow2.goal,
- ORIGINAL_YAML: originalYaml,
- HIGHLIGHTS: highlightContents,
- METRICS: metrics
- });
- const result = spawnSync(
- "claude",
- [
- "-p",
- prompt,
- "--allowedTools",
- "Read",
- "--permission-mode",
- "bypassPermissions",
- "--output-format",
- "text"
- ],
- {
- encoding: "utf8",
- maxBuffer: 10 * 1024 * 1024,
- stdio: ["ignore", "pipe", "pipe"]
- }
- );
- if (result.error) {
- console.warn(
- `Self-improvement: failed to run claude: ${result.error.message}`
- );
- return;
- }
- if (result.status !== 0) {
- const stderr = result.stderr ?? "";
- console.warn(
- `Self-improvement: claude exited with code ${result.status}${stderr ? ": " + stderr : ""}`
- );
- return;
- }
- const response = result.stdout ?? "";
- let parsed;
- try {
- parsed = JSON.parse(extractJson(response));
- } catch {
- console.warn(
- `Self-improvement: could not parse Claude response as JSON.
- Response: ${response.trim()}`
- );
- return;
- }
- const zodResult = RetrospectiveOutputSchema.safeParse(parsed);
- if (!zodResult.success) {
- console.warn(
- "Self-improvement: response schema mismatch \u2014 improved YAML not saved."
- );
- return;
- }
- const improvedYaml = zodResult.data.improved_yaml.trim();
- const changelog = zodResult.data.changelog.trim() || "No changelog generated.";
- try {
- parseYaml2(improvedYaml);
- } catch (err) {
- console.warn(
- `Self-improvement: generated YAML is invalid (${getErrorMessage(err)}), skipping save.`
- );
- return;
- }
- const startDir = dirname4(resolve3(workflowFilePath));
- const executantLocal = findExecutantLocalDir(startDir);
- const backlogDir = executantLocal ? join4(executantLocal, "tasks", "backlog") : join4(startDir, "..", "backlog");
- mkdirSync4(backlogDir, { recursive: true });
- const ts = formatTimestamp(/* @__PURE__ */ new Date());
- const slug = slugify(taskName, 40);
- const improvedFile = join4(backlogDir, `${ts}-${slug}-improved.yaml`);
- const changelogFile = join4(backlogDir, `${ts}-${slug}-changelog.md`);
- writeFileSync4(improvedFile, improvedYaml + "\n", "utf8");
- writeFileSync4(changelogFile, changelog + "\n", "utf8");
- console.log(`\u2705 Improved task saved: ${improvedFile}`);
- console.log(`\u2705 Changelog saved: ${changelogFile}`);
- console.log(`
- ${divider}`);
- console.log("Improvement Summary");
- console.log(`${divider}
- `);
- console.log(changelog);
- }
- function extractJson(text) {
- const start = text.indexOf("{");
- const end = text.lastIndexOf("}");
- if (start === -1 || end === -1 || end <= start)
- throw new Error("no JSON object found in response");
- return text.slice(start, end + 1);
- }
-
- // src/types.ts
- var InterjectChannel = class {
- _queue = [];
- /** Called by the TUI when the user submits an interjection message. */
- interject(message) {
- this._queue.push(message);
- }
- /** Drains and returns any queued messages (for non-Claude steps to consume). */
- consumeQueue() {
- const q = this._queue.slice();
- this._queue = [];
- return q;
- }
- };
-
  // src/index.ts
  var CURRENT_VERSION = JSON.parse(
- readFileSync7(
- join5(dirname5(fileURLToPath2(import.meta.url)), "../package.json"),
+ readFileSync6(
+ join5(dirname4(fileURLToPath2(import.meta.url)), "../package.json"),
  "utf-8"
  )
  ).version;
@@ -2901,6 +2650,7 @@ Options:
  --ci Headless mode \u2014 print events as NDJSON, no TUI
  --step <name|index> Run only the named step or step at 1-based index
  --from-step <n> Resume from step n (e.g. 3, 3.2, 2.5.4.3 \u2014 1-based path)
+ --var KEY=VALUE Override or supply a workflow var at runtime (repeatable)
  --help, -h Show this help
  
  Commands:
@@ -2942,6 +2692,18 @@ YAML \u2014 script step fields (type: script | command, or inferred when command
  self_healing bool On failure, Claude diagnoses and fixes iteratively
  up to 5 attempts with accumulated context (default: false)
  max_healing_attempts int Override max self-healing retries (default: 5)
+ timeout_seconds number Kill the step and fail with exit code 3 after N seconds
+
+ Cancellation:
+ Write a .executant-cancel file in the working directory to stop execution
+ cleanly between steps (exit code 4). The file is deleted automatically.
+
+ Exit codes:
+ 0 All steps completed successfully
+ 1 A step failed at runtime
+ 2 YAML or variable validation error
+ 3 A step timed out (timeout_seconds exceeded)
+ 4 Cancelled via .executant-cancel file
  
  YAML \u2014 log step fields (type: log, or inferred when message is present and prompt is absent):
  message string Text to emit as a progress marker
@@ -2967,6 +2729,7 @@ Example:
  var ciMode = false;
  var stepFilter;
  var fromStep;
+ var cliVars = {};
  var positional = [];
  for (let i = 0; i < rawArgs.length; i++) {
  const a = rawArgs[i];
@@ -2992,6 +2755,18 @@ for (let i = 0; i < rawArgs.length; i++) {
  process.exit(1);
  }
  fromStep = parts;
+ } else if (a === "--var") {
+ if (!rawArgs[i + 1]) {
+ console.error("--var requires a KEY=VALUE argument");
+ process.exit(1);
+ }
+ const pair = rawArgs[++i];
+ const eq = pair.indexOf("=");
+ if (eq <= 0) {
+ console.error(`--var value must be KEY=VALUE, got: ${pair}`);
+ process.exit(1);
+ }
+ cliVars[pair.slice(0, eq)] = pair.slice(eq + 1);
  } else {
  positional.push(a);
  }
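Reviewer note: the `--var` branch above can be read as the standalone sketch below. `parseVarFlags` is a hypothetical name that does not exist in executant. The sketch assumes every other argument is positional (the real loop also handles `--ci`, `--step`, and `--from-step`) and throws instead of calling `process.exit(1)` so that it can be tested in isolation.

```javascript
// Hypothetical helper, not part of executant: collects repeatable
// `--var KEY=VALUE` flags using the same checks as the hunk above.
function parseVarFlags(rawArgs) {
  const vars = {};
  const positional = [];
  for (let i = 0; i < rawArgs.length; i++) {
    const arg = rawArgs[i];
    if (arg !== "--var") {
      // Assumption of this sketch: anything else is a positional argument.
      positional.push(arg);
      continue;
    }
    const pair = rawArgs[++i];
    if (!pair) {
      throw new Error("--var requires a KEY=VALUE argument");
    }
    const eq = pair.indexOf("=");
    // eq === -1 means there is no "=", eq === 0 means the key is empty.
    if (eq <= 0) {
      throw new Error(`--var value must be KEY=VALUE, got: ${pair}`);
    }
    // Split on the first "=" only, so a value may itself contain "=".
    vars[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return { vars, positional };
}

const { vars, positional } = parseVarFlags([
  "--var", "env=staging",
  "--var", "query=a=b",
  "deploy.yaml",
]);
console.log(vars);       // { env: 'staging', query: 'a=b' }
console.log(positional); // [ 'deploy.yaml' ]
```

Because later assignments overwrite earlier ones, repeating the same key keeps the last value given on the command line.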
@@ -3003,12 +2778,21 @@ if (!filePath) {
  }
  var workflow;
  try {
- workflow = loadWorkflow(filePath);
+ workflow = loadWorkflow(filePath, cliVars);
  } catch (err) {
  console.error(getErrorMessage(err));
- process.exit(1);
+ process.exit(2);
  }
- var options = { stepFilter, fromStep };
+ var localDir = findExecutantLocalDir(dirname4(resolve3(filePath)));
+ if (localDir) {
+ mkdirSync4(join5(localDir, "tasks", "todo"), { recursive: true });
+ mkdirSync4(join5(localDir, "tasks", "done"), { recursive: true });
+ }
+ var options = {
+ stepFilter,
+ fromStep,
+ workDir: dirname4(resolve3(filePath))
+ };
  var channel = new InterjectChannel();
  var rawEvents = runWorkflow(workflow, options, channel);
  var logger = createLogger(resolveLogDir(filePath), workflow.goal);
@@ -3020,36 +2804,23 @@ function errorReplacer(_key, value) {
  }
  return value;
  }
- async function maybeRunRetrospective(filePath2, workflow2, logger2) {
- if (!logger2) return;
- try {
- await runRetrospective(
- filePath2,
- workflow2,
- logger2.getHighlightsDir(),
- logger2.getTimestamp()
- );
- } catch (err) {
- console.warn(
- "[executant] retrospective failed (non-fatal):",
- getErrorMessage(err)
- );
- }
- }
  if (ciMode) {
  (async () => {
  for await (const event of events) {
- process.stdout.write(JSON.stringify(event, errorReplacer) + "\n");
- }
- if (workflow.selfImprove) {
- await maybeRunRetrospective(filePath, workflow, logger);
+ const line = JSON.stringify(event, errorReplacer) + "\n";
+ if (event.type === "workflow:cancelled") {
+ process.stdout.write(line, () => process.exit(4));
+ return;
+ }
+ process.stdout.write(line);
  }
  })().catch((err) => {
+ const code = err instanceof TimeoutError ? 3 : 1;
  console.error(err);
- process.exit(1);
+ process.exit(code);
  });
  } else {
- const inkApp = render(
+ render(
  React3.createElement(App, {
  workflow,
  events,
@@ -3058,8 +2829,4 @@ if (ciMode) {
  interjectChannel: channel
  })
  );
- if (workflow.selfImprove) {
- inkApp.waitUntilExit().then(() => maybeRunRetrospective(filePath, workflow, logger)).catch(() => {
- });
- }
  }
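Reviewer note: the exit codes introduced in this version (2 for validation errors, 3 for timeouts, 4 for cancellation) could be consumed by a wrapper script roughly as sketched below. The table is copied from the new help text. `classifyExit` and its `retryable` flag are hypothetical and only illustrate one possible policy; executant itself does not provide them.

```javascript
// Exit codes as listed in the 1.21.0 help text.
const EXIT_CODES = Object.freeze({
  0: "All steps completed successfully",
  1: "A step failed at runtime",
  2: "YAML or variable validation error",
  3: "A step timed out (timeout_seconds exceeded)",
  4: "Cancelled via .executant-cancel file",
});

// Hypothetical helper for a CI wrapper, not part of executant.
function classifyExit(code) {
  const meaning = EXIT_CODES[code] ?? "Unknown exit code";
  // Assumed policy for this sketch: retry runtime failures and timeouts,
  // but not validation errors (2) or deliberate cancellations (4).
  const retryable = code === 1 || code === 3;
  return { code, meaning, retryable };
}

console.log(classifyExit(3).meaning);
console.log(classifyExit(4).retryable);
```

Keeping validation errors on a separate code lets such a wrapper fail fast on a malformed workflow or a missing variable instead of retrying a run that cannot succeed.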
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "executant",
- "version": "1.19.0",
+ "version": "1.21.0",
  "description": "Harness for YAML-defined workflows that enables stepping through Claude sessions and bash commands",
  "repository": {
  "type": "git",
@@ -1,304 +0,0 @@
- # ============================================================================
- # RETROSPECTIVE ANALYSIS PROMPT
- # ============================================================================
- # Purpose: Analyzes task execution highlights and generates improved task YAML
- # Used by: src/retrospective.ts runRetrospective()
- # Triggered when: A task completes with self_improve: true and has highlights
- #
- # Placeholders:
- # {{TASK_NAME}} - Name of the task that was executed
- # {{ORIGINAL_GOAL}} - The original goal statement (must be preserved)
- # {{ORIGINAL_YAML}} - Complete original task YAML for reference
- # {{HIGHLIGHTS}} - Aggregated highlight markdown files from execution
- # {{METRICS}} - Execution metrics summary (failures, retries, etc.)
- # ============================================================================
-
- You are analyzing the execution of an Executant task to identify improvement opportunities.
-
- # Task Information
-
- **Task Name:** {{TASK_NAME}}
-
- **Original Goal:** {{ORIGINAL_GOAL}}
-
- # Execution Metrics
-
- {{METRICS}}
-
- # Execution Highlights
-
- The following highlights were captured during execution. Each highlight represents a moment where the system encountered challenges:
-
- {{HIGHLIGHTS}}
-
- # Original Task YAML
-
- ```yaml
- {{ORIGINAL_YAML}}
- ```
-
- # Your Task
-
- Analyze the execution highlights and generate an improved version of the task YAML that addresses the problems encountered during execution.
-
- ## Analysis Guidelines
-
- ### Interpreting Judge Failures (llm_as_judge: true)
-
- Judge failures indicate that Claude's output didn't meet quality standards. Common causes:
-
- **Unclear prompts** - The step instructions were too vague
- - Fix: Add specific numbered sub-steps
- - Fix: Define clear success criteria
- - Fix: Specify what to check and how to verify it
-
- **Missing criteria** - The prompt didn't explain what "good" looks like
- - Fix: Add examples of expected output
- - Fix: Specify quality thresholds (test coverage %, file count, etc.)
- - Fix: Include validation steps
-
- **Steps too large** - One step tried to do too much
- - Fix: Break into smaller, focused steps
- - Fix: Each step should have one clear objective
-
- **Example Fix:**
- ```
- BEFORE:
- - name: "validate results"
- llm_as_judge: true
- prompt: "Validate the conversion results"
-
- AFTER:
- - name: "validate results"
- llm_as_judge: true
- prompt: |
- Validate the TypeScript conversion by checking:
- 1. Read the generated .ts file
- 2. Verify all functions have type annotations
- 3. Check that tests pass (npm test)
- 4. Confirm no compilation errors (tsc --noEmit)
-
- Success criteria: All 4 checks pass without errors.
- ```
-
- ### Interpreting Self-Healing Events (self_healing: true)
-
- Self-healing activations indicate brittle script steps that failed during execution. Common causes:
-
- **Missing dependencies** - Command not found, package not installed
- - Fix: Add a script step to install/check dependencies first
- - Fix: Use explicit paths instead of assuming commands are in PATH
-
- **Wrong assumptions** - Script assumed files/directories exist
- - Fix: Add checks or create directories in the script
- - Fix: Use `mkdir -p` instead of `mkdir`
- - Fix: Check file existence before operating on it
-
- **Environment issues** - PWD, env vars, or paths incorrect
- - Fix: Use absolute paths instead of relative
- - Fix: cd to correct directory in the script
- - Fix: Set required environment variables
-
- **Race conditions** - Script ran before previous step completed
- - Fix: Add wait/check logic
- - Fix: Combine dependent commands with && in one script step
-
- **Example Fix:**
- ```
- BEFORE:
- - name: "run tests"
- type: script
- self_healing: true
- command: npm test
-
- AFTER:
- - name: "install dependencies"
- type: script
- command: npm install
-
- - name: "run tests"
- type: script
- self_healing: true
- command: npm test
- ```
-
- ### Interpreting Complex Tool Sequences
-
- Complex tool sequences (3+ tools) indicate that Claude had to work hard to complete a step. Common causes:
-
- **Vague instructions** - Step didn't specify what files to operate on
- - Fix: List specific file paths to read/edit
- - Fix: Specify glob patterns for file discovery
- - Fix: Break discovery and operation into separate steps
-
- **Exploratory work needed** - Claude had to search to understand the codebase
- - Fix: Add a separate discovery/analysis step first
- - Fix: Provide file paths in the prompt
- - Fix: Include relevant code snippets in the prompt
-
- **Multi-phase operations** - One step tried to do research + implementation
- - Fix: Split into "research" step and "implementation" step
- - Fix: First step outputs findings, second step acts on them
-
- **Example Fix:**
- ```
- BEFORE:
- - name: "update imports"
- prompt: "Update all imports to use the new module structure"
-
- AFTER:
- - name: "analyze imports"
- prompt: |
- Search the codebase for all import statements:
- 1. Use grep to find all imports in src/
- 2. List files that import from old modules
- 3. Create a plan for updating each file
-
- - name: "update imports"
- prompt: |
- Update imports in the following files based on the analysis:
- - src/components/Button.tsx
- - src/utils/helpers.ts
- - src/services/api.ts
-
- Change: import from './old/' to import from '@/new/'
- ```
-
- ## Improvement Principles
-
- 1. **Preserve the original goal** - The task succeeded, so the goal is correct
- 2. **Fix problems shown in highlights** - Only address issues that actually occurred
- 3. **Be specific** - Add numbered steps, file paths, and clear criteria
- 4. **Break down large steps** - If a step caused many retries or complex tool sequences
- 5. **Add prerequisite steps** - If self-healing had to install deps or create files
- 6. **Keep self_improve: true** - Allow recursive improvement in future runs
- 7. **Document changes** - Explain what you changed and why in the changelog
-
- ## Improvement Patterns
-
- ### Pattern: Split Vague Prompt into Specific Sub-Steps
-
- When a judge fails or complex tools are needed, make the prompt more specific:
-
- ```yaml
- # BEFORE: Vague, requires exploration
- - name: "refactor authentication"
- llm_as_judge: true
- prompt: "Refactor the authentication code"
-
- # AFTER: Specific numbered steps
- - name: "refactor authentication"
- llm_as_judge: true
- prompt: |
- Refactor authentication by:
- 1. Reading src/auth/login.ts and src/auth/session.ts
- 2. Extracting common logic into src/auth/helpers.ts
- 3. Updating imports in both files
- 4. Running tests to verify: npm test src/auth/
-
- Success: Tests pass, no code duplication between login.ts and session.ts
- ```
-
- ### Pattern: Add Prerequisite Step
-
- When self-healing installs deps or fixes environment:
-
- ```yaml
- # BEFORE: Brittle, assumes deps installed
- steps:
- - name: "build"
- type: script
- self_healing: true
- command: npm run build
-
- # AFTER: Explicit dependency step
- steps:
- - name: "install dependencies"
- type: script
- command: npm install
-
- - name: "build"
- type: script
- command: npm run build
- ```
-
- ### Pattern: Split Research from Implementation
-
- When complex tool sequences suggest exploratory work:
-
- ```yaml
- # BEFORE: Combined research + work
- - name: "fix bugs"
- prompt: "Find and fix all bugs in the payment flow"
-
- # AFTER: Separated discovery and fixing
- - name: "identify payment bugs"
- prompt: |
- Analyze the payment flow for bugs:
- 1. Read src/payment/*.ts files
- 2. Check for error handling gaps
- 3. List files that need fixes
-
- - name: "fix payment bugs"
- llm_as_judge: true
- prompt: |
- Fix bugs identified in previous step:
- - Add error handling in src/payment/checkout.ts
- - Validate input in src/payment/process.ts
- - Update tests in src/payment/__tests__/
-
- Success: All payment tests pass
- ```
-
- ### Pattern: Add Explicit Success Criteria
-
- When judge fails due to unclear expectations:
-
- ```yaml
- # BEFORE: No clear success criteria
- - name: "improve test coverage"
- llm_as_judge: true
- prompt: "Improve test coverage for the API module"
-
- # AFTER: Explicit threshold and verification
- - name: "improve test coverage"
- llm_as_judge: true
- prompt: |
- Improve test coverage for src/api/ to at least 80%:
- 1. Run: npm test -- --coverage src/api/
- 2. Identify files with <80% coverage
- 3. Write tests for uncovered code paths
- 4. Re-run coverage and verify ≥80%
-
- Success criteria: Coverage report shows ≥80% for all files in src/api/
- ```
-
- # Output Format
-
- Respond with a single JSON object:
- {
- "improved_yaml": "<complete improved task YAML — no markdown fences, raw YAML only>",
- "changelog": "<markdown: Problems Identified / Changes Applied / Expected Impact>"
- }
-
- Output only the JSON object — no prose before or after.
-
- # Important Requirements
-
- 1. **Always preserve the original goal** - Do not change the goal statement
- 2. **Keep self_improve: true** - This enables recursive improvement
- 3. **Only fix problems shown in highlights** - Don't add unnecessary changes
- 4. **Be specific in improvements** - Vague fixes won't help
- 5. **Generate valid YAML** - The improved task must be parseable
- 6. **Explain all changes** - The changelog should justify each modification
-
- # Example Response
-
- ```json
- {
- "improved_yaml": "goal: \"Convert CoffeeScript to TypeScript with validation\"\nself_improve: true\n\nsteps:\n - name: \"install dependencies\"\n type: script\n command: npm install\n\n - name: \"convert to TypeScript\"\n type: script\n command: coffee2ts convert app.coffee\n\n - name: \"validate conversion\"\n llm_as_judge: true\n prompt: |\n Validate the TypeScript conversion by:\n 1. Reading app.ts and checking all functions have type annotations\n 2. Running: tsc --noEmit to check for type errors\n 3. Running: npm test to verify functionality\n\n Success criteria: No type errors, all tests pass",
- "changelog": "## Problems Identified\n- Judge failure in \"validate conversion\": Instructions were too vague\n- Self-healing activation: npm dependencies were missing\n\n## Changes Applied\n\n### Step 1: install dependencies (NEW)\n- Before: Not present\n- After: Added explicit npm install step\n- Rationale: Self-healing had to install deps, do it upfront\n\n### Step 3: validate conversion (MODIFIED)\n- Before: \"Validate the results\"\n- After: Specific 3-step validation with success criteria\n- Rationale: Judge failed because unclear what to validate and how\n\n## Expected Impact\n- Judge retries: 1 → 0 (clearer validation steps)\n- Self-healing activations: 1 → 0 (deps installed first)"
- }
- ```
-
- Now analyze the highlights and generate the improved task YAML with detailed changelog.