@alis-build/harness-eval 0.1.2 → 0.1.3

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
Files changed (37)
  1. package/README.md +92 -8
  2. package/dist/adapters/claude-code/index.d.ts +2 -2
  3. package/dist/adapters/claude-code/index.js +2 -1
  4. package/dist/adapters/codex/index.d.ts +68 -0
  5. package/dist/adapters/codex/index.js +3 -0
  6. package/dist/{claude-code-DZ4Vkgp6.js → claude-code-C_7hxC8z.js} +3 -245
  7. package/dist/claude-code-C_7hxC8z.js.map +1 -0
  8. package/dist/cli/bin.js +131 -151
  9. package/dist/cli/bin.js.map +1 -1
  10. package/dist/codex-0cHO2te9.js +496 -0
  11. package/dist/codex-0cHO2te9.js.map +1 -0
  12. package/dist/config/loader.d.ts +2 -2
  13. package/dist/config/loader.js +2 -2
  14. package/dist/{index-V22PrR0p.d.ts → index-DnvP1UBl.d.ts} +2 -2
  15. package/dist/index.d.ts +132 -6
  16. package/dist/index.js +6 -5
  17. package/dist/index.js.map +1 -1
  18. package/dist/loader-B1WmGGzf.d.ts +107 -0
  19. package/dist/{loader-DcI0KfRX.js → loader-DnQ6Jt0i.js} +472 -209
  20. package/dist/loader-DnQ6Jt0i.js.map +1 -0
  21. package/dist/{projections-BcX7w-f6.js → reporter-Biy-5-9M.js} +1335 -758
  22. package/dist/reporter-Biy-5-9M.js.map +1 -0
  23. package/dist/runner/suite.d.ts +1 -1
  24. package/dist/runner/suite.js +1 -1
  25. package/dist/{suite-DPJMIEbu.d.ts → suite-BEShV0by.d.ts} +2 -2
  26. package/dist/{suite-Dlzl-HI0.js → suite-BcP64nlb.js} +16 -2
  27. package/dist/{suite-Dlzl-HI0.js.map → suite-BcP64nlb.js.map} +1 -1
  28. package/dist/{types-CD3TwOtZ.d.ts → types-0QkNVyp9.d.ts} +2 -2
  29. package/dist/types-Bac8_Ixb.js +246 -0
  30. package/dist/types-Bac8_Ixb.js.map +1 -0
  31. package/dist/types-Bu8uOZZN.d.ts +77 -0
  32. package/dist/{types-B9H4IZtA.d.ts → types-C0gBkl0-.d.ts} +3 -2
  33. package/package.json +6 -2
  34. package/dist/claude-code-DZ4Vkgp6.js.map +0 -1
  35. package/dist/loader-C9yQHUPC.d.ts +0 -50
  36. package/dist/loader-DcI0KfRX.js.map +0 -1
  37. package/dist/projections-BcX7w-f6.js.map +0 -1
package/README.md CHANGED
@@ -54,10 +54,11 @@ pnpm exec harness-eval --help
54
54
 
55
55
  Suites are YAML files. Committed examples:
56
56
 
57
- - [`examples/basic.yaml`](examples/basic.yaml) — smoke test using the built-in `Read` tool on this repo's README
57
+ - [`examples/pipeline/`](examples/pipeline/) — **recommended** unified layout with inline `judge:` + `pipeline:` orchestration
58
+ - [`examples/basic.yaml`](examples/basic.yaml) — minimal smoke test using the built-in `Read` tool on this repo's README
58
59
  - [`examples/matrix.yaml`](examples/matrix.yaml) — same idea with a model matrix (sonnet vs opus)
59
60
  - [`examples/multi-file/`](examples/multi-file/) — directory layout with `suite.yaml` plus cases under `cases/`
60
- - [`examples/grading.yaml`](examples/grading.yaml) — standalone judge config for `harness-eval grade`
61
+ - [`examples/grading.yaml`](examples/grading.yaml) — standalone judge config (alternate to inline `judge:`)
61
62
 
62
63
  ```yaml
63
64
  adapter: claude-code
@@ -96,11 +97,15 @@ cases:
96
97
 
97
98
  Generic fields (`model`, `cwd`, `timeoutMs`, `env`) sit at the top level. Claude-specific options go under `claudeCode`.
98
99
 
99
- **Full suite & grading YAML reference:** [docs/suite-config.md](docs/suite-config.md) — all case/matrix fields, `reference_trajectory`, `human_ratings`, multi-file layout, and `grading.yaml` options.
100
+ **Full suite & grading YAML reference:** [docs/suite-config.md](docs/suite-config.md) — all case/matrix fields, inline `judge:` / `pipeline:`, multi-file layout, and standalone `grading.yaml`.
100
101
 
101
102
  ### 2. Run behavioral eval
102
103
 
103
104
  ```bash
105
+ # Unified pipeline (run + optional grade + envelope when pipeline: is defined)
106
+ npx @alis-build/harness-eval pipeline examples/pipeline/
107
+
108
+ # Or run harness only
104
109
  npx @alis-build/harness-eval run examples/basic.yaml --output report.json --max-concurrent 1 --format console
105
110
  ```
106
111
 
@@ -112,7 +117,14 @@ Exit code `0` = all cells passed all assertion thresholds.
112
117
 
113
118
  ### 3. Grade outcomes (optional)
114
119
 
115
- Judge model, timeout, env, and `claudeCode` flags live in a separate **`grading.yaml`** (not in the suite file). See [`examples/grading.yaml`](examples/grading.yaml).
120
+ **Unified suite:** add a top-level `judge:` block in `suite.yaml` (see [`examples/pipeline/suite.yaml`](examples/pipeline/suite.yaml)), then:
121
+
122
+ ```bash
123
+ npx @alis-build/harness-eval grade report.json --suite examples/pipeline/suite.yaml --output grading.json --max-concurrent 1 --format console
124
+ # or: npx @alis-build/harness-eval pipeline examples/pipeline/ --steps grade
125
+ ```
126
+
127
+ **Standalone grading file:** judge config in a separate **`grading.yaml`** (still supported). See [`examples/grading.yaml`](examples/grading.yaml).
116
128
 
117
129
  ```bash
118
130
  npx @alis-build/harness-eval grade report.json --config examples/grading.yaml --output grading.json --max-concurrent 1 --format console
@@ -401,6 +413,7 @@ Both layers use statistical thresholds: a case runs `repetitions` times per matr
401
413
  npx @alis-build/harness-eval run <suite.yaml> [options]
402
414
  npx @alis-build/harness-eval grade <report.json> [options]
403
415
  npx @alis-build/harness-eval envelope <report.json> [options]
416
+ npx @alis-build/harness-eval pipeline <suite.yaml|dir> [options]
404
417
  npx @alis-build/harness-eval format <report.json> [options]
405
418
  npx @alis-build/harness-eval --help
406
419
  ```
@@ -422,7 +435,7 @@ npx @alis-build/harness-eval --help
422
435
 
423
436
  ### `grade`
424
437
 
425
- Uses a standalone **`grading.yaml`** for judge model, timeout, env, and `claudeCode` flags (Option B — separate from the suite file).
438
+ Uses **`grading.yaml`** or an inline **`judge:`** block in `suite.yaml` (`--suite`).
426
439
 
427
440
  **Field reference:** [docs/suite-config.md — Grading config](docs/suite-config.md#grading-config-gradingyaml)
428
441
 
@@ -444,6 +457,7 @@ npx @alis-build/harness-eval grade report.json --config examples/grading.yaml --
444
457
  | Option | Description |
445
458
  | -------------------------------------- | ----------------------------------------------------------------- |
446
459
  | `--config <path>` | Grading YAML (`judge` block) — model, env, timeout, `claudeCode` |
460
+ | `--suite <path>` | Unified `suite.yaml` with inline `judge:` (alternative to `--config`) |
447
461
  | `--output <path>` | Write grading JSON |
448
462
  | `--expectations <path>` | Sidecar YAML/JSON if report lacks expectations |
449
463
  | `--format console\|json` | Output format |
@@ -485,6 +499,28 @@ npx @alis-build/harness-eval envelope report.json --projection instances --outpu
485
499
 
486
500
  Exit codes: `0` = envelope built and behavioral pass; `1` = built but behavioral failures; `2` = usage or file errors.
487
501
 
502
+ ### `pipeline`
503
+
504
+ Orchestrate **run → grade → envelope** from a unified `suite.yaml` when a `pipeline:` block is present. See [docs/suite-config.md — Pipeline orchestration](docs/suite-config.md#pipeline-orchestration-pipeline).
505
+
506
+ ```bash
507
+ npx @alis-build/harness-eval pipeline examples/pipeline/
508
+ npx @alis-build/harness-eval pipeline my-suite/ --steps run,grade
509
+ ```
510
+
511
+ | Option | Description |
512
+ | ------ | ----------- |
513
+ | `--steps run,grade,envelope` | Subset of configured steps (default: all configured) |
514
+ | `--output <path>` | Override `pipeline.run.output` |
515
+ | `--report <path>` | Override report input for grade/envelope |
516
+ | `--grading <path>` | Override grading input for envelope |
517
+ | `--grading-output <path>` | Override `pipeline.grade.output` |
518
+ | `--envelope-output <path>` | Override `pipeline.envelope.output` |
519
+ | `--projection envelope\|trajectory\|instances` | Envelope projection |
520
+ | `--max-concurrent <n>` | Parallel harness/judge workers |
521
+
522
+ Exit codes match the first failing step (`run`, `grade`, or `envelope`). Returns `2` when no `pipeline:` block exists.
523
+
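The exit-code rule stated in the `pipeline` section above can be sketched as follows. This is a minimal illustration and not the package's implementation: `StepResult`, `pipelineExitCode`, and `sampleResults` are hypothetical names, and the sketch assumes each step reports its own numeric exit code with `0` meaning success.

```typescript
// Hypothetical sketch of the documented rule: the pipeline's exit code is the
// exit code of the first failing step, or 2 when no `pipeline:` block exists.
type StepName = "run" | "grade" | "envelope";

interface StepResult {
  step: StepName;
  exitCode: number; // 0 = success, non-zero = failure (assumed convention)
}

function pipelineExitCode(hasPipelineBlock: boolean, results: StepResult[]): number {
  // Documented: returns 2 when the suite has no `pipeline:` block.
  if (!hasPipelineBlock) return 2;
  // Documented: exit codes match the first failing step.
  const firstFailure = results.find((r) => r.exitCode !== 0);
  return firstFailure ? firstFailure.exitCode : 0;
}

const sampleResults: StepResult[] = [
  { step: "run", exitCode: 0 },
  { step: "grade", exitCode: 1 },
  { step: "envelope", exitCode: 2 },
];
console.log(pipelineExitCode(true, sampleResults)); // 1 (grade is the first failure)
console.log(pipelineExitCode(false, [])); // 2 (no pipeline block)
```

Whether the real pipeline stops at the first failure or keeps running later steps is not stated in the README; the sketch only models which code is reported.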
488
524
  ### `format`
489
525
 
490
526
  Re-render an existing `report.json` without re-running the harness.
@@ -549,7 +585,7 @@ Define expected tool calls for Vertex trajectory metrics on the eval envelope. U
549
585
 
550
586
  ## Adding harness adapters
551
587
 
552
- Built-in adapters register at module load. Today only `claude-code` ships; additional harnesses (Codex, Gemini CLI, Antigravity CLI) plug in via the same pattern:
588
+ Built-in adapters register at module load. **`claude-code`** and **`codex`** ship today; additional harnesses (Gemini CLI, Antigravity CLI) plug in via the same pattern:
553
589
 
554
590
  1. Implement `HarnessAdapter` under `src/adapters/<id>/` with a `run(config)` that returns a `TrajectoryView`.
555
591
  2. Add a nested config key on `SuiteConfig` (e.g. `codex: { ... }`) for harness-specific options.
@@ -564,7 +600,7 @@ import {
564
600
  } from "@alis-build/harness-eval";
565
601
 
566
602
  registerAdapter("my-harness", myAdapter);
567
- console.log(listAdapters()); // ["claude-code", "my-harness"]
603
+ console.log(listAdapters()); // ["claude-code", "codex", "my-harness"]
568
604
  ```
569
605
 
570
606
  Duplicate registration throws so accidental overrides fail fast during startup or tests.
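The fail-fast behaviour described above can be sketched with a small registry. This is an illustration under stated assumptions, not the package's `registerAdapter` / `listAdapters` code: `SketchAdapter`, `SketchAdapterRegistry`, and `noopAdapter` are hypothetical names, and the error message is invented.

```typescript
// Hypothetical stand-in for an adapter registry; not the package's code.
interface SketchAdapter {
  run(config: unknown): Promise<unknown>;
}

class SketchAdapterRegistry {
  // Map preserves insertion order, so listing reflects registration order.
  private readonly adapters = new Map<string, SketchAdapter>();

  register(id: string, adapter: SketchAdapter): void {
    // Throw instead of overwriting so accidental overrides surface early.
    if (this.adapters.has(id)) {
      throw new Error(`Adapter "${id}" is already registered`);
    }
    this.adapters.set(id, adapter);
  }

  list(): string[] {
    return Array.from(this.adapters.keys());
  }
}

const registry = new SketchAdapterRegistry();
const noopAdapter: SketchAdapter = { run: async () => ({}) };
registry.register("claude-code", noopAdapter);
registry.register("codex", noopAdapter);
registry.register("my-harness", noopAdapter);
console.log(registry.list().join(", ")); // ids in registration order
```

Throwing on a duplicate id, rather than silently replacing the entry, is what makes a misconfigured test setup fail at registration time instead of at the first run.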
@@ -620,12 +656,55 @@ The adapter captures Claude’s stream-json output and builds a `TrajectoryView`
620
656
 
621
657
  ---
622
658
 
659
+ ## Codex CLI adapter
660
+
661
+ Nested under `codex` in YAML (or flat in programmatic config). Maps to [Codex CLI reference](https://developers.openai.com/codex/cli/reference) (`codex exec` flags).
662
+
663
+ The harness adapter invokes:
664
+
665
+ ```bash
666
+ codex --ask-for-approval never exec --json [exec flags…] "<prompt>"
667
+ ```
668
+
669
+ `--ask-for-approval` is a **global** flag (before `exec`); other options attach to the `exec` subcommand.
670
+
671
+ | Field | CLI flag | Notes |
672
+ | ----- | -------- | ----- |
673
+ | `binary` | — | Default `codex` |
674
+ | `model` | `--model` | Also settable at top level |
675
+ | `profile` | `--profile` | Layer `$CODEX_HOME/<profile>.config.toml` |
676
+ | `sandbox` | `--sandbox` | `read-only`, `workspace-write`, `danger-full-access` |
677
+ | `addDirs` | `--add-dir` | Extra writable dirs (repeatable) |
678
+ | `configOverrides` | `-c key=value` | Inline TOML overrides (repeatable) |
679
+ | `askForApproval` | `--ask-for-approval` | Default `never` for non-interactive eval |
680
+ | `dangerouslyBypassApprovalsAndSandbox` | `--yolo` | Hardened CI only |
681
+ | `dangerouslyBypassHookTrust` | `--dangerously-bypass-hook-trust` | Automation with vetted hooks |
682
+ | `ephemeral` | `--ephemeral` | No session rollout files |
683
+ | `ignoreUserConfig` | `--ignore-user-config` | Skip `$CODEX_HOME/config.toml` |
684
+ | `skipGitRepoCheck` | `--skip-git-repo-check` | Allow runs outside git repos |
685
+ | `outputSchema` | `--output-schema` | JSON Schema for structured final output |
686
+ | `outputLastMessage` | `--output-last-message` | Write final assistant message to file (auto temp path when `captureLastMessage` is true) |
687
+ | `captureLastMessage` | — | Default `true`: auto `--output-last-message` and read into `finalResponse` if JSONL has no assistant text |
688
+ | `isolateConfig` | — | `false` (default) = inherit `~/.codex`; `true` = temp `$CODEX_HOME` per run |
689
+
690
+ Generic `cwd` sets the child process working directory (`--cd`). MCP tool calls in Codex `--json` output map to harness names `mcp__<server>__<tool>`; shell commands map to `Bash`.
691
+
692
+ The adapter maps Codex JSONL events into the shared `StreamEvent` shape and feeds `TrajectoryBuilder`. Fixture-driven tests use committed recordings under `tests/fixtures/codex/` — CI does not require `codex` on `PATH`.
693
+
694
+ **Example suite:** [examples/codex-basic.yaml](examples/codex-basic.yaml)
695
+
696
+ **Codex judge:** set `judge.adapter: codex` and nest options under `judge.codex` in grading YAML (see [docs/suite-config.md](docs/suite-config.md)).
697
+
698
+ ---
699
+
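The flag placement described in the Codex section above (global `--ask-for-approval` before `exec`, everything else after it, prompt last) can be sketched as below. This is a hedged illustration, not the adapter's `buildArgs`: `SketchCodexOptions` and `sketchCodexArgv` are hypothetical names, only a subset of the documented fields is covered, and the relative order of the exec-level flags is an assumption.

```typescript
// Hypothetical subset of the options listed in the table above.
interface SketchCodexOptions {
  askForApproval?: string; // global flag; the README gives `never` as default
  model?: string;
  sandbox?: "read-only" | "workspace-write" | "danger-full-access";
  addDirs?: string[];
  skipGitRepoCheck?: boolean;
}

function sketchCodexArgv(prompt: string, options: SketchCodexOptions = {}): string[] {
  const argv: string[] = [];
  // Global flags must appear before the `exec` subcommand.
  argv.push("--ask-for-approval", options.askForApproval ?? "never");
  argv.push("exec", "--json");
  // Exec-level flags attach after `exec` (order among them is assumed).
  if (options.model !== undefined) argv.push("--model", options.model);
  if (options.sandbox !== undefined) argv.push("--sandbox", options.sandbox);
  for (const dir of options.addDirs ?? []) argv.push("--add-dir", dir); // repeatable
  if (options.skipGitRepoCheck) argv.push("--skip-git-repo-check");
  // The prompt is the final positional argument.
  argv.push(prompt);
  return argv;
}

const sampleArgv = sketchCodexArgv("Summarize README.md", {
  sandbox: "read-only",
  addDirs: ["/tmp/work"],
});
console.log(["codex", ...sampleArgv].join(" "));
```

Building the argument list as an array, rather than a shell string, avoids quoting problems when the prompt contains spaces or shell metacharacters.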
623
700
  ## Library API
624
701
 
625
702
  ```typescript
626
703
  import {
627
704
  loadSuite,
705
+ loadSuiteDocument,
628
706
  runSuite,
707
+ runPipeline,
629
708
  gradeReport,
630
709
  buildEvalRunEnvelope,
631
710
  trajectoryToTranscript,
@@ -635,6 +714,11 @@ import {
635
714
  } from "@alis-build/harness-eval";
636
715
  import { loadGradingConfig } from "@alis-build/harness-eval/config";
637
716
 
717
+ // Unified pipeline
718
+ const doc = await loadSuiteDocument("./examples/pipeline/suite.yaml");
719
+ const { exitCode } = await runPipeline(doc, { maxConcurrent: 2 });
720
+
721
+ // Or step-by-step
638
722
  const suite = await loadSuite("./examples/basic.yaml");
639
723
  const report = await runSuite(suite, { maxConcurrent: 2 });
640
724
 
@@ -659,7 +743,7 @@ const envelope = buildEvalRunEnvelope(report, {
659
743
  });
660
744
  ```
661
745
 
662
- Subpath exports: `@alis-build/harness-eval/runner`, `@alis-build/harness-eval/config`, `@alis-build/harness-eval/adapters/claude-code`.
746
+ Subpath exports: `@alis-build/harness-eval/runner`, `@alis-build/harness-eval/config`, `@alis-build/harness-eval/adapters/claude-code`, `@alis-build/harness-eval/adapters/codex`.
663
747
 
664
748
  ---
665
749
 
@@ -1,3 +1,3 @@
1
- import { n as AdapterError, o as ParseErrorRecord, r as AdapterResult, t as AdapterDiagnostics } from "../../types-B9H4IZtA.js";
2
- import { a as ClaudeCodeAdapterResult, i as ClaudeCodeAdapterConfig, o as ClaudeCodeOptions, r as runClaudeCode, s as PermissionMode, t as claudeCodeAdapter } from "../../index-V22PrR0p.js";
1
+ import { n as AdapterError, o as ParseErrorRecord, r as AdapterResult, t as AdapterDiagnostics } from "../../types-C0gBkl0-.js";
2
+ import { a as ClaudeCodeAdapterResult, i as ClaudeCodeAdapterConfig, o as ClaudeCodeOptions, r as runClaudeCode, s as PermissionMode, t as claudeCodeAdapter } from "../../index-DnvP1UBl.js";
3
3
  export { type AdapterDiagnostics, AdapterError, type AdapterResult, type ClaudeCodeAdapterConfig, type ClaudeCodeAdapterResult, type ClaudeCodeOptions, type ParseErrorRecord, type PermissionMode, claudeCodeAdapter, runClaudeCode };
@@ -1,2 +1,3 @@
1
- import { a as AdapterError, r as runClaudeCode, t as claudeCodeAdapter } from "../../claude-code-DZ4Vkgp6.js";
1
+ import { t as AdapterError } from "../../types-Bac8_Ixb.js";
2
+ import { r as runClaudeCode, t as claudeCodeAdapter } from "../../claude-code-C_7hxC8z.js";
2
3
  export { AdapterError, claudeCodeAdapter, runClaudeCode };
@@ -0,0 +1,68 @@
1
+ import { a as HarnessAdapter, n as AdapterError, o as ParseErrorRecord, r as AdapterResult, t as AdapterDiagnostics, x as StreamEvent } from "../../types-C0gBkl0-.js";
2
+ import { i as CodexOptions, n as CodexAdapterResult, r as CodexJsonEvent, t as CodexAdapterConfig } from "../../types-Bu8uOZZN.js";
3
+
4
+ //#region src/adapters/codex/map-events.d.ts
5
+ /** Stateful mapper — tracks session id and pending tool calls across the stream. */
6
+ declare class CodexEventMapper {
7
+ private sessionId;
8
+ private sawInit;
9
+ private startedItems;
10
+ private turnCount;
11
+ /** Map one parsed Codex JSON object to zero or more stream events. */
12
+ map(event: CodexJsonEvent): StreamEvent[];
13
+ private buildInit;
14
+ private ensureInit;
15
+ private mapItemStarted;
16
+ private mapItemCompleted;
17
+ private toolUseEvent;
18
+ private commandUseEvent;
19
+ private toolResultEvent;
20
+ private buildResult;
21
+ }
22
+ /** Map an entire fixture or stream of Codex events through a fresh mapper. */
23
+ declare function mapCodexEvents(events: CodexJsonEvent[]): StreamEvent[];
24
+ /** Build harness-qualified MCP tool name from Codex server + tool fields. */
25
+ declare function mcpToolName(server: string, tool: string): string;
26
+ //#endregion
27
+ //#region src/adapters/codex/flags.d.ts
28
+ /** Prepend global flags that must appear before the `exec` subcommand. */
29
+ declare function appendGlobalCodexFlags(args: string[], config: CodexOptions): void;
30
+ /** Append `codex exec` subcommand flags (after `exec`, before prompt). */
31
+ declare function appendExecCodexFlags(args: string[], config: CodexOptions & {
32
+ model?: string;
33
+ cwd?: string;
34
+ }): void;
35
+ /** @deprecated Use appendGlobalCodexFlags + appendExecCodexFlags */
36
+ declare function appendCodexFlags(args: string[], config: CodexOptions & {
37
+ model?: string;
38
+ cwd?: string;
39
+ }): void;
40
+ /**
41
+ * Ensure harness runs pass `--output-last-message` when capture is enabled.
42
+ * Returns the auto-generated path (for cleanup), or null if unchanged.
43
+ */
44
+ declare function ensureHarnessOutputLastMessage(config: CodexAdapterConfig): string | null;
45
+ /**
46
+ * Build argv for `codex --ask-for-approval never exec --json … "<prompt>"`.
47
+ *
48
+ * Expects `config.outputLastMessage` to already be set if capture is desired;
49
+ * call {@link ensureHarnessOutputLastMessage} before this if spawning outside
50
+ * of {@link spawnCodex}.
51
+ */
52
+ declare function buildArgs(config: CodexAdapterConfig): string[];
53
+ /**
54
+ * Build argv for `codex --ask-for-approval never exec … "<prompt>"` (no `--json`).
55
+ */
56
+ declare function buildJudgeArgs(prompt: string, config?: CodexOptions & {
57
+ model?: string;
58
+ cwd?: string;
59
+ }): string[];
60
+ //#endregion
61
+ //#region src/adapters/codex/index.d.ts
62
+ /** Run Codex in headless `exec --json` mode and return a trajectory. */
63
+ declare function runCodex(config: CodexAdapterConfig): Promise<CodexAdapterResult>;
64
+ /** Registered {@link HarnessAdapter} for Codex CLI headless runs. */
65
+ declare const codexAdapter: HarnessAdapter<CodexAdapterConfig>;
66
+ //#endregion
67
+ export { type AdapterDiagnostics, AdapterError, type AdapterResult, type CodexAdapterConfig, type CodexAdapterResult, CodexEventMapper, type CodexOptions, type ParseErrorRecord, appendCodexFlags, appendExecCodexFlags, appendGlobalCodexFlags, buildArgs, buildJudgeArgs, codexAdapter, ensureHarnessOutputLastMessage, mapCodexEvents, mcpToolName, runCodex };
68
+ //# sourceMappingURL=index.d.ts.map
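The declaration file above gives only the signature of `mcpToolName`. A minimal sketch of the naming scheme, assuming the body simply joins the parts into the `mcp__<server>__<tool>` format stated in the README, is shown below together with a copy of the `namespaceOf` logic that appears elsewhere in this diff. The `sketch`-prefixed names are hypothetical, and the real function may normalise or validate its inputs.

```typescript
// Assumed body: the .d.ts only declares the signature; the README states the
// resulting harness name has the form `mcp__<server>__<tool>`.
function sketchMcpToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

// Mirrors the `namespaceOf` helper shown elsewhere in this diff: the namespace
// is the first two `__`-separated segments, or null for non-MCP tool names.
function sketchNamespaceOf(toolName: string): string | null {
  if (!toolName.startsWith("mcp__")) return null;
  const parts = toolName.split("__");
  if (parts.length < 3) return null;
  return `${parts[0]}__${parts[1]}`;
}

const qualified = sketchMcpToolName("api", "search_skills");
console.log(qualified); // mcp__api__search_skills
console.log(sketchNamespaceOf(qualified)); // mcp__api
console.log(sketchNamespaceOf("Bash")); // null
```

Keeping Codex MCP calls in the same naming scheme as Claude Code means namespace-based assertions can stay adapter-agnostic.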
@@ -0,0 +1,3 @@
1
+ import { t as AdapterError } from "../../types-Bac8_Ixb.js";
2
+ import { a as appendGlobalCodexFlags, c as ensureHarnessOutputLastMessage, d as mcpToolName, i as appendExecCodexFlags, l as CodexEventMapper, n as runCodex, o as buildArgs, r as appendCodexFlags, s as buildJudgeArgs, t as codexAdapter, u as mapCodexEvents } from "../../codex-0cHO2te9.js";
3
+ export { AdapterError, CodexEventMapper, appendCodexFlags, appendExecCodexFlags, appendGlobalCodexFlags, buildArgs, buildJudgeArgs, codexAdapter, ensureHarnessOutputLastMessage, mapCodexEvents, mcpToolName, runCodex };
@@ -1,235 +1,9 @@
1
1
  import { t as __exportAll } from "./rolldown-runtime-D7D4PA-g.js";
2
+ import { n as TrajectoryBuilder, t as AdapterError } from "./types-Bac8_Ixb.js";
2
3
  import { spawn } from "node:child_process";
3
4
  import { mkdtemp, rm } from "node:fs/promises";
4
5
  import { tmpdir } from "node:os";
5
6
  import { join } from "node:path";
6
- //#region src/types/stream.ts
7
- /** Type guards. Prefer these over manual `e.type === "..."` checks at call sites. */
8
- function isSystemInit(e) {
9
- return e.type === "system" && e.subtype === "init";
10
- }
11
- function isSystemRetry(e) {
12
- return e.type === "system" && e.subtype === "api_retry";
13
- }
14
- function isAssistantMessage(e) {
15
- return e.type === "assistant";
16
- }
17
- function isUserMessage(e) {
18
- return e.type === "user";
19
- }
20
- function isResult(e) {
21
- return e.type === "result";
22
- }
23
- function isTextBlock(b) {
24
- return b.type === "text";
25
- }
26
- function isToolUseBlock(b) {
27
- return b.type === "tool_use";
28
- }
29
- function isToolResultBlock(b) {
30
- return b.type === "tool_result";
31
- }
32
- //#endregion
33
- //#region src/types/trajectory.ts
34
- /**
35
- * Extract the MCP namespace prefix from a tool name.
36
- *
37
- * Claude Code formats MCP tool names as `mcp__<server>__<tool>`. The namespace
38
- * is the first two segments joined: `mcp__<server>`. Returns null for non-MCP
39
- * tool names (built-ins like `Bash`, `Read`, `Edit`).
40
- *
41
- * @example
42
- * namespaceOf("mcp__api__search_skills") // "mcp__api"
43
- * namespaceOf("Bash") // null
44
- */
45
- function namespaceOf(toolName) {
46
- if (!toolName.startsWith("mcp__")) return null;
47
- const parts = toolName.split("__");
48
- if (parts.length < 3) return null;
49
- return `${parts[0]}__${parts[1]}`;
50
- }
51
- //#endregion
52
- //#region src/trajectory/builder.ts
53
- /**
54
- * TrajectoryBuilder — consumes a stream of {@link StreamEvent} values and
55
- * produces a {@link TrajectoryView}.
56
- *
57
- * State machine: the builder is a small, tolerant state machine. Invariants:
58
- *
59
- * - Exactly one `system/init` event opens the session. The builder requires
60
- * it to be present before `build()`.
61
- * - Each `assistant` event begins a new turn. Text blocks accumulate into
62
- * the turn's text; `tool_use` blocks become `ToolCall` records.
63
- * - `user` events with `tool_result` blocks deliver tool results back. We
64
- * match them to pending calls by `tool_use_id`.
65
- * - One `result` event closes the session and carries aggregate usage.
66
- *
67
- * The builder is *tolerant of partial streams*: a process killed mid-run
68
- * produces a coherent (but flagged) view. Tool calls without matching results
69
- * keep `result: null`. The `success` flag reflects whether a successful result
70
- * event was actually observed.
71
- *
72
- * Why a class (not a reducer)?
73
- * The internal `pendingCalls` map is mutable by design — we modify ToolCall
74
- * objects in place when results arrive, so other parts of the view (which
75
- * hold references to the same objects) see the update for free. A reducer
76
- * would force a deep copy per result event, which is wasteful and would
77
- * complicate identity-based queries.
78
- */
79
- var TrajectoryBuilder = class {
80
- meta = null;
81
- sessionStartTs = null;
82
- turns = [];
83
- allToolCalls = [];
84
- /**
85
- * tool_use_id → ToolCall, for matching results back to calls.
86
- * Entries are removed once a result is observed.
87
- */
88
- pendingCalls = /* @__PURE__ */ new Map();
89
- retries = [];
90
- finalUsage = null;
91
- finalCostUsd = 0;
92
- finalDurationMs = 0;
93
- finalNumTurns = 0;
94
- finalResultText = "";
95
- sawResultEvent = false;
96
- resultIsError = false;
97
- /**
98
- * Consume one event. Safe to call with events in stream order.
99
- *
100
- * Unknown event types are silently ignored — the schema evolves and we
101
- * don't want CI to break on a new event type we haven't modelled.
102
- */
103
- consume(event) {
104
- if (isSystemInit(event)) {
105
- this.meta = {
106
- sessionId: event.session_id,
107
- model: event.model,
108
- cwd: event.cwd,
109
- permissionMode: event.permissionMode,
110
- availableTools: event.tools ?? [],
111
- mcpServers: (event.mcp_servers ?? []).map((s) => ({
112
- name: s.name,
113
- status: s.status
114
- }))
115
- };
116
- this.sessionStartTs = Date.now();
117
- return;
118
- }
119
- if (event.type === "system" && event.subtype === "api_retry") {
120
- this.retries.push({
121
- offsetMs: this.sessionStartTs ? Date.now() - this.sessionStartTs : 0,
122
- raw: event
123
- });
124
- return;
125
- }
126
- if (isAssistantMessage(event)) {
127
- this.handleAssistantMessage(event);
128
- return;
129
- }
130
- if (isUserMessage(event)) {
131
- this.handleUserMessage(event);
132
- return;
133
- }
134
- if (isResult(event)) {
135
- this.sawResultEvent = true;
136
- this.resultIsError = event.is_error;
137
- this.finalUsage = event.usage ?? null;
138
- this.finalCostUsd = event.total_cost_usd ?? 0;
139
- this.finalDurationMs = event.duration_ms ?? 0;
140
- this.finalNumTurns = event.num_turns ?? 0;
141
- this.finalResultText = event.result ?? "";
142
- return;
143
- }
144
- }
145
- /**
146
- * Finalize the view. Call after consuming the last event from the stream.
147
- *
148
- * Throws if no `system/init` was observed — at that point we have no model,
149
- * no session id, and no available-tools list, which means assertions like
150
- * "called any mcp__api__* tool" can't even be evaluated meaningfully.
151
- */
152
- build() {
153
- if (this.meta === null) throw new Error("TrajectoryBuilder.build() called before any system/init event was observed. The harness may have failed to start, or the stream was truncated before init.");
154
- const lastTurn = this.turns[this.turns.length - 1];
155
- const accumulatedText = this.turns.map((t) => t.text).filter((t) => t.length > 0).join("\n\n").trim();
156
- return {
157
- meta: this.meta,
158
- toolCalls: this.allToolCalls,
159
- turns: this.turns,
160
- finalResponse: accumulatedText || this.finalResultText,
161
- finalStopReason: lastTurn?.stopReason ?? null,
162
- usage: {
163
- inputTokens: this.finalUsage?.input_tokens ?? 0,
164
- outputTokens: this.finalUsage?.output_tokens ?? 0,
165
- totalCostUsd: this.finalCostUsd,
166
- durationMs: this.finalDurationMs,
167
- numTurns: this.finalNumTurns || this.turns.length
168
- },
169
- retries: this.retries,
170
- success: this.sawResultEvent && !this.resultIsError
171
- };
172
- }
173
- handleAssistantMessage(event) {
174
- const turnIndex = this.turns.length;
175
- const textChunks = [];
176
- const toolCallsThisTurn = [];
177
- for (const block of event.message.content) {
178
- if (isTextBlock(block)) {
179
- textChunks.push(block.text);
180
- continue;
181
- }
182
- if (isToolUseBlock(block)) {
183
- const call = {
184
- name: block.name,
185
- namespace: namespaceOf(block.name),
186
- callId: block.id,
187
- args: block.input,
188
- result: null,
189
- isError: false,
190
- turnIndex,
191
- callIndex: this.allToolCalls.length
192
- };
193
- this.allToolCalls.push(call);
194
- this.pendingCalls.set(block.id, call);
195
- toolCallsThisTurn.push(call);
196
- continue;
197
- }
198
- }
199
- this.turns.push({
200
- turnIndex,
201
- text: textChunks.join("").trim(),
202
- toolCalls: toolCallsThisTurn,
203
- stopReason: event.message.stop_reason ?? null
204
- });
205
- }
206
- handleUserMessage(event) {
207
- const content = event.message.content;
208
- if (typeof content === "string") return;
209
- for (const block of content) {
210
- if (!isToolResultBlock(block)) continue;
211
- const call = this.pendingCalls.get(block.tool_use_id);
212
- if (!call) continue;
213
- call.result = block.content;
214
- call.isError = block.is_error ?? false;
215
- this.pendingCalls.delete(block.tool_use_id);
216
- }
217
- }
218
- };
219
- /**
220
- * Convenience: drain an async iterable of events through a fresh builder.
221
- *
222
- * Suitable when you have the full event stream and just want the view.
223
- * For interactive/incremental scenarios (e.g. surfacing partial state in a UI)
224
- * instantiate {@link TrajectoryBuilder} directly and call `consume()` /
225
- * `build()` yourself.
226
- */
227
- async function buildTrajectory(events) {
228
- const builder = new TrajectoryBuilder();
229
- for await (const event of events) builder.consume(event);
230
- return builder.build();
231
- }
232
- //#endregion
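The call/result matching described in the `TrajectoryBuilder` comments above (pending calls keyed by `tool_use_id`, mutated in place when a result arrives) can be reduced to the sketch below. It is a simplified illustration, not the builder itself: `SketchToolCall` and `SketchCallMatcher` are hypothetical names, and turns, usage, and init handling are left out.

```typescript
// Reduced, hypothetical version of the matching logic in the removed code.
interface SketchToolCall {
  name: string;
  callId: string;
  result: unknown; // stays null until a matching result is observed
  isError: boolean;
}

class SketchCallMatcher {
  readonly allToolCalls: SketchToolCall[] = [];
  // tool_use_id -> call; entries are deleted once a result arrives.
  private readonly pendingCalls = new Map<string, SketchToolCall>();

  onToolUse(callId: string, name: string): void {
    const call: SketchToolCall = { name, callId, result: null, isError: false };
    this.allToolCalls.push(call);
    this.pendingCalls.set(callId, call);
  }

  onToolResult(toolUseId: string, content: unknown, isError = false): void {
    const call = this.pendingCalls.get(toolUseId);
    if (!call) return; // unknown id: skip, as the builder does
    // Mutate in place so every holder of this object sees the result.
    call.result = content;
    call.isError = isError;
    this.pendingCalls.delete(toolUseId);
  }
}

const matcher = new SketchCallMatcher();
matcher.onToolUse("call-1", "Read");
matcher.onToolUse("call-2", "Bash");
matcher.onToolResult("call-1", "file contents");
// call-2 never receives a result (for example, the process was killed),
// so its `result` stays null, matching the tolerant behaviour described above.
console.log(matcher.allToolCalls.map((c) => c.result === null));
```

Because `allToolCalls` and `pendingCalls` hold references to the same objects, no copying is needed when a result arrives, which is the trade-off the original comment gives for using a class rather than a reducer.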
233
7
  //#region src/parsers/stream-json.ts
234
8
  /**
235
9
  * Parse a readable stream of NDJSON into a sequence of typed stream-json events.
@@ -281,22 +55,6 @@ function tryParseLine(line) {
281
55
  }
282
56
  }
283
57
  //#endregion
284
- //#region src/adapters/types.ts
285
- /**
286
- * Thrown when the harness fails to produce a usable trajectory.
287
- *
288
- * Most commonly this means the process failed before emitting a usable
289
- * session init event. Inspect `diagnostics.stderr` for the cause.
290
- */
291
- var AdapterError = class extends Error {
292
- diagnostics;
293
- constructor(message, diagnostics) {
294
- super(message);
295
- this.diagnostics = diagnostics;
296
- this.name = "AdapterError";
297
- }
298
- };
299
- //#endregion
300
58
  //#region src/adapters/claude-code/flags.ts
301
59
  /** Append repeated `--flag value` pairs for array config fields. */
302
60
  function pushRepeatableFlag(args, flag, values) {
@@ -587,6 +345,6 @@ const claudeCodeAdapter = {
587
345
  run: runClaudeCode
588
346
  };
589
347
  //#endregion
590
- export { isUserMessage as _, AdapterError as a, buildTrajectory as c, isResult as d, isSystemInit as f, isToolUseBlock as g, isToolResultBlock as h, buildJudgeArgs as i, namespaceOf as l, isTextBlock as m, claude_code_exports as n, parseStreamJson as o, isSystemRetry as p, runClaudeCode as r, TrajectoryBuilder as s, claudeCodeAdapter as t, isAssistantMessage as u };
348
+ export { parseStreamJson as a, buildJudgeArgs as i, claude_code_exports as n, runClaudeCode as r, claudeCodeAdapter as t };
591
349
 
592
- //# sourceMappingURL=claude-code-DZ4Vkgp6.js.map
350
+ //# sourceMappingURL=claude-code-C_7hxC8z.js.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"claude-code-C_7hxC8z.js","names":[],"sources":["../src/parsers/stream-json.ts","../src/adapters/claude-code/flags.ts","../src/adapters/claude-code/process.ts","../src/adapters/claude-code/index.ts"],"sourcesContent":["/**\n * Line-buffered NDJSON parser for Claude Code's `--output-format stream-json`.\n *\n * Claude Code emits one JSON object per line on stdout. The parser:\n * - buffers across chunk boundaries (a single JSON line may arrive in two reads)\n * - skips empty lines (defensive — shouldn't occur, but harmless if it does)\n * - emits a discriminated `ParseResult` per line so callers can decide whether\n * a malformed line should abort the run or just be logged.\n *\n * Why a generator (and not a Transform stream)?\n * The eval adapter consumes events sequentially and synchronously updates a\n * builder. Async iteration is the simplest interface for that pattern and\n * composes cleanly with `for await` in the adapter. A Transform would force\n * the builder into event-handler style.\n */\n\nimport type { Readable } from \"node:stream\";\nimport type { StreamEvent } from \"../types/stream\";\n\n/**\n * Result of attempting to parse a single line.\n *\n * Successful parses yield `{ ok: true }` with the typed event and the raw line\n * (kept for diagnostics and OTel `events.attributes.raw`). 
Failed parses yield\n * `{ ok: false }` with the parse error and the raw line — callers can log,\n * skip, or fail the run as they see fit.\n */\nexport type ParseResult =\n | { ok: true; event: StreamEvent; rawLine: string }\n | { ok: false; error: Error; rawLine: string };\n\n/**\n * Parse a readable stream of NDJSON into a sequence of typed stream-json events.\n *\n * @example\n * const child = spawn(\"claude\", [\"-p\", prompt, \"--output-format\", \"stream-json\", \"--verbose\"]);\n * for await (const result of parseStreamJson(child.stdout)) {\n * if (result.ok) builder.consume(result.event);\n * else console.warn(\"malformed stream line:\", result.rawLine, result.error);\n * }\n */\nexport async function* parseStreamJson(\n stream: Readable,\n): AsyncGenerator<ParseResult, void, void> {\n let buffer = \"\";\n // The Node child_process stdout is a binary stream by default. Setting the\n // encoding here means `for await (const chunk of stream)` yields strings.\n stream.setEncoding(\"utf8\");\n\n for await (const chunk of stream) {\n buffer += chunk as string;\n\n // Drain every complete line currently in the buffer before reading more.\n // Multiple JSON objects can arrive in one chunk (e.g. when the harness\n // emits a burst of events at session start).\n let newlineIdx: number;\n while ((newlineIdx = buffer.indexOf(\"\\n\")) !== -1) {\n const line = buffer.slice(0, newlineIdx).trim();\n buffer = buffer.slice(newlineIdx + 1);\n if (line.length === 0) continue;\n yield tryParseLine(line);\n }\n }\n\n // Flush any trailing content that arrived without a final newline. Stream-json\n // typically ends with a newline-terminated `result` event, but a killed\n // process may not flush, so we still try to emit what we have.\n const trailing = buffer.trim();\n if (trailing.length > 0) {\n yield tryParseLine(trailing);\n }\n}\n\n/**\n * Parse a single line. 
Extracted as a helper so the generator stays readable.\n *\n * Note: we do not validate the event structure beyond `JSON.parse`. Runtime\n * validation (e.g. zod) is overkill here — the schema is stable enough at\n * runtime, and the TrajectoryBuilder is tolerant of missing fields. Adding\n * validation would be premature.\n */\nfunction tryParseLine(line: string): ParseResult {\n try {\n const event = JSON.parse(line) as StreamEvent;\n return { ok: true, event, rawLine: line };\n } catch (err) {\n return {\n ok: false,\n error: err instanceof Error ? err : new Error(String(err)),\n rawLine: line,\n };\n }\n}\n","/**\n * Build CLI args for Claude Code judge subprocesses (JSON output, not stream-json).\n *\n * Shared flag assembly for harness runs (`buildArgs`) and LLM grading judges\n * (`buildJudgeArgs`).\n */\n\nimport type { ClaudeCodeAdapterConfig, ClaudeCodeOptions } from \"./types\";\n\n/** Append repeated `--flag value` pairs for array config fields. */\nfunction pushRepeatableFlag(args: string[], flag: string, values?: string[]): void {\n if (!values) return;\n for (const value of values) {\n args.push(flag, value);\n }\n}\n\n/**\n * Append an optional CLI flag. Boolean `true` emits the flag alone; other\n * scalars emit `--flag value`.\n */\nfunction pushOptionalFlag(\n args: string[],\n flag: string,\n value: string | number | boolean | undefined,\n): void {\n if (value === undefined) return;\n if (typeof value === \"boolean\") {\n if (value) args.push(flag);\n return;\n }\n args.push(flag, String(value));\n}\n\n/** Append Claude Code CLI flags shared by harness runs and grading judges. 
*/\nexport function appendClaudeCodeFlags(\n args: string[],\n config: ClaudeCodeOptions & { model?: string },\n): void {\n pushRepeatableFlag(args, \"--plugin-dir\", config.pluginDirs);\n pushRepeatableFlag(args, \"--plugin-url\", config.pluginUrls);\n pushRepeatableFlag(args, \"--add-dir\", config.addDirs);\n\n pushOptionalFlag(args, \"--mcp-config\", config.mcpConfig);\n pushOptionalFlag(args, \"--model\", config.model);\n pushOptionalFlag(args, \"--permission-mode\", config.permissionMode);\n pushOptionalFlag(args, \"--effort\", config.effort);\n pushOptionalFlag(args, \"--agent\", config.agent);\n pushOptionalFlag(args, \"--fallback-model\", config.fallbackModel);\n pushOptionalFlag(args, \"--tools\", config.tools);\n pushOptionalFlag(args, \"--settings\", config.settings);\n pushOptionalFlag(args, \"--setting-sources\", config.settingSources);\n pushOptionalFlag(args, \"--max-turns\", config.maxTurns);\n pushOptionalFlag(args, \"--max-budget-usd\", config.maxBudgetUsd);\n pushOptionalFlag(args, \"--system-prompt\", config.systemPrompt);\n pushOptionalFlag(args, \"--system-prompt-file\", config.systemPromptFile);\n pushOptionalFlag(args, \"--append-system-prompt\", config.appendSystemPrompt);\n pushOptionalFlag(\n args,\n \"--append-system-prompt-file\",\n config.appendSystemPromptFile,\n );\n pushOptionalFlag(args, \"--debug\", config.debug);\n pushOptionalFlag(args, \"--debug-file\", config.debugFile);\n\n if (config.allowedTools && config.allowedTools.length > 0) {\n args.push(\"--allowedTools\", config.allowedTools.join(\",\"));\n }\n\n if (config.disallowedTools && config.disallowedTools.length > 0) {\n args.push(\"--disallowedTools\", config.disallowedTools.join(\",\"));\n }\n\n pushOptionalFlag(args, \"--strict-mcp-config\", config.strictMcpConfig);\n pushOptionalFlag(args, \"--include-hook-events\", config.includeHookEvents);\n pushOptionalFlag(args, \"--no-session-persistence\", config.noSessionPersistence);\n pushOptionalFlag(args, 
\"--disable-slash-commands\", config.disableSlashCommands);\n pushOptionalFlag(args, \"--bare\", config.bare);\n pushOptionalFlag(args, \"--safe-mode\", config.safeMode);\n pushOptionalFlag(\n args,\n \"--allow-dangerously-skip-permissions\",\n config.allowDangerouslySkipPermissions,\n );\n pushOptionalFlag(\n args,\n \"--dangerously-skip-permissions\",\n config.dangerouslySkipPermissions,\n );\n}\n\n/**\n * Build the argument vector for spawning `claude`.\n *\n * Order matters only for flags that take values — value flags must come\n * after their flag name. Everything else is order-independent.\n */\nexport function buildArgs(config: ClaudeCodeAdapterConfig): string[] {\n const args: string[] = [\n \"-p\",\n config.prompt,\n \"--output-format\",\n \"stream-json\",\n \"--verbose\",\n ];\n\n appendClaudeCodeFlags(args, config);\n\n return args;\n}\n\n/**\n * Build args for an LLM judge subprocess (`--output-format json`).\n *\n * Defaults permission mode to `bypassPermissions` so the judge does not\n * block on tool permission prompts during single-shot JSON grading.\n */\nexport function buildJudgeArgs(\n prompt: string,\n config: ClaudeCodeOptions & { model?: string } = {},\n): string[] {\n const args: string[] = [\"-p\", prompt, \"--output-format\", \"json\"];\n const permissionMode = config.permissionMode ?? \"bypassPermissions\";\n appendClaudeCodeFlags(args, {\n ...config,\n permissionMode,\n });\n return args;\n}\n","/**\n * Process management for the Claude Code adapter.\n *\n * This module owns spawning, timeout, abort signal handling, and process-tree\n * teardown. The orchestrator (`index.ts`) consumes the returned handle —\n * reading stdout and waiting for completion — but doesn't worry about how\n * the process gets killed or how its config gets isolated.\n *\n * Why a separate module? Process management is the one part of the adapter\n * with real I/O complexity (process groups, signal escalation, temp-dir\n * lifecycle, env merging). 
Isolating it makes the orchestrator easy to read\n * and lets us swap the spawning logic if we later need to, e.g., wrap claude\n * in a sandbox runner.\n */\n\nimport { spawn, type ChildProcess } from \"node:child_process\";\nimport { mkdtemp, rm } from \"node:fs/promises\";\nimport { tmpdir } from \"node:os\";\nimport { join } from \"node:path\";\nimport type { Readable } from \"node:stream\";\n\nimport { buildArgs } from \"./flags\";\nimport type { ClaudeCodeAdapterConfig } from \"./types\";\n\n/** Default hard timeout per run. Tunable via config.timeoutMs. */\nconst DEFAULT_TIMEOUT_MS = 5 * 60 * 1000;\n\n/**\n * Grace period between SIGTERM and SIGKILL. Most processes shut down cleanly\n * within a few seconds; this gives them that chance while preventing CI from\n * hanging indefinitely on a stuck child.\n */\nconst KILL_GRACE_MS = 5_000;\n\n/**\n * Handle to a spawned `claude` process. The orchestrator drives it:\n * - Read `stdout` (typically via parseStreamJson).\n * - Await `done` to learn the exit state.\n * - Await `stderrCollected` for diagnostic stderr.\n * - Check `timedOut()` after exit to distinguish kill-by-timeout from\n * normal termination.\n * - Call `cleanup()` after all of the above to remove the temp config dir.\n */\nexport interface SpawnedClaude {\n stdout: Readable;\n done: Promise<{ exitCode: number | null; signal: NodeJS.Signals | null }>;\n stderrCollected: Promise<string>;\n timedOut: () => boolean;\n cleanup: () => Promise<void>;\n}\n\n/**\n * Spawn `claude` in headless mode with isolated config and a process-group\n * lifecycle. See {@link SpawnedClaude} for how to consume the result.\n *\n * **Kill sequence:** timeout and abort both follow the same two-step path:\n * `SIGTERM` to the process group, then `SIGKILL` after {@link KILL_GRACE_MS}\n * if the group is still alive. 
This avoids leaving MCP/tool subprocesses\n * running while still giving claude a chance to flush stream-json output.\n *\n * @param config - Adapter options; `timeoutMs`, `signal`, and `isolateConfig`\n * control lifecycle and config isolation.\n */\nexport async function spawnClaude(\n config: ClaudeCodeAdapterConfig,\n): Promise<SpawnedClaude> {\n const binary = config.binary ?? \"claude\";\n const args = buildArgs(config);\n\n const isolateConfig = config.isolateConfig !== false;\n\n // Isolated runs use a fresh temp dir so plugins/settings don't leak between\n // reps. Non-isolated runs inherit the caller's Claude login and plugins.\n const tempConfigDir = isolateConfig\n ? await mkdtemp(join(tmpdir(), \"harness-eval-\"))\n : null;\n\n const env: Record<string, string | undefined> = {\n ...process.env,\n ...config.env,\n };\n if (tempConfigDir) {\n // Override after ...env so callers can't accidentally un-isolate.\n env.CLAUDE_CONFIG_DIR = tempConfigDir;\n }\n\n const child = spawn(binary, args, {\n cwd: config.cwd ?? process.cwd(),\n env,\n stdio: [\"ignore\", \"pipe\", \"pipe\"],\n // detached: true means the child becomes the leader of its own process\n // group. We exploit this to kill the entire group (including any MCP\n // server subprocesses and tool processes) on timeout/abort.\n detached: true,\n });\n\n\n // `timedOut` is set only by the hard timeout timer, not by abort — callers\n // use it to distinguish \"ran too long\" from user cancellation or normal exit.\n let timedOut = false;\n let killEscalation: NodeJS.Timeout | null = null;\n const timeoutMs = config.timeoutMs ?? DEFAULT_TIMEOUT_MS;\n\n /**\n * Arm (or re-arm) the SIGKILL fallback. 
Each SIGTERM attempt gets its own\n * grace window so a slow shutdown doesn't leave orphaned MCP servers.\n */\n const scheduleKillEscalation = () => {\n if (killEscalation) clearTimeout(killEscalation);\n killEscalation = setTimeout(\n () => killTree(child, \"SIGKILL\"),\n KILL_GRACE_MS,\n );\n };\n\n const timeoutTimer = setTimeout(() => {\n timedOut = true;\n killTree(child, \"SIGTERM\");\n scheduleKillEscalation();\n }, timeoutMs);\n\n // AbortSignal cancellation mirrors timeout kills but does not flip `timedOut`.\n const onAbort = () => {\n killTree(child, \"SIGTERM\");\n scheduleKillEscalation();\n };\n config.signal?.addEventListener(\"abort\", onAbort, { once: true });\n\n\n // Drain stderr eagerly so the OS-level buffer never fills and stalls the\n // child (Node child processes will block on a full pipe).\n const stderrChunks: string[] = [];\n child.stderr?.setEncoding(\"utf8\");\n child.stderr?.on(\"data\", (chunk: string) => {\n stderrChunks.push(chunk);\n });\n\n const stderrCollected = new Promise<string>((resolve) => {\n const finalize = () => resolve(stderrChunks.join(\"\"));\n child.stderr?.on(\"end\", finalize);\n // Errors during stderr capture shouldn't fail the whole run; we just\n // return what we've buffered so far.\n child.stderr?.on(\"error\", finalize);\n });\n\n\n // Resolve once the process exits or fails to spawn. 
Guard against double\n // settlement because both `close` and `error` can fire in edge cases.\n const done = new Promise<{\n exitCode: number | null;\n signal: NodeJS.Signals | null;\n }>((resolve) => {\n let settled = false;\n const finalize = (\n exitCode: number | null,\n signal: NodeJS.Signals | null,\n ) => {\n if (settled) return;\n settled = true;\n // Tear down timers/listeners so a late timeout cannot SIGKILL a reused PID.\n clearTimeout(timeoutTimer);\n if (killEscalation) clearTimeout(killEscalation);\n config.signal?.removeEventListener(\"abort\", onAbort);\n resolve({ exitCode, signal });\n };\n\n child.on(\"close\", (code, signal) => finalize(code, signal));\n // ENOENT and other spawn failures emit `error` — `close` may not follow.\n child.on(\"error\", () => finalize(null, null));\n });\n\n\n const cleanup = async () => {\n if (!tempConfigDir) return;\n try {\n await rm(tempConfigDir, { recursive: true, force: true });\n } catch {\n // Best-effort. A leftover temp dir is annoying but not catastrophic;\n // we don't want to fail the run for it.\n }\n };\n\n // stdout is guaranteed non-null because we passed `stdio: [..., \"pipe\", ...]`.\n // The `!` is safe; the alternative would be a redundant runtime check that\n // could never fire.\n return {\n stdout: child.stdout!,\n done,\n stderrCollected,\n timedOut: () => timedOut,\n cleanup,\n };\n}\n\n/**\n * Kill the child's process group, then fall back to the bare PID if the\n * group is already gone. This catches MCP server subprocesses and tool\n * processes spawned by claude.\n *\n * **Signal escalation:** callers typically invoke this first with `SIGTERM`,\n * then again with `SIGKILL` after {@link KILL_GRACE_MS}. The group kill is\n * essential — a bare `child.kill()` would leave MCP servers running.\n *\n * **Platform edge case:** when the group leader exits first, `kill(-pid)`\n * throws `ESRCH`. 
The single-PID fallback covers that without failing the\n * adapter run.\n *\n * @param child - Spawned process handle from {@link spawn}.\n * @param signal - POSIX signal to deliver (`SIGTERM` or `SIGKILL` in practice).\n */\nfunction killTree(child: ChildProcess, signal: NodeJS.Signals): void {\n if (child.pid === undefined) return;\n try {\n // Negative PID targets the entire process group (requires detached spawn).\n process.kill(-child.pid, signal);\n } catch {\n try {\n // Group already reaped — try the leader PID directly.\n child.kill(signal);\n } catch {\n // Process fully gone; nothing to do.\n }\n }\n}\n","/**\n * Claude Code adapter — public API.\n */\n\nimport { parseStreamJson } from \"../../parsers/stream-json\";\nimport { TrajectoryBuilder } from \"../../trajectory/builder\";\nimport type { StreamEvent } from \"../../types/stream\";\n\nimport { AdapterError } from \"../types\";\nimport { spawnClaude } from \"./process\";\nimport type {\n AdapterDiagnostics,\n ClaudeCodeAdapterConfig,\n ClaudeCodeAdapterResult,\n ParseErrorRecord,\n} from \"./types\";\nimport type { HarnessAdapter } from \"../types\";\n\nexport { AdapterError } from \"../types\";\nexport type {\n AdapterDiagnostics,\n AdapterResult,\n ClaudeCodeAdapterConfig,\n ClaudeCodeAdapterResult,\n ClaudeCodeOptions,\n ParseErrorRecord,\n PermissionMode,\n} from \"./types\";\n\n/**\n * Run Claude Code in headless mode and return a trajectory.\n */\nexport async function runClaudeCode(\n config: ClaudeCodeAdapterConfig,\n): Promise<ClaudeCodeAdapterResult> {\n const startTs = Date.now();\n const spawned = await spawnClaude(config);\n\n const builder = new TrajectoryBuilder();\n const rawEvents: StreamEvent[] = [];\n const parseErrors: ParseErrorRecord[] = [];\n\n try {\n for await (const result of parseStreamJson(spawned.stdout)) {\n if (result.ok) {\n builder.consume(result.event);\n rawEvents.push(result.event);\n } else {\n parseErrors.push({\n line: result.rawLine,\n error: 
result.error.message,\n });\n }\n }\n\n const [{ exitCode, signal }, stderr] = await Promise.all([\n spawned.done,\n spawned.stderrCollected,\n ]);\n\n const diagnostics: AdapterDiagnostics = {\n exitCode,\n signal,\n stderr,\n parseErrors,\n timedOut: spawned.timedOut(),\n durationMs: Date.now() - startTs,\n };\n\n let view;\n try {\n view = builder.build();\n } catch (err) {\n const message = err instanceof Error ? err.message : String(err);\n throw new AdapterError(\n `harness produced no usable trajectory: ${message}`,\n diagnostics,\n );\n }\n\n return { view, diagnostics, rawEvents };\n } finally {\n await spawned.cleanup();\n }\n}\n\n/** Registered {@link HarnessAdapter} for Claude Code headless runs. */\nexport const claudeCodeAdapter: HarnessAdapter<ClaudeCodeAdapterConfig> = {\n id: \"claude-code\",\n run: runClaudeCode,\n};\n"],"mappings":";;;;;;;;;;;;;;;;;AAyCA,gBAAuB,gBACrB,QACyC;CACzC,IAAI,SAAS;CAGb,OAAO,YAAY,MAAM;CAEzB,WAAW,MAAM,SAAS,QAAQ;EAChC,UAAU;EAKV,IAAI;EACJ,QAAQ,aAAa,OAAO,QAAQ,IAAI,OAAO,IAAI;GACjD,MAAM,OAAO,OAAO,MAAM,GAAG,UAAU,CAAC,CAAC,KAAK;GAC9C,SAAS,OAAO,MAAM,aAAa,CAAC;GACpC,IAAI,KAAK,WAAW,GAAG;GACvB,MAAM,aAAa,IAAI;EACzB;CACF;CAKA,MAAM,WAAW,OAAO,KAAK;CAC7B,IAAI,SAAS,SAAS,GACpB,MAAM,aAAa,QAAQ;AAE/B;;;;;;;;;AAUA,SAAS,aAAa,MAA2B;CAC/C,IAAI;EAEF,OAAO;GAAE,IAAI;GAAM,OADL,KAAK,MAAM,IACF;GAAG,SAAS;EAAK;CAC1C,SAAS,KAAK;EACZ,OAAO;GACL,IAAI;GACJ,OAAO,eAAe,QAAQ,MAAM,IAAI,MAAM,OAAO,GAAG,CAAC;GACzD,SAAS;EACX;CACF;AACF;;;;AClFA,SAAS,mBAAmB,MAAgB,MAAc,QAAyB;CACjF,IAAI,CAAC,QAAQ;CACb,KAAK,MAAM,SAAS,QAClB,KAAK,KAAK,MAAM,KAAK;AAEzB;;;;;AAMA,SAAS,iBACP,MACA,MACA,OACM;CACN,IAAI,UAAU,KAAA,GAAW;CACzB,IAAI,OAAO,UAAU,WAAW;EAC9B,IAAI,OAAO,KAAK,KAAK,IAAI;EACzB;CACF;CACA,KAAK,KAAK,MAAM,OAAO,KAAK,CAAC;AAC/B;;AAGA,SAAgB,sBACd,MACA,QACM;CACN,mBAAmB,MAAM,gBAAgB,OAAO,UAAU;CAC1D,mBAAmB,MAAM,gBAAgB,OAAO,UAAU;CAC1D,mBAAmB,MAAM,aAAa,OAAO,OAAO;CAEpD,iBAAiB,MAAM,gBAAgB,OAAO,SAAS;CACvD,iBAAiB,MAAM,WAAW,OAAO,KAAK;CAC9C,iBAAiB,MAAM,qBAAqB,OAAO,cAAc;CACjE,iBAAiB,MAAM,YAAY,OAAO,MAAM;
CAChD,iBAAiB,MAAM,WAAW,OAAO,KAAK;CAC9C,iBAAiB,MAAM,oBAAoB,OAAO,aAAa;CAC/D,iBAAiB,MAAM,WAAW,OAAO,KAAK;CAC9C,iBAAiB,MAAM,cAAc,OAAO,QAAQ;CACpD,iBAAiB,MAAM,qBAAqB,OAAO,cAAc;CACjE,iBAAiB,MAAM,eAAe,OAAO,QAAQ;CACrD,iBAAiB,MAAM,oBAAoB,OAAO,YAAY;CAC9D,iBAAiB,MAAM,mBAAmB,OAAO,YAAY;CAC7D,iBAAiB,MAAM,wBAAwB,OAAO,gBAAgB;CACtE,iBAAiB,MAAM,0BAA0B,OAAO,kBAAkB;CAC1E,iBACE,MACA,+BACA,OAAO,sBACT;CACA,iBAAiB,MAAM,WAAW,OAAO,KAAK;CAC9C,iBAAiB,MAAM,gBAAgB,OAAO,SAAS;CAEvD,IAAI,OAAO,gBAAgB,OAAO,aAAa,SAAS,GACtD,KAAK,KAAK,kBAAkB,OAAO,aAAa,KAAK,GAAG,CAAC;CAG3D,IAAI,OAAO,mBAAmB,OAAO,gBAAgB,SAAS,GAC5D,KAAK,KAAK,qBAAqB,OAAO,gBAAgB,KAAK,GAAG,CAAC;CAGjE,iBAAiB,MAAM,uBAAuB,OAAO,eAAe;CACpE,iBAAiB,MAAM,yBAAyB,OAAO,iBAAiB;CACxE,iBAAiB,MAAM,4BAA4B,OAAO,oBAAoB;CAC9E,iBAAiB,MAAM,4BAA4B,OAAO,oBAAoB;CAC9E,iBAAiB,MAAM,UAAU,OAAO,IAAI;CAC5C,iBAAiB,MAAM,eAAe,OAAO,QAAQ;CACrD,iBACE,MACA,wCACA,OAAO,+BACT;CACA,iBACE,MACA,kCACA,OAAO,0BACT;AACF;;;;;;;AAQA,SAAgB,UAAU,QAA2C;CACnE,MAAM,OAAiB;EACrB;EACA,OAAO;EACP;EACA;EACA;CACF;CAEA,sBAAsB,MAAM,MAAM;CAElC,OAAO;AACT;;;;;;;AAQA,SAAgB,eACd,QACA,SAAiD,CAAC,GACxC;CACV,MAAM,OAAiB;EAAC;EAAM;EAAQ;EAAmB;CAAM;CAC/D,MAAM,iBAAiB,OAAO,kBAAkB;CAChD,sBAAsB,MAAM;EAC1B,GAAG;EACH;CACF,CAAC;CACD,OAAO;AACT;;;;;;;;;;;;;;;;;;ACvGA,MAAM,qBAAqB,MAAS;;;;;;AAOpC,MAAM,gBAAgB;;;;;;;;;;;;;AA+BtB,eAAsB,YACpB,QACwB;CACxB,MAAM,SAAS,OAAO,UAAU;CAChC,MAAM,OAAO,UAAU,MAAM;CAM7B,MAAM,gBAJgB,OAAO,kBAAkB,QAK3C,MAAM,QAAQ,KAAK,OAAO,GAAG,eAAe,CAAC,IAC7C;CAEJ,MAAM,MAA0C;EAC9C,GAAG,QAAQ;EACX,GAAG,OAAO;CACZ;CACA,IAAI,eAEF,IAAI,oBAAoB;CAG1B,MAAM,QAAQ,MAAM,QAAQ,MAAM;EAChC,KAAK,OAAO,OAAO,QAAQ,IAAI;EAC/B;EACA,OAAO;GAAC;GAAU;GAAQ;EAAM;EAIhC,UAAU;CACZ,CAAC;CAKD,IAAI,WAAW;CACf,IAAI,iBAAwC;CAC5C,MAAM,YAAY,OAAO,aAAa;;;;;CAMtC,MAAM,+BAA+B;EACnC,IAAI,gBAAgB,aAAa,cAAc;EAC/C,iBAAiB,iBACT,SAAS,OAAO,SAAS,GAC/B,aACF;CACF;CAEA,MAAM,eAAe,iBAAiB;EACpC,WAAW;EACX,SAAS,OAAO,SAAS;EACzB,uBAAuB;CACzB,GAAG,SAAS;CAGZ,MAAM,gBAAgB;EACpB,SAAS,OAAO,SAAS;EACzB,uBAAuB;CACzB;CACA,OAAO,QAAQ,iBAAiB,SAAS,SAAS,EAAE,MAAM,KAAK,CAAC;CAKhE,MAAM,eAAyB,CAAC;C
AChC,MAAM,QAAQ,YAAY,MAAM;CAChC,MAAM,QAAQ,GAAG,SAAS,UAAkB;EAC1C,aAAa,KAAK,KAAK;CACzB,CAAC;CAED,MAAM,kBAAkB,IAAI,SAAiB,YAAY;EACvD,MAAM,iBAAiB,QAAQ,aAAa,KAAK,EAAE,CAAC;EACpD,MAAM,QAAQ,GAAG,OAAO,QAAQ;EAGhC,MAAM,QAAQ,GAAG,SAAS,QAAQ;CACpC,CAAC;CAKD,MAAM,OAAO,IAAI,SAGb,YAAY;EACd,IAAI,UAAU;EACd,MAAM,YACJ,UACA,WACG;GACH,IAAI,SAAS;GACb,UAAU;GAEV,aAAa,YAAY;GACzB,IAAI,gBAAgB,aAAa,cAAc;GAC/C,OAAO,QAAQ,oBAAoB,SAAS,OAAO;GACnD,QAAQ;IAAE;IAAU;GAAO,CAAC;EAC9B;EAEA,MAAM,GAAG,UAAU,MAAM,WAAW,SAAS,MAAM,MAAM,CAAC;EAE1D,MAAM,GAAG,eAAe,SAAS,MAAM,IAAI,CAAC;CAC9C,CAAC;CAGD,MAAM,UAAU,YAAY;EAC1B,IAAI,CAAC,eAAe;EACpB,IAAI;GACF,MAAM,GAAG,eAAe;IAAE,WAAW;IAAM,OAAO;GAAK,CAAC;EAC1D,QAAQ,CAGR;CACF;CAKA,OAAO;EACL,QAAQ,MAAM;EACd;EACA;EACA,gBAAgB;EAChB;CACF;AACF;;;;;;;;;;;;;;;;;AAkBA,SAAS,SAAS,OAAqB,QAA8B;CACnE,IAAI,MAAM,QAAQ,KAAA,GAAW;CAC7B,IAAI;EAEF,QAAQ,KAAK,CAAC,MAAM,KAAK,MAAM;CACjC,QAAQ;EACN,IAAI;GAEF,MAAM,KAAK,MAAM;EACnB,QAAQ,CAER;CACF;AACF;;;;;;;;;;;;;;AC/LA,eAAsB,cACpB,QACkC;CAClC,MAAM,UAAU,KAAK,IAAI;CACzB,MAAM,UAAU,MAAM,YAAY,MAAM;CAExC,MAAM,UAAU,IAAI,kBAAkB;CACtC,MAAM,YAA2B,CAAC;CAClC,MAAM,cAAkC,CAAC;CAEzC,IAAI;EACF,WAAW,MAAM,UAAU,gBAAgB,QAAQ,MAAM,GACvD,IAAI,OAAO,IAAI;GACb,QAAQ,QAAQ,OAAO,KAAK;GAC5B,UAAU,KAAK,OAAO,KAAK;EAC7B,OACE,YAAY,KAAK;GACf,MAAM,OAAO;GACb,OAAO,OAAO,MAAM;EACtB,CAAC;EAIL,MAAM,CAAC,EAAE,UAAU,UAAU,UAAU,MAAM,QAAQ,IAAI,CACvD,QAAQ,MACR,QAAQ,eACV,CAAC;EAED,MAAM,cAAkC;GACtC;GACA;GACA;GACA;GACA,UAAU,QAAQ,SAAS;GAC3B,YAAY,KAAK,IAAI,IAAI;EAC3B;EAEA,IAAI;EACJ,IAAI;GACF,OAAO,QAAQ,MAAM;EACvB,SAAS,KAAK;GAEZ,MAAM,IAAI,aACR,0CAFc,eAAe,QAAQ,IAAI,UAAU,OAAO,GAAG,KAG7D,WACF;EACF;EAEA,OAAO;GAAE;GAAM;GAAa;EAAU;CACxC,UAAU;EACR,MAAM,QAAQ,QAAQ;CACxB;AACF;;AAGA,MAAa,oBAA6D;CACxE,IAAI;CACJ,KAAK;AACP"}