agentv 0.5.1 → 0.5.3

package/README.md CHANGED
@@ -64,39 +64,17 @@ You are now ready to start development. The monorepo contains:
64
64
 
65
65
  ### Environment Setup
66
66
 
67
- 1. Configure environment variables:
68
- - Copy [.env.template](docs/examples/simple/.env.template) to `.env` in your project root
69
- - Fill in your API keys, endpoints, and other configuration values
67
+ 1. Initialize your workspace:
68
+ - Run `agentv init` at the root of your repository
69
+ - This command automatically sets up the `.agentv/` directory structure and configuration files
70
70
 
71
- 2. Set up targets:
72
- - Copy [targets.yaml](docs/examples/simple/.agentv/targets.yaml) to `.agentv/targets.yaml`
73
- - Update the environment variable names in targets.yaml to match those defined in your `.env` file
71
+ 2. Configure environment variables:
72
+ - The init command creates a `.env.template` file in your project root
73
+ - Copy `.env.template` to `.env` and fill in your API keys, endpoints, and other configuration values
74
+ - Update the environment variable names in `.agentv/targets.yaml` to match those defined in your `.env` file
74
75
 
75
76
  ## Quick Start
76
77
 
77
- ### Configuring Guideline Patterns
78
-
79
- AgentV automatically detects guideline files and treats them differently from regular file content. You can customize which files are considered guidelines using an optional `.agentv/config.yaml` configuration file.
80
-
81
- **Config file discovery:**
82
- - AgentV searches for `.agentv/config.yaml` starting from the eval file's directory
83
- - Walks up the directory tree to the repository root
84
- - Uses the first config file found (similar to how `targets.yaml` is discovered)
85
- - This allows you to place one config file at the project root for all evals
86
-
87
- **Custom patterns** (create `.agentv/config.yaml` in same directory as your eval file):
88
-
89
- ```yaml
90
- # .agentv/config.yaml
91
- guideline_patterns:
92
- - "**/*.guide.md" # Match all .guide.md files
93
- - "**/guidelines/**" # Match all files in /guidelines/ dirs
94
- - "docs/AGENTS.md" # Match specific files
95
- - "**/*.rules.md" # Match by naming convention
96
- ```
97
-
98
- See [config.yaml example](docs/examples/simple/.agentv/config.yaml) for more pattern examples.
99
-
100
78
  ### Validating Eval Files
101
79
 
102
80
  Validate your eval and targets files before running them:
@@ -157,7 +135,7 @@ agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id
157
135
  - `--targets TARGETS`: Path to targets.yaml file (default: ./.agentv/targets.yaml)
158
136
  - `--eval-id EVAL_ID`: Run only the eval case with this specific ID
159
137
  - `--out OUTPUT_FILE`: Output file path (default: results/{evalname}_{timestamp}.jsonl)
160
- - `--format FORMAT`: Output format: 'jsonl' or 'yaml' (default: jsonl)
138
+ - `--output-format FORMAT`: Output format: 'jsonl' or 'yaml' (default: jsonl)
161
139
  - `--dry-run`: Run with mock model for testing
162
140
  - `--agent-timeout SECONDS`: Timeout in seconds for agent response polling (default: 120)
163
141
  - `--max-retries COUNT`: Maximum number of retries for timeout cases (default: 2)
@@ -256,21 +234,6 @@ Each target specifies:
256
234
  Codex targets require the standalone `codex` CLI and a configured profile (via `codex configure`) so credentials are stored in `~/.codex/config` (or whatever path the CLI already uses). AgentV mirrors all guideline and attachment files into a fresh scratch workspace, so the `file://` preread links remain valid even when the CLI runs outside your repo tree.
257
235
  Confirm the CLI works by running `codex exec --json --profile <name> "ping"` (or any supported dry run) before starting an eval. This prints JSONL events; seeing `item.completed` messages indicates the CLI is healthy.
258
236
 
259
- ## Timeout Handling and Retries
260
-
261
- When using VS Code or other AI agents that may experience timeouts, the evaluator includes automatic retry functionality:
262
-
263
- - **Timeout detection:** Automatically detects when agents timeout
264
- - **Automatic retries:** When a timeout occurs, the same eval case is retried up to `--max-retries` times (default: 2)
265
- - **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next eval case
266
- - **Timeout configuration:** Use `--agent-timeout` to adjust how long to wait for agent responses
267
-
268
- Example with custom timeout settings:
269
-
270
- ```bash
271
- agentv eval evals/projectx/example.yaml --target vscode_projectx --agent-timeout 180 --max-retries 3
272
- ```
273
-
274
237
  ## Writing Custom Evaluators
275
238
 
276
239
  ### Code Evaluator I/O Contract
@@ -370,110 +333,17 @@ Evaluation criteria and guidelines...
370
333
 
371
334
  ## Next Steps
372
335
 
373
- - Review `docs/examples/simple/evals/example-eval.yaml` to understand the schema
374
- - Create your own eval cases following the schema
375
- - Write custom evaluator scripts for domain-specific validation
376
- - Create LLM judge templates for semantic evaluation
336
+ - Review [docs/examples/simple/evals/example-eval.yaml](docs/examples/simple/evals/example-eval.yaml) to understand the schema
337
+ - Create your own eval dataset following the schema
338
+ - Write custom evaluator scripts for deterministic evaluation
339
+ - Create LLM judge prompts for semantic evaluation
377
340
  - Set up optimizer configs when ready to improve prompts
378
341
 
379
342
  ## Resources
380
343
 
381
344
  - [Simple Example README](docs/examples/simple/README.md)
382
- - [Schema Specification](docs/openspec/changes/update-eval-schema-v2/)
383
345
  - [Ax ACE Documentation](https://github.com/ax-llm/ax/blob/main/docs/ACE.md)
384
346
 
385
- ## Scoring and Outputs
386
-
387
- Run with `--verbose` to print detailed information and stack traces on errors.
388
-
389
- ### Scoring Methodology
390
-
391
- AgentV uses an AI-powered quality grader that:
392
-
393
- - Extracts key aspects from the expected answer
394
- - Compares model output against those aspects
395
- - Provides detailed hit/miss analysis with reasoning
396
- - Returns a normalized score (0.0 to 1.0)
397
-
398
- ### Output Formats
399
-
400
- **JSONL format (default):**
401
-
402
- - One JSON object per line (newline-delimited)
403
- - Fields: `eval_id`, `score`, `hits`, `misses`, `model_answer`, `expected_aspect_count`, `target`, `timestamp`, `reasoning`, `raw_request`, `grader_raw_request`
404
-
405
- **YAML format (with `--format yaml`):**
406
-
407
- - Human-readable YAML documents
408
- - Same fields as JSONL, properly formatted for readability
409
- - Multi-line strings use literal block style
410
-
411
- ### Summary Statistics
412
-
413
- After running all eval cases, AgentV displays:
414
-
415
- - Mean, median, min, max scores
416
- - Standard deviation
417
- - Distribution histogram
418
- - Total eval count and execution time
419
-
420
- ## Architecture
421
-
422
- AgentV is built as a TypeScript monorepo using:
423
-
424
- - **pnpm workspaces:** Efficient dependency management
425
- - **Turbo:** Build system and task orchestration
426
- - **@ax-llm/ax:** Unified LLM provider abstraction
427
- - **Vercel AI SDK:** Streaming and tool use capabilities
428
- - **Zod:** Runtime type validation
429
- - **Commander.js:** CLI argument parsing
430
- - **Vitest:** Testing framework
431
-
432
- ### Package Structure
433
-
434
- - `@agentv/core` - Core evaluation engine, providers, grading logic
435
- - `agentv` - Main package that bundles CLI functionality
436
-
437
- ## Troubleshooting
438
-
439
- ### Installation Issues
440
-
441
- **Problem:** Package installation fails or command not found.
442
-
443
- **Solution:**
444
-
445
- ```bash
446
- # Clear npm cache and reinstall
447
- npm cache clean --force
448
- npm uninstall -g agentv
449
- npm install -g agentv
450
-
451
- # Or use npx without installing
452
- npx agentv@latest --help
453
- ```
454
-
455
- ### VS Code Integration Issues
456
-
457
- **Problem:** VS Code workspace doesn't open or prompts aren't injected.
458
-
459
- **Solution:**
460
-
461
- - Ensure the `subagent` package is installed (should be automatic)
462
- - Verify your workspace path in `.env` is correct and points to a `.code-workspace` file
463
- - Close any other VS Code instances before running evals
464
- - Use `--verbose` flag to see detailed workspace switching logs
465
-
466
- ### Provider Configuration Issues
467
-
468
- **Problem:** API authentication errors or missing credentials.
469
-
470
- **Solution:**
471
-
472
- - Double-check environment variables in your `.env` file
473
- - Verify the variable names in `targets.yaml` match your `.env` file
474
- - Use `--dry-run` first to test without making API calls
475
- - Check provider-specific documentation for required environment variables
476
-
477
347
  ## License
478
348
 
479
349
  MIT License - see [LICENSE](LICENSE) for details.
@@ -5040,7 +5040,8 @@ import { exec as execWithCallback } from "node:child_process";
5040
5040
  import path22 from "node:path";
5041
5041
  import { promisify as promisify2 } from "node:util";
5042
5042
  import { exec as execCallback, spawn as spawn2 } from "node:child_process";
5043
- import { constants as constants22 } from "node:fs";
5043
+ import { randomUUID } from "node:crypto";
5044
+ import { constants as constants22, createWriteStream } from "node:fs";
5044
5045
  import { access as access22, copyFile as copyFile2, mkdtemp, mkdir as mkdir3, rm as rm2, writeFile as writeFile3 } from "node:fs/promises";
5045
5046
  import { tmpdir } from "node:os";
5046
5047
  import path42 from "node:path";
@@ -11032,8 +11033,8 @@ import { constants as constants32 } from "node:fs";
11032
11033
  import { access as access32, readFile as readFile3 } from "node:fs/promises";
11033
11034
  import path62 from "node:path";
11034
11035
  import { parse as parse22 } from "yaml";
11035
- import { randomUUID } from "node:crypto";
11036
- import { createHash, randomUUID as randomUUID2 } from "node:crypto";
11036
+ import { randomUUID as randomUUID2 } from "node:crypto";
11037
+ import { createHash, randomUUID as randomUUID3 } from "node:crypto";
11037
11038
  import { mkdir as mkdir22, readFile as readFile4, writeFile as writeFile22 } from "node:fs/promises";
11038
11039
  import path72 from "node:path";
11039
11040
  var TEST_MESSAGE_ROLE_VALUES = ["system", "user", "assistant", "tool"];
@@ -12088,6 +12089,7 @@ var CodexProvider = class {
12088
12089
  collectGuidelineFiles(inputFiles, request.guideline_patterns).map((file) => path42.resolve(file))
12089
12090
  );
12090
12091
  const workspaceRoot = await this.createWorkspace();
12092
+ const logger = await this.createStreamLogger(request).catch(() => void 0);
12091
12093
  try {
12092
12094
  const { mirroredInputFiles, guidelineMirrors } = await this.mirrorInputFiles(
12093
12095
  inputFiles,
@@ -12102,7 +12104,7 @@ var CodexProvider = class {
12102
12104
  await writeFile3(promptFile, promptContent, "utf8");
12103
12105
  const args = this.buildCodexArgs();
12104
12106
  const cwd = this.resolveCwd(workspaceRoot);
12105
- const result = await this.executeCodex(args, cwd, promptContent, request.signal);
12107
+ const result = await this.executeCodex(args, cwd, promptContent, request.signal, logger);
12106
12108
  if (result.timedOut) {
12107
12109
  throw new Error(
12108
12110
  `Codex CLI timed out${formatTimeoutSuffix2(this.config.timeoutMs ?? void 0)}`
@@ -12126,10 +12128,12 @@ var CodexProvider = class {
12126
12128
  executable: this.resolvedExecutable ?? this.config.executable,
12127
12129
  promptFile,
12128
12130
  workspace: workspaceRoot,
12129
- inputFiles: mirroredInputFiles
12131
+ inputFiles: mirroredInputFiles,
12132
+ logFile: logger?.filePath
12130
12133
  }
12131
12134
  };
12132
12135
  } finally {
12136
+ await logger?.close();
12133
12137
  await this.cleanupWorkspace(workspaceRoot);
12134
12138
  }
12135
12139
  }
@@ -12156,7 +12160,7 @@ var CodexProvider = class {
12156
12160
  args.push("-");
12157
12161
  return args;
12158
12162
  }
12159
- async executeCodex(args, cwd, promptContent, signal) {
12163
+ async executeCodex(args, cwd, promptContent, signal, logger) {
12160
12164
  try {
12161
12165
  return await this.runCodex({
12162
12166
  executable: this.resolvedExecutable ?? this.config.executable,
@@ -12165,7 +12169,9 @@ var CodexProvider = class {
12165
12169
  prompt: promptContent,
12166
12170
  timeoutMs: this.config.timeoutMs,
12167
12171
  env: process.env,
12168
- signal
12172
+ signal,
12173
+ onStdoutChunk: logger ? (chunk) => logger.handleStdoutChunk(chunk) : void 0,
12174
+ onStderrChunk: logger ? (chunk) => logger.handleStderrChunk(chunk) : void 0
12169
12175
  });
12170
12176
  } catch (error) {
12171
12177
  const err = error;
@@ -12217,7 +12223,235 @@ var CodexProvider = class {
12217
12223
  } catch {
12218
12224
  }
12219
12225
  }
12226
+ resolveLogDirectory() {
12227
+ const disabled = isCodexLogStreamingDisabled();
12228
+ if (disabled) {
12229
+ return void 0;
12230
+ }
12231
+ if (this.config.logDir) {
12232
+ return path42.resolve(this.config.logDir);
12233
+ }
12234
+ return path42.join(process.cwd(), ".agentv", "logs", "codex");
12235
+ }
12236
+ async createStreamLogger(request) {
12237
+ const logDir = this.resolveLogDirectory();
12238
+ if (!logDir) {
12239
+ return void 0;
12240
+ }
12241
+ try {
12242
+ await mkdir3(logDir, { recursive: true });
12243
+ } catch (error) {
12244
+ const message = error instanceof Error ? error.message : String(error);
12245
+ console.warn(`Skipping Codex stream logging (could not create ${logDir}): ${message}`);
12246
+ return void 0;
12247
+ }
12248
+ const filePath = path42.join(logDir, buildLogFilename(request, this.targetName));
12249
+ try {
12250
+ const logger = await CodexStreamLogger.create({
12251
+ filePath,
12252
+ targetName: this.targetName,
12253
+ evalCaseId: request.evalCaseId,
12254
+ attempt: request.attempt,
12255
+ format: this.config.logFormat ?? "summary"
12256
+ });
12257
+ console.log(`Streaming Codex CLI output to ${filePath}`);
12258
+ return logger;
12259
+ } catch (error) {
12260
+ const message = error instanceof Error ? error.message : String(error);
12261
+ console.warn(`Skipping Codex stream logging for ${filePath}: ${message}`);
12262
+ return void 0;
12263
+ }
12264
+ }
12220
12265
  };
12266
+ var CodexStreamLogger = class _CodexStreamLogger {
12267
+ filePath;
12268
+ stream;
12269
+ startedAt = Date.now();
12270
+ stdoutBuffer = "";
12271
+ stderrBuffer = "";
12272
+ format;
12273
+ constructor(filePath, format) {
12274
+ this.filePath = filePath;
12275
+ this.format = format;
12276
+ this.stream = createWriteStream(filePath, { flags: "a" });
12277
+ }
12278
+ static async create(options) {
12279
+ const logger = new _CodexStreamLogger(options.filePath, options.format);
12280
+ const header = [
12281
+ "# Codex CLI stream log",
12282
+ `# target: ${options.targetName}`,
12283
+ options.evalCaseId ? `# eval: ${options.evalCaseId}` : void 0,
12284
+ options.attempt !== void 0 ? `# attempt: ${options.attempt + 1}` : void 0,
12285
+ `# started: ${(/* @__PURE__ */ new Date()).toISOString()}`,
12286
+ ""
12287
+ ].filter((line2) => Boolean(line2));
12288
+ logger.writeLines(header);
12289
+ return logger;
12290
+ }
12291
+ handleStdoutChunk(chunk) {
12292
+ this.stdoutBuffer += chunk;
12293
+ this.flushBuffer("stdout");
12294
+ }
12295
+ handleStderrChunk(chunk) {
12296
+ this.stderrBuffer += chunk;
12297
+ this.flushBuffer("stderr");
12298
+ }
12299
+ async close() {
12300
+ this.flushBuffer("stdout");
12301
+ this.flushBuffer("stderr");
12302
+ this.flushRemainder();
12303
+ await new Promise((resolve, reject) => {
12304
+ this.stream.once("error", reject);
12305
+ this.stream.end(() => resolve());
12306
+ });
12307
+ }
12308
+ writeLines(lines) {
12309
+ for (const line2 of lines) {
12310
+ this.stream.write(`${line2}
12311
+ `);
12312
+ }
12313
+ }
12314
+ flushBuffer(source2) {
12315
+ const buffer2 = source2 === "stdout" ? this.stdoutBuffer : this.stderrBuffer;
12316
+ const lines = buffer2.split(/\r?\n/);
12317
+ const remainder = lines.pop() ?? "";
12318
+ if (source2 === "stdout") {
12319
+ this.stdoutBuffer = remainder;
12320
+ } else {
12321
+ this.stderrBuffer = remainder;
12322
+ }
12323
+ for (const line2 of lines) {
12324
+ const formatted = this.formatLine(line2, source2);
12325
+ if (formatted) {
12326
+ this.stream.write(formatted);
12327
+ this.stream.write("\n");
12328
+ }
12329
+ }
12330
+ }
12331
+ formatLine(rawLine, source2) {
12332
+ const trimmed = rawLine.trim();
12333
+ if (trimmed.length === 0) {
12334
+ return void 0;
12335
+ }
12336
+ const message = this.format === "json" ? formatCodexJsonLog(trimmed) : formatCodexLogMessage(trimmed, source2);
12337
+ return `[+${formatElapsed(this.startedAt)}] [${source2}] ${message}`;
12338
+ }
12339
+ flushRemainder() {
12340
+ const stdoutRemainder = this.stdoutBuffer.trim();
12341
+ if (stdoutRemainder.length > 0) {
12342
+ const formatted = this.formatLine(stdoutRemainder, "stdout");
12343
+ if (formatted) {
12344
+ this.stream.write(formatted);
12345
+ this.stream.write("\n");
12346
+ }
12347
+ }
12348
+ const stderrRemainder = this.stderrBuffer.trim();
12349
+ if (stderrRemainder.length > 0) {
12350
+ const formatted = this.formatLine(stderrRemainder, "stderr");
12351
+ if (formatted) {
12352
+ this.stream.write(formatted);
12353
+ this.stream.write("\n");
12354
+ }
12355
+ }
12356
+ this.stdoutBuffer = "";
12357
+ this.stderrBuffer = "";
12358
+ }
12359
+ };
12360
+ function isCodexLogStreamingDisabled() {
12361
+ const envValue = process.env.AGENTV_CODEX_STREAM_LOGS;
12362
+ if (!envValue) {
12363
+ return false;
12364
+ }
12365
+ const normalized = envValue.trim().toLowerCase();
12366
+ return normalized === "false" || normalized === "0" || normalized === "off";
12367
+ }
12368
+ function buildLogFilename(request, targetName) {
12369
+ const timestamp = (/* @__PURE__ */ new Date()).toISOString().replace(/[:.]/g, "-");
12370
+ const evalId = sanitizeForFilename(request.evalCaseId ?? "codex");
12371
+ const attemptSuffix = request.attempt !== void 0 ? `_attempt-${request.attempt + 1}` : "";
12372
+ const target = sanitizeForFilename(targetName);
12373
+ return `${timestamp}_${target}_${evalId}${attemptSuffix}_${randomUUID().slice(0, 8)}.log`;
12374
+ }
12375
+ function sanitizeForFilename(value) {
12376
+ const sanitized = value.replace(/[^A-Za-z0-9._-]+/g, "_");
12377
+ return sanitized.length > 0 ? sanitized : "codex";
12378
+ }
12379
+ function formatElapsed(startedAt) {
12380
+ const elapsedSeconds = Math.floor((Date.now() - startedAt) / 1e3);
12381
+ const hours = Math.floor(elapsedSeconds / 3600);
12382
+ const minutes = Math.floor(elapsedSeconds % 3600 / 60);
12383
+ const seconds = elapsedSeconds % 60;
12384
+ if (hours > 0) {
12385
+ return `${hours.toString().padStart(2, "0")}:${minutes.toString().padStart(2, "0")}:${seconds.toString().padStart(2, "0")}`;
12386
+ }
12387
+ return `${minutes.toString().padStart(2, "0")}:${seconds.toString().padStart(2, "0")}`;
12388
+ }
12389
+ function formatCodexLogMessage(rawLine, source2) {
12390
+ const parsed = tryParseJsonValue(rawLine);
12391
+ if (parsed) {
12392
+ const summary = summarizeCodexEvent(parsed);
12393
+ if (summary) {
12394
+ return summary;
12395
+ }
12396
+ }
12397
+ if (source2 === "stderr") {
12398
+ return `stderr: ${rawLine}`;
12399
+ }
12400
+ return rawLine;
12401
+ }
12402
+ function formatCodexJsonLog(rawLine) {
12403
+ const parsed = tryParseJsonValue(rawLine);
12404
+ if (!parsed) {
12405
+ return rawLine;
12406
+ }
12407
+ try {
12408
+ return JSON.stringify(parsed, null, 2);
12409
+ } catch {
12410
+ return rawLine;
12411
+ }
12412
+ }
12413
+ function summarizeCodexEvent(event) {
12414
+ if (!event || typeof event !== "object") {
12415
+ return void 0;
12416
+ }
12417
+ const record = event;
12418
+ const type = typeof record.type === "string" ? record.type : void 0;
12419
+ let message = extractFromEvent(event) ?? extractFromItem(record.item) ?? flattenContent(record.output ?? record.content);
12420
+ if (!message && type === JSONL_TYPE_ITEM_COMPLETED) {
12421
+ const item = record.item;
12422
+ if (item && typeof item === "object") {
12423
+ const candidate = flattenContent(
12424
+ item.text ?? item.content ?? item.output
12425
+ );
12426
+ if (candidate) {
12427
+ message = candidate;
12428
+ }
12429
+ }
12430
+ }
12431
+ if (!message) {
12432
+ const itemType = typeof record.item?.type === "string" ? record.item.type : void 0;
12433
+ if (type && itemType) {
12434
+ return `${type}:${itemType}`;
12435
+ }
12436
+ if (type) {
12437
+ return type;
12438
+ }
12439
+ }
12440
+ if (type && message) {
12441
+ return `${type}: ${message}`;
12442
+ }
12443
+ if (message) {
12444
+ return message;
12445
+ }
12446
+ return type;
12447
+ }
12448
+ function tryParseJsonValue(rawLine) {
12449
+ try {
12450
+ return JSON.parse(rawLine);
12451
+ } catch {
12452
+ return void 0;
12453
+ }
12454
+ }
12221
12455
  async function locateExecutable(candidate) {
12222
12456
  const includesPathSeparator = candidate.includes("/") || candidate.includes("\\");
12223
12457
  if (includesPathSeparator) {
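The naming and timestamp helpers added in this hunk (`sanitizeForFilename`, `formatElapsed`, `buildLogFilename`) are easier to follow with concrete values. The sketch below restates them under one assumption that is not in the diff: the clock value and the random suffix are passed in as parameters (`now`, `suffix`, `elapsedMs`) so the results are reproducible. The real helpers call `Date.now()`, `new Date()` and `randomUUID()` directly.

```javascript
// Worked example of the log-file naming helpers from the hunk above.
// Assumption: `now`, `suffix` and `elapsedMs` are injected here for
// reproducibility; the shipped code reads the clock and randomUUID() itself.

function sanitizeForFilename(value) {
  // Each run of characters outside [A-Za-z0-9._-] collapses into a single "_".
  const sanitized = value.replace(/[^A-Za-z0-9._-]+/g, "_");
  return sanitized.length > 0 ? sanitized : "codex";
}

function formatElapsed(elapsedMs) {
  const elapsedSeconds = Math.floor(elapsedMs / 1000);
  const hours = Math.floor(elapsedSeconds / 3600);
  const minutes = Math.floor((elapsedSeconds % 3600) / 60);
  const seconds = elapsedSeconds % 60;
  const pad = (n) => n.toString().padStart(2, "0");
  // Hours are only printed once the run exceeds one hour.
  return hours > 0
    ? `${pad(hours)}:${pad(minutes)}:${pad(seconds)}`
    : `${pad(minutes)}:${pad(seconds)}`;
}

function buildLogFilename(request, targetName, now, suffix) {
  // ":" and "." are replaced so the ISO timestamp is safe in file names.
  const timestamp = now.toISOString().replace(/[:.]/g, "-");
  const evalId = sanitizeForFilename(request.evalCaseId ?? "codex");
  // Attempts are zero-based internally and one-based in the file name.
  const attemptSuffix = request.attempt !== undefined ? `_attempt-${request.attempt + 1}` : "";
  return `${timestamp}_${sanitizeForFilename(targetName)}_${evalId}${attemptSuffix}_${suffix}.log`;
}

const exampleName = buildLogFilename(
  { evalCaseId: "case/1", attempt: 0 },
  "codex local",
  new Date(0),
  "abcd1234"
);
console.log(exampleName);
// 1970-01-01T00-00-00-000Z_codex_local_case_1_attempt-1_abcd1234.log
console.log(formatElapsed(75_000)); // 01:15
console.log(formatElapsed(3_725_000)); // 01:02:05
```

Each line written to the log is prefixed with `[+MM:SS]` (or `[+HH:MM:SS]` past one hour) relative to the moment the logger was constructed, so long-running Codex sessions stay readable without absolute timestamps.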
@@ -12487,10 +12721,12 @@ async function defaultCodexRunner(options) {
12487
12721
  child.stdout.setEncoding("utf8");
12488
12722
  child.stdout.on("data", (chunk) => {
12489
12723
  stdout += chunk;
12724
+ options.onStdoutChunk?.(chunk);
12490
12725
  });
12491
12726
  child.stderr.setEncoding("utf8");
12492
12727
  child.stderr.on("data", (chunk) => {
12493
12728
  stderr += chunk;
12729
+ options.onStderrChunk?.(chunk);
12494
12730
  });
12495
12731
  child.stdin.end(options.prompt);
12496
12732
  const cleanup = () => {
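The chunks forwarded through `onStdoutChunk` and `onStderrChunk` arrive at arbitrary boundaries, so a single JSONL event can be split across two `data` events. `CodexStreamLogger.flushBuffer` handles this by keeping the trailing partial line until the next chunk arrives. A minimal sketch of that technique follows; `createLineBuffer` and `onLine` are hypothetical names used only for illustration and do not exist in agentv.

```javascript
// Minimal sketch of the line-buffering technique used by flushBuffer.
// `createLineBuffer` and `onLine` are hypothetical, not part of agentv.
function createLineBuffer(onLine) {
  let pending = "";
  return {
    // Append a chunk and emit every complete line it finishes.
    push(chunk) {
      pending += chunk;
      const lines = pending.split(/\r?\n/);
      // The last element is "" if the chunk ended on a newline,
      // otherwise it is a partial line that must wait for more data.
      pending = lines.pop() ?? "";
      for (const line of lines) {
        const trimmed = line.trim();
        if (trimmed.length > 0) {
          onLine(trimmed); // blank lines are skipped, as in formatLine
        }
      }
    },
    // Emit whatever is left once the stream ends (cf. flushRemainder).
    end() {
      const trimmed = pending.trim();
      pending = "";
      if (trimmed.length > 0) {
        onLine(trimmed);
      }
    }
  };
}

const seen = [];
const lineBuffer = createLineBuffer((line) => seen.push(line));
lineBuffer.push('{"type":"item.st'); // JSON event split mid-token
lineBuffer.push('arted"}\n\nplain '); // completes it, then a blank line
lineBuffer.push("text"); // no trailing newline
lineBuffer.end();
console.log(seen);
// [ '{"type":"item.started"}', 'plain text' ]
```

Without the final `end()` call the last line would be lost whenever the CLI exits without a trailing newline, which is why `close()` in the diff flushes the remainder before ending the stream.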
@@ -12730,6 +12966,8 @@ function resolveCodexConfig(target, env) {
12730
12966
  const argsSource = settings.args ?? settings.arguments;
12731
12967
  const cwdSource = settings.cwd;
12732
12968
  const timeoutSource = settings.timeout_seconds ?? settings.timeoutSeconds;
12969
+ const logDirSource = settings.log_dir ?? settings.logDir ?? settings.log_directory ?? settings.logDirectory;
12970
+ const logFormatSource = settings.log_format ?? settings.logFormat ?? settings.log_output_format ?? settings.logOutputFormat ?? env.AGENTV_CODEX_LOG_FORMAT;
12733
12971
  const executable = resolveOptionalString(executableSource, env, `${target.name} codex executable`, {
12734
12972
  allowLiteral: true,
12735
12973
  optionalEnv: true
@@ -12740,13 +12978,33 @@ function resolveCodexConfig(target, env) {
12740
12978
  optionalEnv: true
12741
12979
  });
12742
12980
  const timeoutMs = resolveTimeoutMs(timeoutSource, `${target.name} codex timeout`);
12981
+ const logDir = resolveOptionalString(logDirSource, env, `${target.name} codex log directory`, {
12982
+ allowLiteral: true,
12983
+ optionalEnv: true
12984
+ });
12985
+ const logFormat = normalizeCodexLogFormat(logFormatSource);
12743
12986
  return {
12744
12987
  executable,
12745
12988
  args,
12746
12989
  cwd,
12747
- timeoutMs
12990
+ timeoutMs,
12991
+ logDir,
12992
+ logFormat
12748
12993
  };
12749
12994
  }
12995
+ function normalizeCodexLogFormat(value) {
12996
+ if (value === void 0 || value === null) {
12997
+ return void 0;
12998
+ }
12999
+ if (typeof value !== "string") {
13000
+ throw new Error("codex log format must be 'summary' or 'json'");
13001
+ }
13002
+ const normalized = value.trim().toLowerCase();
13003
+ if (normalized === "json" || normalized === "summary") {
13004
+ return normalized;
13005
+ }
13006
+ throw new Error("codex log format must be 'summary' or 'json'");
13007
+ }
12750
13008
  function resolveMockConfig(target) {
12751
13009
  const settings = target.settings ?? {};
12752
13010
  const response = typeof settings.response === "string" ? settings.response : void 0;
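The log format is resolved through a `??` chain, so target settings take precedence over the `AGENTV_CODEX_LOG_FORMAT` environment variable, and only `null` or `undefined` fall through to the next source. The sketch below folds that chain and `normalizeCodexLogFormat` into one hypothetical helper, `pickLogFormat`, to show the resulting behaviour; the settings keys, the env var name and the error message are taken from the diff, everything else is illustrative.

```javascript
// Sketch of how the log format is chosen. `pickLogFormat` is a hypothetical
// helper that combines the `??` chain with normalizeCodexLogFormat.
const FORMAT_ERROR = "codex log format must be 'summary' or 'json'";

function pickLogFormat(settings, env) {
  const source =
    settings.log_format ??
    settings.logFormat ??
    settings.log_output_format ??
    settings.logOutputFormat ??
    env.AGENTV_CODEX_LOG_FORMAT;
  if (source === undefined || source === null) {
    // Nothing configured: the provider later defaults to "summary".
    return undefined;
  }
  if (typeof source !== "string") {
    throw new Error(FORMAT_ERROR);
  }
  const normalized = source.trim().toLowerCase();
  if (normalized === "json" || normalized === "summary") {
    return normalized;
  }
  throw new Error(FORMAT_ERROR);
}

// Settings win over the environment; case and whitespace are normalized.
const fromSettings = pickLogFormat({ log_format: " JSON " }, { AGENTV_CODEX_LOG_FORMAT: "summary" });
// With no setting, the environment variable is used.
const fromEnv = pickLogFormat({}, { AGENTV_CODEX_LOG_FORMAT: "Summary" });
// With neither, the result is undefined.
const unset = pickLogFormat({}, {});
// An empty string is not nullish, so it does not fall through to the env var.
let emptyStringError;
try {
  pickLogFormat({ log_format: "" }, { AGENTV_CODEX_LOG_FORMAT: "json" });
} catch (error) {
  emptyStringError = error.message;
}
console.log(fromSettings, fromEnv, unset, emptyStringError);
```

One consequence worth noting when reviewing this hunk: a setting that is present but empty raises an error rather than deferring to the environment, because `??` skips only `null` and `undefined`.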
@@ -13394,7 +13652,7 @@ var LlmJudgeEvaluator = class {
13394
13652
  const misses = Array.isArray(parsed.misses) ? parsed.misses.filter(isNonEmptyString).slice(0, 4) : [];
13395
13653
  const reasoning = parsed.reasoning ?? response.reasoning;
13396
13654
  const evaluatorRawRequest = {
13397
- id: randomUUID(),
13655
+ id: randomUUID2(),
13398
13656
  provider: judgeProvider.id,
13399
13657
  prompt,
13400
13658
  target: context2.target.name,
@@ -14395,7 +14653,7 @@ function sanitizeFilename(value) {
14395
14653
  return "prompt";
14396
14654
  }
14397
14655
  const sanitized = value.replace(/[^A-Za-z0-9._-]+/g, "_");
14398
- return sanitized.length > 0 ? sanitized : randomUUID2();
14656
+ return sanitized.length > 0 ? sanitized : randomUUID3();
14399
14657
  }
14400
14658
  async function invokeProvider(provider, options) {
14401
14659
  const { evalCase, promptInputs, attempt, agentTimeoutMs, signal } = options;
@@ -14756,7 +15014,7 @@ var Mutex = class {
14756
15014
  };
14757
15015
 
14758
15016
  // src/commands/eval/jsonl-writer.ts
14759
- import { createWriteStream } from "node:fs";
15017
+ import { createWriteStream as createWriteStream2 } from "node:fs";
14760
15018
  import { mkdir as mkdir4 } from "node:fs/promises";
14761
15019
  import path10 from "node:path";
14762
15020
  import { finished } from "node:stream/promises";
@@ -14769,7 +15027,7 @@ var JsonlWriter = class _JsonlWriter {
14769
15027
  }
14770
15028
  static async open(filePath) {
14771
15029
  await mkdir4(path10.dirname(filePath), { recursive: true });
14772
- const stream = createWriteStream(filePath, { flags: "w", encoding: "utf8" });
15030
+ const stream = createWriteStream2(filePath, { flags: "w", encoding: "utf8" });
14773
15031
  return new _JsonlWriter(stream);
14774
15032
  }
14775
15033
  async append(record) {
@@ -14798,7 +15056,7 @@ var JsonlWriter = class _JsonlWriter {
14798
15056
  };
14799
15057
 
14800
15058
  // src/commands/eval/yaml-writer.ts
14801
- import { createWriteStream as createWriteStream2 } from "node:fs";
15059
+ import { createWriteStream as createWriteStream3 } from "node:fs";
14802
15060
  import { mkdir as mkdir5 } from "node:fs/promises";
14803
15061
  import path11 from "node:path";
14804
15062
  import { finished as finished2 } from "node:stream/promises";
@@ -14813,7 +15071,7 @@ var YamlWriter = class _YamlWriter {
14813
15071
  }
14814
15072
  static async open(filePath) {
14815
15073
  await mkdir5(path11.dirname(filePath), { recursive: true });
14816
- const stream = createWriteStream2(filePath, { flags: "w", encoding: "utf8" });
15074
+ const stream = createWriteStream3(filePath, { flags: "w", encoding: "utf8" });
14817
15075
  return new _YamlWriter(stream);
14818
15076
  }
14819
15077
  async append(record) {
@@ -15269,7 +15527,7 @@ function normalizeNumber(value, fallback) {
15269
15527
  return fallback;
15270
15528
  }
15271
15529
  function normalizeOptions(rawOptions) {
15272
- const formatStr = normalizeString(rawOptions.format) ?? "jsonl";
15530
+ const formatStr = normalizeString(rawOptions.outputFormat) ?? "jsonl";
15273
15531
  const format = formatStr === "yaml" ? "yaml" : "jsonl";
15274
15532
  const workers = normalizeNumber(rawOptions.workers, 0);
15275
15533
  return {
@@ -15489,7 +15747,11 @@ function registerEvalCommand(program) {
15489
15747
  "--workers <count>",
15490
15748
  "Number of parallel workers (default: 1, max: 50). Can also be set per-target in targets.yaml",
15491
15749
  (value) => parseInteger(value, 1)
15492
- ).option("--out <path>", "Write results to the specified path").option("--format <format>", "Output format: 'jsonl' or 'yaml' (default: jsonl)", "jsonl").option("--dry-run", "Use mock provider responses instead of real LLM calls", false).option(
15750
+ ).option("--out <path>", "Write results to the specified path").option(
15751
+ "--output-format <format>",
15752
+ "Output format: 'jsonl' or 'yaml' (default: jsonl)",
15753
+ "jsonl"
15754
+ ).option("--dry-run", "Use mock provider responses instead of real LLM calls", false).option(
15493
15755
  "--dry-run-delay <ms>",
15494
15756
  "Fixed delay in milliseconds for dry-run mode (overridden by delay range if specified)",
15495
15757
  (value) => parseInteger(value, 0),
@@ -16618,4 +16880,4 @@ export {
16618
16880
  createProgram,
16619
16881
  runCli
16620
16882
  };
16621
- //# sourceMappingURL=chunk-HPH4YWGU.js.map
16883
+ //# sourceMappingURL=chunk-5WBKOCCW.js.map