agentv 0.5.1 → 0.5.3
- package/README.md +12 -142
- package/dist/{chunk-HPH4YWGU.js → chunk-5WBKOCCW.js} +279 -17
- package/dist/chunk-5WBKOCCW.js.map +1 -0
- package/dist/cli.js +1 -1
- package/dist/index.js +1 -1
- package/dist/templates/agentv/config.yaml +2 -3
- package/dist/templates/agentv/targets.yaml +13 -13
- package/package.json +4 -3
- package/dist/chunk-HPH4YWGU.js.map +0 -1
package/README.md
CHANGED
|
@@ -64,39 +64,17 @@ You are now ready to start development. The monorepo contains:

  ### Environment Setup

- 1.
- -
- -
+ 1. Initialize your workspace:
+ - Run `agentv init` at the root of your repository
+ - This command automatically sets up the `.agentv/` directory structure and configuration files

- 2.
- -
- -
+ 2. Configure environment variables:
+ - The init command creates a `.env.template` file in your project root
+ - Copy `.env.template` to `.env` and fill in your API keys, endpoints, and other configuration values
+ - Update the environment variable names in `.agentv/targets.yaml` to match those defined in your `.env` file

  ## Quick Start

- ### Configuring Guideline Patterns
-
- AgentV automatically detects guideline files and treats them differently from regular file content. You can customize which files are considered guidelines using an optional `.agentv/config.yaml` configuration file.
-
- **Config file discovery:**
- - AgentV searches for `.agentv/config.yaml` starting from the eval file's directory
- - Walks up the directory tree to the repository root
- - Uses the first config file found (similar to how `targets.yaml` is discovered)
- - This allows you to place one config file at the project root for all evals
-
- **Custom patterns** (create `.agentv/config.yaml` in same directory as your eval file):
-
- ```yaml
- # .agentv/config.yaml
- guideline_patterns:
- - "**/*.guide.md" # Match all .guide.md files
- - "**/guidelines/**" # Match all files in /guidelines/ dirs
- - "docs/AGENTS.md" # Match specific files
- - "**/*.rules.md" # Match by naming convention
- ```
-
- See [config.yaml example](docs/examples/simple/.agentv/config.yaml) for more pattern examples.
-
  ### Validating Eval Files

  Validate your eval and targets files before running them:
@@ -157,7 +135,7 @@ agentv eval --target vscode_projectx --targets "path/to/targets.yaml" --eval-id
  - `--targets TARGETS`: Path to targets.yaml file (default: ./.agentv/targets.yaml)
  - `--eval-id EVAL_ID`: Run only the eval case with this specific ID
  - `--out OUTPUT_FILE`: Output file path (default: results/{evalname}_{timestamp}.jsonl)
- - `--format FORMAT`: Output format: 'jsonl' or 'yaml' (default: jsonl)
+ - `--output-format FORMAT`: Output format: 'jsonl' or 'yaml' (default: jsonl)
  - `--dry-run`: Run with mock model for testing
  - `--agent-timeout SECONDS`: Timeout in seconds for agent response polling (default: 120)
  - `--max-retries COUNT`: Maximum number of retries for timeout cases (default: 2)
@@ -256,21 +234,6 @@ Each target specifies:
  Codex targets require the standalone `codex` CLI and a configured profile (via `codex configure`) so credentials are stored in `~/.codex/config` (or whatever path the CLI already uses). AgentV mirrors all guideline and attachment files into a fresh scratch workspace, so the `file://` preread links remain valid even when the CLI runs outside your repo tree.
  Confirm the CLI works by running `codex exec --json --profile <name> "ping"` (or any supported dry run) before starting an eval. This prints JSONL events; seeing `item.completed` messages indicates the CLI is healthy.

- ## Timeout Handling and Retries
-
- When using VS Code or other AI agents that may experience timeouts, the evaluator includes automatic retry functionality:
-
- - **Timeout detection:** Automatically detects when agents timeout
- - **Automatic retries:** When a timeout occurs, the same eval case is retried up to `--max-retries` times (default: 2)
- - **Retry behavior:** Only timeouts trigger retries; other errors proceed to the next eval case
- - **Timeout configuration:** Use `--agent-timeout` to adjust how long to wait for agent responses
-
- Example with custom timeout settings:
-
- ```bash
- agentv eval evals/projectx/example.yaml --target vscode_projectx --agent-timeout 180 --max-retries 3
- ```
-
  ## Writing Custom Evaluators

  ### Code Evaluator I/O Contract
@@ -370,110 +333,17 @@ Evaluation criteria and guidelines...

  ## Next Steps

- - Review
- - Create your own eval
- - Write custom evaluator scripts for
- - Create LLM judge
+ - Review [docs/examples/simple/evals/example-eval.yaml](docs/examples/simple/evals/example-eval.yaml) to understand the schema
+ - Create your own eval dataset following the schema
+ - Write custom evaluator scripts for deterministic evaluation
+ - Create LLM judge prompts for semantic evaluation
  - Set up optimizer configs when ready to improve prompts

  ## Resources

  - [Simple Example README](docs/examples/simple/README.md)
- - [Schema Specification](docs/openspec/changes/update-eval-schema-v2/)
  - [Ax ACE Documentation](https://github.com/ax-llm/ax/blob/main/docs/ACE.md)

- ## Scoring and Outputs
-
- Run with `--verbose` to print detailed information and stack traces on errors.
-
- ### Scoring Methodology
-
- AgentV uses an AI-powered quality grader that:
-
- - Extracts key aspects from the expected answer
- - Compares model output against those aspects
- - Provides detailed hit/miss analysis with reasoning
- - Returns a normalized score (0.0 to 1.0)
-
- ### Output Formats
-
- **JSONL format (default):**
-
- - One JSON object per line (newline-delimited)
- - Fields: `eval_id`, `score`, `hits`, `misses`, `model_answer`, `expected_aspect_count`, `target`, `timestamp`, `reasoning`, `raw_request`, `grader_raw_request`
-
- **YAML format (with `--format yaml`):**
-
- - Human-readable YAML documents
- - Same fields as JSONL, properly formatted for readability
- - Multi-line strings use literal block style
-
- ### Summary Statistics
-
- After running all eval cases, AgentV displays:
-
- - Mean, median, min, max scores
- - Standard deviation
- - Distribution histogram
- - Total eval count and execution time
-
- ## Architecture
-
- AgentV is built as a TypeScript monorepo using:
-
- - **pnpm workspaces:** Efficient dependency management
- - **Turbo:** Build system and task orchestration
- - **@ax-llm/ax:** Unified LLM provider abstraction
- - **Vercel AI SDK:** Streaming and tool use capabilities
- - **Zod:** Runtime type validation
- - **Commander.js:** CLI argument parsing
- - **Vitest:** Testing framework
-
- ### Package Structure
-
- - `@agentv/core` - Core evaluation engine, providers, grading logic
- - `agentv` - Main package that bundles CLI functionality
-
- ## Troubleshooting
-
- ### Installation Issues
-
- **Problem:** Package installation fails or command not found.
-
- **Solution:**
-
- ```bash
- # Clear npm cache and reinstall
- npm cache clean --force
- npm uninstall -g agentv
- npm install -g agentv
-
- # Or use npx without installing
- npx agentv@latest --help
- ```
-
- ### VS Code Integration Issues
-
- **Problem:** VS Code workspace doesn't open or prompts aren't injected.
-
- **Solution:**
-
- - Ensure the `subagent` package is installed (should be automatic)
- - Verify your workspace path in `.env` is correct and points to a `.code-workspace` file
- - Close any other VS Code instances before running evals
- - Use `--verbose` flag to see detailed workspace switching logs
-
- ### Provider Configuration Issues
-
- **Problem:** API authentication errors or missing credentials.
-
- **Solution:**
-
- - Double-check environment variables in your `.env` file
- - Verify the variable names in `targets.yaml` match your `.env` file
- - Use `--dry-run` first to test without making API calls
- - Check provider-specific documentation for required environment variables
-
  ## License

  MIT License - see [LICENSE](LICENSE) for details.
package/dist/{chunk-HPH4YWGU.js → chunk-5WBKOCCW.js}
CHANGED
@@ -5040,7 +5040,8 @@ import { exec as execWithCallback } from "node:child_process";
  import path22 from "node:path";
  import { promisify as promisify2 } from "node:util";
  import { exec as execCallback, spawn as spawn2 } from "node:child_process";
- import {
+ import { randomUUID } from "node:crypto";
+ import { constants as constants22, createWriteStream } from "node:fs";
  import { access as access22, copyFile as copyFile2, mkdtemp, mkdir as mkdir3, rm as rm2, writeFile as writeFile3 } from "node:fs/promises";
  import { tmpdir } from "node:os";
  import path42 from "node:path";
@@ -11032,8 +11033,8 @@ import { constants as constants32 } from "node:fs";
  import { access as access32, readFile as readFile3 } from "node:fs/promises";
  import path62 from "node:path";
  import { parse as parse22 } from "yaml";
- import { randomUUID } from "node:crypto";
- import { createHash, randomUUID as
+ import { randomUUID as randomUUID2 } from "node:crypto";
+ import { createHash, randomUUID as randomUUID3 } from "node:crypto";
  import { mkdir as mkdir22, readFile as readFile4, writeFile as writeFile22 } from "node:fs/promises";
  import path72 from "node:path";
  var TEST_MESSAGE_ROLE_VALUES = ["system", "user", "assistant", "tool"];
@@ -12088,6 +12089,7 @@ var CodexProvider = class {
    collectGuidelineFiles(inputFiles, request.guideline_patterns).map((file) => path42.resolve(file))
  );
  const workspaceRoot = await this.createWorkspace();
+ const logger = await this.createStreamLogger(request).catch(() => void 0);
  try {
    const { mirroredInputFiles, guidelineMirrors } = await this.mirrorInputFiles(
      inputFiles,
@@ -12102,7 +12104,7 @@ var CodexProvider = class {
  await writeFile3(promptFile, promptContent, "utf8");
  const args = this.buildCodexArgs();
  const cwd = this.resolveCwd(workspaceRoot);
- const result = await this.executeCodex(args, cwd, promptContent, request.signal);
+ const result = await this.executeCodex(args, cwd, promptContent, request.signal, logger);
  if (result.timedOut) {
    throw new Error(
      `Codex CLI timed out${formatTimeoutSuffix2(this.config.timeoutMs ?? void 0)}`
@@ -12126,10 +12128,12 @@ var CodexProvider = class {
          executable: this.resolvedExecutable ?? this.config.executable,
          promptFile,
          workspace: workspaceRoot,
-         inputFiles: mirroredInputFiles
+         inputFiles: mirroredInputFiles,
+         logFile: logger?.filePath
        }
      };
    } finally {
+     await logger?.close();
      await this.cleanupWorkspace(workspaceRoot);
    }
  }
@@ -12156,7 +12160,7 @@ var CodexProvider = class {
    args.push("-");
    return args;
  }
- async executeCodex(args, cwd, promptContent, signal) {
+ async executeCodex(args, cwd, promptContent, signal, logger) {
    try {
      return await this.runCodex({
        executable: this.resolvedExecutable ?? this.config.executable,
@@ -12165,7 +12169,9 @@ var CodexProvider = class {
      prompt: promptContent,
      timeoutMs: this.config.timeoutMs,
      env: process.env,
-     signal
+     signal,
+     onStdoutChunk: logger ? (chunk) => logger.handleStdoutChunk(chunk) : void 0,
+     onStderrChunk: logger ? (chunk) => logger.handleStderrChunk(chunk) : void 0
    });
  } catch (error) {
    const err = error;
@@ -12217,7 +12223,235 @@ var CodexProvider = class {
      } catch {
      }
    }
+   resolveLogDirectory() {
+     const disabled = isCodexLogStreamingDisabled();
+     if (disabled) {
+       return void 0;
+     }
+     if (this.config.logDir) {
+       return path42.resolve(this.config.logDir);
+     }
+     return path42.join(process.cwd(), ".agentv", "logs", "codex");
+   }
+   async createStreamLogger(request) {
+     const logDir = this.resolveLogDirectory();
+     if (!logDir) {
+       return void 0;
+     }
+     try {
+       await mkdir3(logDir, { recursive: true });
+     } catch (error) {
+       const message = error instanceof Error ? error.message : String(error);
+       console.warn(`Skipping Codex stream logging (could not create ${logDir}): ${message}`);
+       return void 0;
+     }
+     const filePath = path42.join(logDir, buildLogFilename(request, this.targetName));
+     try {
+       const logger = await CodexStreamLogger.create({
+         filePath,
+         targetName: this.targetName,
+         evalCaseId: request.evalCaseId,
+         attempt: request.attempt,
+         format: this.config.logFormat ?? "summary"
+       });
+       console.log(`Streaming Codex CLI output to ${filePath}`);
+       return logger;
+     } catch (error) {
+       const message = error instanceof Error ? error.message : String(error);
+       console.warn(`Skipping Codex stream logging for ${filePath}: ${message}`);
+       return void 0;
+     }
+   }
  };
+ var CodexStreamLogger = class _CodexStreamLogger {
+   filePath;
+   stream;
+   startedAt = Date.now();
+   stdoutBuffer = "";
+   stderrBuffer = "";
+   format;
+   constructor(filePath, format) {
+     this.filePath = filePath;
+     this.format = format;
+     this.stream = createWriteStream(filePath, { flags: "a" });
+   }
+   static async create(options) {
+     const logger = new _CodexStreamLogger(options.filePath, options.format);
+     const header = [
+       "# Codex CLI stream log",
+       `# target: ${options.targetName}`,
+       options.evalCaseId ? `# eval: ${options.evalCaseId}` : void 0,
+       options.attempt !== void 0 ? `# attempt: ${options.attempt + 1}` : void 0,
+       `# started: ${(/* @__PURE__ */ new Date()).toISOString()}`,
+       ""
+     ].filter((line2) => Boolean(line2));
+     logger.writeLines(header);
+     return logger;
+   }
+   handleStdoutChunk(chunk) {
+     this.stdoutBuffer += chunk;
+     this.flushBuffer("stdout");
+   }
+   handleStderrChunk(chunk) {
+     this.stderrBuffer += chunk;
+     this.flushBuffer("stderr");
+   }
+   async close() {
+     this.flushBuffer("stdout");
+     this.flushBuffer("stderr");
+     this.flushRemainder();
+     await new Promise((resolve, reject) => {
+       this.stream.once("error", reject);
+       this.stream.end(() => resolve());
+     });
+   }
+   writeLines(lines) {
+     for (const line2 of lines) {
+       this.stream.write(`${line2}
+ `);
+     }
+   }
+   flushBuffer(source2) {
+     const buffer2 = source2 === "stdout" ? this.stdoutBuffer : this.stderrBuffer;
+     const lines = buffer2.split(/\r?\n/);
+     const remainder = lines.pop() ?? "";
+     if (source2 === "stdout") {
+       this.stdoutBuffer = remainder;
+     } else {
+       this.stderrBuffer = remainder;
+     }
+     for (const line2 of lines) {
+       const formatted = this.formatLine(line2, source2);
+       if (formatted) {
+         this.stream.write(formatted);
+         this.stream.write("\n");
+       }
+     }
+   }
+   formatLine(rawLine, source2) {
+     const trimmed = rawLine.trim();
+     if (trimmed.length === 0) {
+       return void 0;
+     }
+     const message = this.format === "json" ? formatCodexJsonLog(trimmed) : formatCodexLogMessage(trimmed, source2);
+     return `[+${formatElapsed(this.startedAt)}] [${source2}] ${message}`;
+   }
+   flushRemainder() {
+     const stdoutRemainder = this.stdoutBuffer.trim();
+     if (stdoutRemainder.length > 0) {
+       const formatted = this.formatLine(stdoutRemainder, "stdout");
+       if (formatted) {
+         this.stream.write(formatted);
+         this.stream.write("\n");
+       }
+     }
+     const stderrRemainder = this.stderrBuffer.trim();
+     if (stderrRemainder.length > 0) {
+       const formatted = this.formatLine(stderrRemainder, "stderr");
+       if (formatted) {
+         this.stream.write(formatted);
+         this.stream.write("\n");
+       }
+     }
+     this.stdoutBuffer = "";
+     this.stderrBuffer = "";
+   }
+ };
+ function isCodexLogStreamingDisabled() {
+   const envValue = process.env.AGENTV_CODEX_STREAM_LOGS;
+   if (!envValue) {
+     return false;
+   }
+   const normalized = envValue.trim().toLowerCase();
+   return normalized === "false" || normalized === "0" || normalized === "off";
+ }
+ function buildLogFilename(request, targetName) {
+   const timestamp = (/* @__PURE__ */ new Date()).toISOString().replace(/[:.]/g, "-");
+   const evalId = sanitizeForFilename(request.evalCaseId ?? "codex");
+   const attemptSuffix = request.attempt !== void 0 ? `_attempt-${request.attempt + 1}` : "";
+   const target = sanitizeForFilename(targetName);
+   return `${timestamp}_${target}_${evalId}${attemptSuffix}_${randomUUID().slice(0, 8)}.log`;
+ }
+ function sanitizeForFilename(value) {
+   const sanitized = value.replace(/[^A-Za-z0-9._-]+/g, "_");
+   return sanitized.length > 0 ? sanitized : "codex";
+ }
+ function formatElapsed(startedAt) {
+   const elapsedSeconds = Math.floor((Date.now() - startedAt) / 1e3);
+   const hours = Math.floor(elapsedSeconds / 3600);
+   const minutes = Math.floor(elapsedSeconds % 3600 / 60);
+   const seconds = elapsedSeconds % 60;
+   if (hours > 0) {
+     return `${hours.toString().padStart(2, "0")}:${minutes.toString().padStart(2, "0")}:${seconds.toString().padStart(2, "0")}`;
+   }
+   return `${minutes.toString().padStart(2, "0")}:${seconds.toString().padStart(2, "0")}`;
+ }
+ function formatCodexLogMessage(rawLine, source2) {
+   const parsed = tryParseJsonValue(rawLine);
+   if (parsed) {
+     const summary = summarizeCodexEvent(parsed);
+     if (summary) {
+       return summary;
+     }
+   }
+   if (source2 === "stderr") {
+     return `stderr: ${rawLine}`;
+   }
+   return rawLine;
+ }
+ function formatCodexJsonLog(rawLine) {
+   const parsed = tryParseJsonValue(rawLine);
+   if (!parsed) {
+     return rawLine;
+   }
+   try {
+     return JSON.stringify(parsed, null, 2);
+   } catch {
+     return rawLine;
+   }
+ }
+ function summarizeCodexEvent(event) {
+   if (!event || typeof event !== "object") {
+     return void 0;
+   }
+   const record = event;
+   const type = typeof record.type === "string" ? record.type : void 0;
+   let message = extractFromEvent(event) ?? extractFromItem(record.item) ?? flattenContent(record.output ?? record.content);
+   if (!message && type === JSONL_TYPE_ITEM_COMPLETED) {
+     const item = record.item;
+     if (item && typeof item === "object") {
+       const candidate = flattenContent(
+         item.text ?? item.content ?? item.output
+       );
+       if (candidate) {
+         message = candidate;
+       }
+     }
+   }
+   if (!message) {
+     const itemType = typeof record.item?.type === "string" ? record.item.type : void 0;
+     if (type && itemType) {
+       return `${type}:${itemType}`;
+     }
+     if (type) {
+       return type;
+     }
+   }
+   if (type && message) {
+     return `${type}: ${message}`;
+   }
+   if (message) {
+     return message;
+   }
+   return type;
+ }
+ function tryParseJsonValue(rawLine) {
+   try {
+     return JSON.parse(rawLine);
+   } catch {
+     return void 0;
+   }
+ }
  async function locateExecutable(candidate) {
    const includesPathSeparator = candidate.includes("/") || candidate.includes("\\");
    if (includesPathSeparator) {
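The line buffering that `flushBuffer` and `flushRemainder` perform in the hunk above can be looked at in isolation. The following is a minimal sketch, not the bundled implementation: `createLineBuffer` and `onLine` are hypothetical names introduced here, and the sketch leaves out the timestamp prefix and the per-source buffers. It only shows how a partial line is held back until its newline arrives.

```javascript
// Hypothetical helper (not part of agentv): accumulates stream chunks and
// emits only complete lines, keeping any trailing partial line buffered.
function createLineBuffer(onLine) {
  let buffer = "";
  return {
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split(/\r?\n/);
      // The last element is whatever follows the final newline (possibly "").
      buffer = lines.pop() ?? "";
      for (const line of lines) {
        const trimmed = line.trim();
        if (trimmed.length > 0) onLine(trimmed);
      }
    },
    // Emit whatever is left once the stream has ended.
    flush() {
      const rest = buffer.trim();
      buffer = "";
      if (rest.length > 0) onLine(rest);
    },
  };
}

// A JSONL event split across two chunks is reassembled before being emitted.
const seen = [];
const lineBuffer = createLineBuffer((line) => seen.push(line));
lineBuffer.push('{"type":"item.comp');
lineBuffer.push('leted"}\nsecond line\r\npart');
lineBuffer.flush();
console.log(seen);
```

Splitting on `/\r?\n/` and popping the last element is what allows a chunk boundary to fall in the middle of a JSON event without producing a broken log line.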
@@ -12487,10 +12721,12 @@ async function defaultCodexRunner(options) {
  child.stdout.setEncoding("utf8");
  child.stdout.on("data", (chunk) => {
    stdout += chunk;
+   options.onStdoutChunk?.(chunk);
  });
  child.stderr.setEncoding("utf8");
  child.stderr.on("data", (chunk) => {
    stderr += chunk;
+   options.onStderrChunk?.(chunk);
  });
  child.stdin.end(options.prompt);
  const cleanup = () => {
@@ -12730,6 +12966,8 @@ function resolveCodexConfig(target, env) {
  const argsSource = settings.args ?? settings.arguments;
  const cwdSource = settings.cwd;
  const timeoutSource = settings.timeout_seconds ?? settings.timeoutSeconds;
+ const logDirSource = settings.log_dir ?? settings.logDir ?? settings.log_directory ?? settings.logDirectory;
+ const logFormatSource = settings.log_format ?? settings.logFormat ?? settings.log_output_format ?? settings.logOutputFormat ?? env.AGENTV_CODEX_LOG_FORMAT;
  const executable = resolveOptionalString(executableSource, env, `${target.name} codex executable`, {
    allowLiteral: true,
    optionalEnv: true
@@ -12740,13 +12978,33 @@ function resolveCodexConfig(target, env) {
      optionalEnv: true
    });
    const timeoutMs = resolveTimeoutMs(timeoutSource, `${target.name} codex timeout`);
+   const logDir = resolveOptionalString(logDirSource, env, `${target.name} codex log directory`, {
+     allowLiteral: true,
+     optionalEnv: true
+   });
+   const logFormat = normalizeCodexLogFormat(logFormatSource);
    return {
      executable,
      args,
      cwd,
-     timeoutMs
+     timeoutMs,
+     logDir,
+     logFormat
    };
  }
+ function normalizeCodexLogFormat(value) {
+   if (value === void 0 || value === null) {
+     return void 0;
+   }
+   if (typeof value !== "string") {
+     throw new Error("codex log format must be 'summary' or 'json'");
+   }
+   const normalized = value.trim().toLowerCase();
+   if (normalized === "json" || normalized === "summary") {
+     return normalized;
+   }
+   throw new Error("codex log format must be 'summary' or 'json'");
+ }
  function resolveMockConfig(target) {
    const settings = target.settings ?? {};
    const response = typeof settings.response === "string" ? settings.response : void 0;
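To see how the log format setting resolves in practice, the sketch below copies `normalizeCodexLogFormat` from the added lines so that it runs on its own, and pairs it with `effectiveLogFormat`, a hypothetical wrapper that is not in the bundle. The wrapper only mirrors the `?? "summary"` default that `createStreamLogger` applies when it constructs the logger.

```javascript
// Copied from the added lines in this hunk so the example is self-contained.
function normalizeCodexLogFormat(value) {
  if (value === undefined || value === null) {
    return undefined;
  }
  if (typeof value !== "string") {
    throw new Error("codex log format must be 'summary' or 'json'");
  }
  const normalized = value.trim().toLowerCase();
  if (normalized === "json" || normalized === "summary") {
    return normalized;
  }
  throw new Error("codex log format must be 'summary' or 'json'");
}

// Hypothetical wrapper: applies the same "summary" fallback that the provider
// uses when no format is configured.
function effectiveLogFormat(value) {
  return normalizeCodexLogFormat(value) ?? "summary";
}

console.log(effectiveLogFormat(undefined)); // unset setting falls back to "summary"
console.log(effectiveLogFormat(" JSON "));  // trimmed and lower-cased to "json"

try {
  effectiveLogFormat("verbose");
} catch (error) {
  // Any other string is rejected rather than silently replaced.
  console.log(error.message);
}
```

Judging by the hunk that introduces `logFormatSource`, the value comes from the target settings first (`log_format`, `logFormat`, `log_output_format`, `logOutputFormat`) and only then from the `AGENTV_CODEX_LOG_FORMAT` environment variable.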
@@ -13394,7 +13652,7 @@ var LlmJudgeEvaluator = class {
  const misses = Array.isArray(parsed.misses) ? parsed.misses.filter(isNonEmptyString).slice(0, 4) : [];
  const reasoning = parsed.reasoning ?? response.reasoning;
  const evaluatorRawRequest = {
-   id:
+   id: randomUUID2(),
    provider: judgeProvider.id,
    prompt,
    target: context2.target.name,
@@ -14395,7 +14653,7 @@ function sanitizeFilename(value) {
      return "prompt";
    }
    const sanitized = value.replace(/[^A-Za-z0-9._-]+/g, "_");
-   return sanitized.length > 0 ? sanitized :
+   return sanitized.length > 0 ? sanitized : randomUUID3();
  }
  async function invokeProvider(provider, options) {
    const { evalCase, promptInputs, attempt, agentTimeoutMs, signal } = options;
@@ -14756,7 +15014,7 @@ var Mutex = class {
  };

  // src/commands/eval/jsonl-writer.ts
- import { createWriteStream } from "node:fs";
+ import { createWriteStream as createWriteStream2 } from "node:fs";
  import { mkdir as mkdir4 } from "node:fs/promises";
  import path10 from "node:path";
  import { finished } from "node:stream/promises";
@@ -14769,7 +15027,7 @@ var JsonlWriter = class _JsonlWriter {
    }
    static async open(filePath) {
      await mkdir4(path10.dirname(filePath), { recursive: true });
-     const stream =
+     const stream = createWriteStream2(filePath, { flags: "w", encoding: "utf8" });
      return new _JsonlWriter(stream);
    }
    async append(record) {
@@ -14798,7 +15056,7 @@ var JsonlWriter = class _JsonlWriter {
  };

  // src/commands/eval/yaml-writer.ts
- import { createWriteStream as
+ import { createWriteStream as createWriteStream3 } from "node:fs";
  import { mkdir as mkdir5 } from "node:fs/promises";
  import path11 from "node:path";
  import { finished as finished2 } from "node:stream/promises";
@@ -14813,7 +15071,7 @@ var YamlWriter = class _YamlWriter {
    }
    static async open(filePath) {
      await mkdir5(path11.dirname(filePath), { recursive: true });
-     const stream =
+     const stream = createWriteStream3(filePath, { flags: "w", encoding: "utf8" });
      return new _YamlWriter(stream);
    }
    async append(record) {
@@ -15269,7 +15527,7 @@ function normalizeNumber(value, fallback) {
    return fallback;
  }
  function normalizeOptions(rawOptions) {
-   const formatStr = normalizeString(rawOptions.
+   const formatStr = normalizeString(rawOptions.outputFormat) ?? "jsonl";
    const format = formatStr === "yaml" ? "yaml" : "jsonl";
    const workers = normalizeNumber(rawOptions.workers, 0);
    return {
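The effect of reading `rawOptions.outputFormat` can be sketched as follows. This is an illustration under assumptions, not the bundled code: the body of `normalizeString` does not appear in this diff, so the version below is a hypothetical stand-in, and `resolveOutputFormat` is a made-up name that wraps the two lines from the hunk above.

```javascript
// Hypothetical stand-in for the bundle's normalizeString helper, whose body
// is not shown in this diff: returns a trimmed non-empty string or undefined.
function normalizeString(value) {
  if (typeof value !== "string") {
    return undefined;
  }
  const trimmed = value.trim();
  return trimmed.length > 0 ? trimmed : undefined;
}

// Hypothetical wrapper around the two format lines of normalizeOptions.
function resolveOutputFormat(rawOptions) {
  const formatStr = normalizeString(rawOptions.outputFormat) ?? "jsonl";
  return formatStr === "yaml" ? "yaml" : "jsonl";
}

console.log(resolveOutputFormat({ outputFormat: "yaml" })); // new key is honored
console.log(resolveOutputFormat({ format: "yaml" }));       // a `format` key is not read here
console.log(resolveOutputFormat({ outputFormat: "csv" }));  // unknown values fall back to jsonl
```

Commander exposes a multi-word flag such as `--output-format` under the camel-cased key `outputFormat`, which matches the property read here. In this sketch any value other than `yaml` quietly resolves to `jsonl` instead of raising an error.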
@@ -15489,7 +15747,11 @@ function registerEvalCommand(program) {
    "--workers <count>",
    "Number of parallel workers (default: 1, max: 50). Can also be set per-target in targets.yaml",
    (value) => parseInteger(value, 1)
- ).option("--out <path>", "Write results to the specified path").option(
+ ).option("--out <path>", "Write results to the specified path").option(
+   "--output-format <format>",
+   "Output format: 'jsonl' or 'yaml' (default: jsonl)",
+   "jsonl"
+ ).option("--dry-run", "Use mock provider responses instead of real LLM calls", false).option(
    "--dry-run-delay <ms>",
    "Fixed delay in milliseconds for dry-run mode (overridden by delay range if specified)",
    (value) => parseInteger(value, 0),
@@ -16618,4 +16880,4 @@ export {
    createProgram,
    runCli
  };
- //# sourceMappingURL=chunk-
+ //# sourceMappingURL=chunk-5WBKOCCW.js.map