incantx 0.1.0

package/README.md ADDED
@@ -0,0 +1,254 @@
+ # incantx
+
+ Test agent conversation flows (including tool calls) using declarative YAML fixtures.
+
+ This repo is intentionally **early-stage**: the runner/CLI works for “next message” assertions (including tool calls), while multi-step tool-execution loops and richer reporting are still in progress.
+
+ ## Getting started
+
+ ```bash
+ bun install
+ bun src/cli.ts tests/fixtures
+ ```
+
+ ## Installation
+
+ incantx’s CLI is a Bun script (`#!/usr/bin/env bun`), so you need Bun installed even if you install incantx via npm.
+
+ Global install (choose one):
+
+ ```bash
+ # Bun
+ bun add -g incantx
+
+ # npm
+ npm i -g incantx
+ ```
+
+ Then run:
+
+ ```bash
+ incantx path/to/fixtures --judge off
+ ```
+
+ ### Use as a linked CLI (Bun)
+
+ From this repo:
+
+ ```bash
+ bun link
+ ```
+
+ From another repo:
+
+ ```bash
+ bun link incantx
+ incantx path/to/fixtures --judge off
+ ```
+
+ ## What this is
+
+ - A test framework for “agents” that behave like a **single Chat Completions call**: input is `{messages, tools}` and output is the **next assistant message** (which may include `tool_calls`).
+ - A fixture format that supports:
+   - Starting tests **mid-conversation** via full message history.
+   - Asserting on the **next assistant message**, including **tool call expectations**.
+   - (Planned) executing tools and continuing until the agent returns a final message.
+ - LLM-based assertions (“judge”) for fuzzy/semantic checks (implemented via OpenAI in `auto` mode; skipped if no `OPENAI_API_KEY`).
+
+ ## Repo layout
+
+ - `src/agent/`: agent types + example agents
+   - `src/agent/types.ts`: OpenAI-style message/tool-call types
+   - `src/agent/exampleJsonlAgent.ts`: a language-agnostic “agent process” example (JSONL over stdin/stdout)
+ - `src/fixture/`: fixture schema types
+   - `src/fixture/types.ts`: YAML fixture file types (agent command + expectations)
+ - `tests/fixtures/`: example YAML fixtures
+   - `tests/fixtures/weather.yaml`: fixture file that targets `exampleJsonlAgent.ts`
+
+ ## CLI
+
+ Run fixtures from a file or a directory:
+
+ ```bash
+ incantx tests/fixtures
+ ```
+
+ LLM judge modes:
+
+ - `--judge auto` (default): use OpenAI judge if `OPENAI_API_KEY` is set; otherwise skip LLM assertions
+ - `--judge off`: never call an LLM judge
+ - `--judge on`: require `OPENAI_API_KEY` (fail if missing)
+
+ Example (no judge calls):
+
+ ```bash
+ incantx tests/fixtures --judge off
+ ```
+
+ ## YAML fixtures
+
+ Fixture files are YAML and contain:
+
+ - a file-level `agent` config (optional), and
+ - a `fixtures[]` list, each with:
+   - optional `history[]` (OpenAI Chat Completions message format)
+   - `input` (the new user message to append)
+   - optional `expect` (assertions on the next assistant message)
+
+ ### Example fixture file
+
+ See `tests/fixtures/weather.yaml` for a complete working example. The key idea is that `history` is a list of verbatim OpenAI-style messages (including `assistant.tool_calls` and `role: tool` messages), so you can paste real traces.
+
+ ```yaml
+ agent:
+   type: subprocess
+   command: ["bun", "src/agent/exampleJsonlAgent.ts"]
+
+ fixtures:
+   - id: weather-requests-tool
+     history:
+       - role: system
+         content: You are a helpful assistant.
+     input: What's the weather in Dublin?
+     expect:
+       tool_calls_match: contains
+       tool_calls:
+         - name: get_weather
+           arguments:
+             location: Dublin
+             unit: c
+ ```
+
+ ## Expectations
+
+ All expectations apply to the **next assistant message** produced after the runner:
+
+ 1. takes `history` (if any),
+ 2. appends `{ role: "user", content: input }`,
+ 3. calls the agent once.
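+
+ As a minimal TypeScript sketch (the helper name is hypothetical; the published runner does the equivalent inside `runFixtureFile`, with `callAgent` being its JSONL subprocess adapter):
+
+ ```ts
+ type Message = { role: string; content?: string; tool_calls?: unknown[] };
+ type AgentRequest = { messages: Message[]; tools: unknown[]; tool_choice: string };
+
+ // Assemble the conversation and ask the agent for exactly one next message.
+ async function nextAssistantMessage(
+   fixture: { history?: Message[]; input: string },
+   callAgent: (req: AgentRequest) => Promise<Message>,
+ ): Promise<Message> {
+   const messages = [...(fixture.history ?? []), { role: "user", content: fixture.input }];
+   return callAgent({ messages, tools: [], tool_choice: "auto" });
+ }
+ ```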
129
+
130
+ ### Expecting tool calls
131
+
132
+ Use `expect.tool_calls` to assert that the next assistant message includes tool calls.
133
+
134
+ - `expect.tool_calls_match: contains` (default): each expected tool call must appear somewhere in the returned `tool_calls`; extra tool calls are allowed; order is ignored.
135
+ - `expect.tool_calls_match: exact`: the returned `tool_calls` must match exactly (same length/order and matching entries).
136
+
137
+ Tool call matching details:
138
+
139
+ - `name` matches `tool_calls[].function.name`.
140
+ - `arguments` is a **subset match** against the parsed JSON from `tool_calls[].function.arguments` (which is a JSON string in OpenAI format).
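+
+ For intuition, the subset match behaves like the recursive check below (a sketch of the bundled `deepPartialMatch`: keys in `expected` must exist and match in `actual`, extra keys in `actual` are ignored, and arrays must match element-for-element at equal length):
+
+ ```ts
+ function deepPartialMatch(expected: unknown, actual: unknown): boolean {
+   if (expected === actual) return true;
+   if (expected === null || actual === null) return false; // both-null case handled above
+   if (typeof expected !== typeof actual) return false;
+   if (Array.isArray(expected)) {
+     // Arrays are exact in length; elements are matched recursively.
+     if (!Array.isArray(actual) || expected.length !== actual.length) return false;
+     return expected.every((item, i) => deepPartialMatch(item, actual[i]));
+   }
+   if (typeof expected === "object") {
+     if (Array.isArray(actual)) return false;
+     const act = actual as Record<string, unknown>;
+     // Objects are subset-matched: only the expected keys are checked.
+     return Object.entries(expected as Record<string, unknown>).every(
+       ([key, value]) => key in act && deepPartialMatch(value, act[key]),
+     );
+   }
+   return false;
+ }
+ ```
+
+ So expecting `{ location: "Dublin" }` passes against arguments `{ "location": "Dublin", "unit": "c" }`.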
141
+
142
+ ### Expecting assistant content (LLM-judged)
143
+
144
+ Use `expect.assistant.llm` to express the intended outcome in natural language.
145
+
146
+ The CLI can grade this using an LLM judge. By default (`--judge auto`), it will only run if `OPENAI_API_KEY` is set; otherwise these checks are marked `SKIP`.
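+
+ Mechanically, the judge is a single `chat/completions` call at temperature 0 that must return a strict JSON verdict. A condensed sketch of the bundled `createOpenAIJudge` (error handling omitted):
+
+ ```ts
+ type JudgeVerdict = { pass: boolean; reason: string };
+
+ async function judgeOnce(expectation: string, assistantMessage: unknown): Promise<JudgeVerdict> {
+   const res = await fetch("https://api.openai.com/v1/chat/completions", {
+     method: "POST",
+     headers: {
+       Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
+       "Content-Type": "application/json",
+     },
+     body: JSON.stringify({
+       model: process.env.OPENAI_JUDGE_MODEL ?? "gpt-4o-mini",
+       temperature: 0,
+       messages: [
+         { role: "system", content: 'You are a strict test evaluator. Respond with ONLY valid JSON: {"pass": boolean, "reason": string}.' },
+         { role: "user", content: JSON.stringify({ expectation, assistant_message: assistantMessage }) },
+       ],
+     }),
+   });
+   const data = await res.json();
+   // A non-JSON reply is treated as a failed check by the real runner.
+   return JSON.parse(data.choices[0].message.content);
+ }
+ ```
+
+ The default model is `gpt-4o-mini`; override it with `--judge-model` or `OPENAI_JUDGE_MODEL`, and point `OPENAI_BASE_URL` at any OpenAI-compatible endpoint.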
147
+
148
+ ## Agent integration (language-agnostic)
149
+
150
+ ### Subprocess (recommended for local agents)
151
+
152
+ For local, language-agnostic agents, the runner will spawn a subprocess and communicate via **JSON Lines** (one JSON object per line) over stdin/stdout.
153
+
154
+ - Runner writes one request JSON object per line to the agent’s stdin.
155
+ - Agent writes one response JSON object per line to stdout.
156
+ - One message per call (no streaming).
157
+
158
+ **Request (minimal)**
159
+
160
+ ```json
161
+ {
162
+ "messages": [{ "role": "user", "content": "hi" }],
163
+ "tools": [],
164
+ "tool_choice": "auto",
165
+ "model": "optional-model-id"
166
+ }
167
+ ```
168
+
169
+ #### How history is passed
170
+
171
+ History is passed by sending the full conversation so far in `messages` **on every call** (Chat Completions style). This is what makes “start tests mid-conversation” possible: the runner simply begins `messages` with whatever prior turns you want.
172
+
173
+ When tools are involved, history typically includes:
174
+
175
+ 1. an assistant message containing `tool_calls`, then
176
+ 2. one or more `role: "tool"` messages containing tool results (each with `tool_call_id`), then
177
+ 3. the next user message, etc.
178
+
179
+ Example (abridged):
180
+
181
+ ```json
182
+ {
183
+ "messages": [
184
+ { "role": "system", "content": "You are a helpful assistant." },
185
+ { "role": "user", "content": "What's the weather in Dublin?" },
186
+ {
187
+ "role": "assistant",
188
+ "content": "",
189
+ "tool_calls": [
190
+ {
191
+ "id": "call_1",
192
+ "type": "function",
193
+ "function": { "name": "get_weather", "arguments": "{\"location\":\"Dublin\"}" }
194
+ }
195
+ ]
196
+ },
197
+ {
198
+ "role": "tool",
199
+ "tool_call_id": "call_1",
200
+ "name": "get_weather",
201
+ "content": "{\"temp_c\":10,\"condition\":\"Rain\"}"
202
+ },
203
+ { "role": "user", "content": "Should I bring an umbrella?" }
204
+ ]
205
+ }
206
+ ```
207
+
208
+ **Response (minimal)**
209
+
210
+ ```json
211
+ {
212
+ "message": { "role": "assistant", "content": "hello", "tool_calls": [] }
213
+ }
214
+ ```
215
+
216
+ If the agent process can’t handle the request, it should return:
217
+
218
+ ```json
219
+ { "error": { "message": "..." } }
220
+ ```
221
+
222
+ ### HTTP (optional; useful later for remote agents)
223
+
+ When you want to test remote agents, a small OpenAI-compatible subset (e.g. `POST /chat/completions`) would make a good secondary adapter. The payload shape should match the `messages`/`tools` schema above.
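+
+ A hypothetical HTTP adapter could be as small as the sketch below (nothing like this ships in 0.1.0; the endpoint and response shape are assumed to be OpenAI-compatible):
+
+ ```ts
+ type Message = { role: string; content?: string; tool_calls?: unknown[] };
+
+ // Hypothetical: POST the same { messages, tools, tool_choice } payload to a remote agent.
+ async function callHttpAgent(
+   baseUrl: string,
+   request: { messages: Message[]; tools: unknown[]; tool_choice: string },
+ ): Promise<Message> {
+   const res = await fetch(`${baseUrl}/chat/completions`, {
+     method: "POST",
+     headers: { "Content-Type": "application/json" },
+     body: JSON.stringify(request),
+   });
+   if (!res.ok) throw new Error(`Agent HTTP error: ${res.status}`);
+   // An OpenAI-compatible server returns choices[0].message as the next assistant message.
+   const data = await res.json();
+   return data.choices[0].message;
+ }
+ ```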
+
+ ## Example agent process
+
+ `src/agent/exampleJsonlAgent.ts` is a reference implementation of the subprocess protocol. It wraps a tiny mock agent (`createMockWeatherAgent`) that demonstrates:
+
+ - returning `tool_calls` when asked about “weather”
+ - using a prior `role: tool` message to answer follow-ups like “umbrella”
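+
+ For reference, a minimal agent that speaks this protocol fits in a few lines of TypeScript (a toy echo agent, shown as a sketch; it is not the shipped example):
+
+ ```ts
+ // Toy JSONL agent: read one request line from stdin, write one response line to stdout.
+ const firstLine = await new Promise<string>((resolve) => {
+   let buf = "";
+   process.stdin.setEncoding("utf8");
+   process.stdin.on("data", (chunk: string) => {
+     buf += chunk;
+     const idx = buf.indexOf("\n");
+     if (idx !== -1) resolve(buf.slice(0, idx));
+   });
+   process.stdin.on("end", () => resolve(buf));
+ });
+
+ try {
+   const request = JSON.parse(firstLine);
+   const lastUser = [...request.messages].reverse().find((m: any) => m.role === "user");
+   console.log(JSON.stringify({
+     message: { role: "assistant", content: `You said: ${lastUser?.content ?? ""}`, tool_calls: [] },
+   }));
+ } catch (err) {
+   // Protocol-level failure: reply with an error object instead of a message.
+   console.log(JSON.stringify({ error: { message: String(err) } }));
+ }
+ ```
+
+ Save it as e.g. `myAgent.ts` and point a fixture’s `agent.command` at `["bun", "myAgent.ts"]`.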
+
+ ## Roadmap
+
+ - Tool execution loop:
+   - if the assistant returns tool calls, run tools, append `role: tool` messages, and continue until no tool calls (see the sketch after this list)
+ - LLM judge + deterministic grading:
+   - stable prompts, scoring, and report output
+ - CLI + GitHub Action wrapper:
+   - run fixture directories, emit JSON/markdown summaries
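+
+ To make the first roadmap item concrete, the planned loop could look roughly like this (hypothetical sketch; `callAgent` and `runTool` stand in for pieces that do not exist yet):
+
+ ```ts
+ type ToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };
+ type Message = {
+   role: string;
+   content?: string;
+   tool_calls?: ToolCall[];
+   tool_call_id?: string;
+   name?: string;
+ };
+
+ // Hypothetical loop: keep calling the agent until it stops requesting tools.
+ async function runToFinalMessage(
+   messages: Message[],
+   callAgent: (messages: Message[]) => Promise<Message>,
+   runTool: (name: string, args: unknown) => Promise<string>,
+ ): Promise<Message> {
+   while (true) {
+     const reply = await callAgent(messages);
+     messages.push(reply);
+     if (!reply.tool_calls || reply.tool_calls.length === 0) return reply;
+     for (const call of reply.tool_calls) {
+       // Execute each requested tool and append its result as a `role: tool` message.
+       const result = await runTool(call.function.name, JSON.parse(call.function.arguments));
+       messages.push({ role: "tool", tool_call_id: call.id, name: call.function.name, content: result });
+     }
+   }
+ }
+ ```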
+
+ ## Publishing
+
+ Preview the npm tarball contents:
+
+ ```bash
+ bun run publish:dry
+ ```
+
+ This repo uses `prepack`/`prepublishOnly` scripts to build `dist/` and run checks before publishing.
+
+ ## License
+
+ Currently `UNLICENSED` (set in `package.json`). Change this before publishing if you intend open-source use.
package/dist/cli.js ADDED
@@ -0,0 +1,460 @@
+ #!/usr/bin/env bun
+ // @bun
+
+ // src/cli.ts
+ import { stat } from "fs/promises";
+ import { extname, resolve as resolve2 } from "path";
+ var {Glob } = globalThis.Bun;
+
+ // src/runner/runFixtureFile.ts
+ import { readFile } from "fs/promises";
+ import { resolve } from "path";
+
+ // src/fixture/load.ts
+ import { parse as parseYaml } from "yaml";
+ function expandEnvVars(value, env) {
+   return value.replace(/\$\{([A-Z0-9_]+)\}/gi, (_, key) => env[key] ?? "");
+ }
+ function normalizeAgentSpec(raw, env) {
+   const normalized = { ...raw };
+   if (normalized.type === undefined)
+     normalized.type = "subprocess";
+   if (!Array.isArray(normalized.command) || normalized.command.length === 0) {
+     throw new Error("`agent.command` must be a non-empty array of strings.");
+   }
+   normalized.command = normalized.command.map((part) => String(part));
+   if (normalized.cwd !== undefined)
+     normalized.cwd = String(normalized.cwd);
+   if (normalized.timeout_ms !== undefined)
+     normalized.timeout_ms = Number(normalized.timeout_ms);
+   if (normalized.env) {
+     const expanded = {};
+     for (const [k, v] of Object.entries(normalized.env)) {
+       expanded[k] = expandEnvVars(String(v), env);
+     }
+     normalized.env = expanded;
+   }
+   return normalized;
+ }
+ function loadFixtureFile(yamlText, env = process.env) {
+   const data = parseYaml(yamlText);
+   if (!data || typeof data !== "object")
+     throw new Error("Fixture file must be a YAML object.");
+   const file = data;
+   if (!Array.isArray(file.fixtures))
+     throw new Error("Fixture file must contain `fixtures: [...]`.");
+   const normalized = {
+     fixtures: file.fixtures
+   };
+   if (file.agent)
+     normalized.agent = normalizeAgentSpec(file.agent, env);
+   normalized.fixtures = normalized.fixtures.map((fixture, index) => {
+     if (!fixture || typeof fixture !== "object")
+       throw new Error(`fixtures[${index}] must be an object.`);
+     if (!("id" in fixture))
+       throw new Error(`fixtures[${index}].id is required.`);
+     if (!("input" in fixture))
+       throw new Error(`fixtures[${index}].input is required.`);
+     const out = { ...fixture };
+     out.id = String(out.id);
+     out.input = String(out.input);
+     if (out.agent)
+       out.agent = normalizeAgentSpec(out.agent, env);
+     if (out.expect?.tool_calls_match === undefined && out.expect?.tool_calls)
+       out.expect.tool_calls_match = "contains";
+     return out;
+   });
+   return normalized;
+ }
+
+ // src/judge/openaiJudge.ts
+ function createOpenAIJudge(options) {
+   const baseUrl = options.baseUrl ?? process.env.OPENAI_BASE_URL ?? "https://api.openai.com/v1";
+   const model = options.model ?? process.env.OPENAI_JUDGE_MODEL ?? "gpt-4o-mini";
+   return async ({ expectation, message }) => {
+     const system = "You are a strict test evaluator. Decide if the assistant message satisfies the expectation. " + 'Respond with ONLY valid JSON: {"pass": boolean, "reason": string}.';
+     const user = JSON.stringify({
+       expectation,
+       assistant_message: message
+     }, null, 2);
+     const res = await fetch(`${baseUrl}/chat/completions`, {
+       method: "POST",
+       headers: {
+         Authorization: `Bearer ${options.apiKey}`,
+         "Content-Type": "application/json"
+       },
+       body: JSON.stringify({
+         model,
+         temperature: 0,
+         messages: [
+           { role: "system", content: system },
+           { role: "user", content: user }
+         ]
+       })
+     });
+     if (!res.ok) {
+       const body = await res.text().catch(() => "");
+       return {
+         status: "fail",
+         reason: `Judge call failed: ${res.status} ${res.statusText}${body ? `
+ ${body}` : ""}`
+       };
+     }
+     const data = await res.json();
+     const content = data?.choices?.[0]?.message?.content;
+     if (typeof content !== "string" || content.trim().length === 0) {
+       return { status: "fail", reason: "Judge returned no content." };
+     }
+     let parsed;
+     try {
+       parsed = JSON.parse(content);
+     } catch {
+       return { status: "fail", reason: `Judge did not return valid JSON.
+ ${content}` };
+     }
+     if (parsed.pass)
+       return { status: "pass" };
+     return { status: "fail", reason: parsed.reason || "Expectation not satisfied." };
+   };
+ }
+
+ // src/runner/subprocessAgent.ts
+ function isSuccess(value) {
+   return !!value && typeof value === "object" && "message" in value;
+ }
+ function isError(value) {
+   return !!value && typeof value === "object" && "error" in value;
+ }
+ async function readFirstNonEmptyLine(stream, timeoutMs) {
+   const reader = stream.getReader();
+   const decoder = new TextDecoder;
+   let buffer = "";
+   const deadline = Date.now() + timeoutMs;
+   while (true) {
+     const timeLeft = deadline - Date.now();
+     if (timeLeft <= 0)
+       throw new Error(`Timed out waiting for agent response after ${timeoutMs}ms.`);
+     let timerId;
+     const timerPromise = new Promise((_, reject) => {
+       timerId = setTimeout(() => reject(new Error(`Timed out waiting for agent response after ${timeoutMs}ms.`)), timeLeft);
+     });
+     let chunk;
+     try {
+       chunk = await Promise.race([reader.read(), timerPromise]);
+     } finally {
+       if (timerId !== undefined)
+         clearTimeout(timerId);
+     }
+     const { value, done } = chunk;
+     if (done)
+       break;
+     buffer += decoder.decode(value, { stream: true });
+     while (true) {
+       const idx = buffer.indexOf(`
+ `);
+       if (idx === -1)
+         break;
+       const line = buffer.slice(0, idx).trim();
+       buffer = buffer.slice(idx + 1);
+       if (line.length > 0)
+         return line;
+     }
+   }
+   const tail = buffer.trim();
+   if (tail.length > 0)
+     return tail;
+   throw new Error("Agent produced no output on stdout.");
+ }
+ async function callSubprocessAgent(spec, request) {
+   const timeoutMs = spec.timeout_ms ?? 20000;
+   const proc = Bun.spawn(spec.command, {
+     cwd: spec.cwd,
+     env: { ...process.env, ...spec.env ?? {} },
+     stdin: "pipe",
+     stdout: "pipe",
+     stderr: "pipe"
+   });
+   const stdin = proc.stdin;
+   if (!stdin)
+     throw new Error("Failed to open agent stdin.");
+   stdin.write(`${JSON.stringify(request)}
+ `);
+   stdin.end();
+   let stderrText = "";
+   const stderrPromise = (async () => {
+     if (!proc.stderr)
+       return;
+     stderrText = await new Response(proc.stderr).text().catch(() => "");
+   })();
+   try {
+     if (!proc.stdout)
+       throw new Error("Failed to open agent stdout.");
+     const line = await readFirstNonEmptyLine(proc.stdout, timeoutMs);
+     let payload;
+     try {
+       payload = JSON.parse(line);
+     } catch {
+       throw new Error(`Agent stdout is not valid JSON.
+ ` + `Line: ${line}
+ ` + (stderrText ? `Stderr:
+ ${stderrText}` : ""));
+     }
+     if (isError(payload))
+       throw new Error(payload.error.message);
+     if (!isSuccess(payload)) {
+       throw new Error(`Agent response must be { "message": { ... } } or { "error": { "message": ... } }.`);
+     }
+     return payload.message;
+   } finally {
+     proc.kill();
+     await Promise.allSettled([proc.exited, stderrPromise]);
+   }
+ }
+
+ // src/runner/deepMatch.ts
+ function deepPartialMatch(expected, actual) {
+   if (expected === actual)
+     return true;
+   if (expected === null || actual === null)
+     return expected === actual;
+   const expectedType = typeof expected;
+   const actualType = typeof actual;
+   if (expectedType !== actualType)
+     return false;
+   if (Array.isArray(expected)) {
+     if (!Array.isArray(actual))
+       return false;
+     if (expected.length !== actual.length)
+       return false;
+     for (let i = 0;i < expected.length; i++) {
+       if (!deepPartialMatch(expected[i], actual[i]))
+         return false;
+     }
+     return true;
+   }
+   if (expectedType === "object") {
+     if (Array.isArray(actual))
+       return false;
+     const expectedObj = expected;
+     const actualObj = actual;
+     for (const [key, expectedValue] of Object.entries(expectedObj)) {
+       if (!(key in actualObj))
+         return false;
+       if (!deepPartialMatch(expectedValue, actualObj[key]))
+         return false;
+     }
+     return true;
+   }
+   return false;
+ }
+
+ // src/runner/expectations.ts
+ function parseJsonOrUndefined(value) {
+   try {
+     return JSON.parse(value);
+   } catch {
+     return;
+   }
+ }
+ function matchToolCall(expected, actual) {
+   if (actual.function.name !== expected.name)
+     return false;
+   if (expected.arguments === undefined)
+     return true;
+   const actualArgs = parseJsonOrUndefined(actual.function.arguments);
+   if (actualArgs === undefined)
+     return false;
+   return deepPartialMatch(expected.arguments, actualArgs);
+ }
+ function checkToolCalls(expect, message) {
+   const expectedCalls = expect.tool_calls ?? [];
+   if (expectedCalls.length === 0)
+     return { status: "pass" };
+   const actualCalls = message.tool_calls ?? [];
+   const mode = expect.tool_calls_match ?? "contains";
+   if (mode === "contains") {
+     for (const expected of expectedCalls) {
+       const ok = actualCalls.some((actual) => matchToolCall(expected, actual));
+       if (!ok) {
+         return {
+           status: "fail",
+           reason: `Expected tool call not found: ${expected.name}`
+         };
+       }
+     }
+     return { status: "pass" };
+   }
+   if (actualCalls.length !== expectedCalls.length) {
+     return {
+       status: "fail",
+       reason: `Expected exactly ${expectedCalls.length} tool call(s), got ${actualCalls.length}.`
+     };
+   }
+   for (let i = 0;i < expectedCalls.length; i++) {
+     const expected = expectedCalls[i];
+     const actual = actualCalls[i];
+     if (!expected || !actual || !matchToolCall(expected, actual)) {
+       return { status: "fail", reason: `Tool call mismatch at index ${i}.` };
+     }
+   }
+   return { status: "pass" };
+ }
+ async function evaluateExpectations(expect, message, judge) {
+   if (!expect)
+     return { status: "pass" };
+   const toolRes = checkToolCalls(expect, message);
+   if (toolRes.status !== "pass")
+     return toolRes;
+   if (expect.assistant?.llm) {
+     if (!judge)
+       return { status: "skip", reason: "LLM judge not configured." };
+     return await judge({ expectation: expect.assistant.llm, message });
+   }
+   return { status: "pass" };
+ }
+
+ // src/runner/runFixtureFile.ts
+ function pickAgentSpec(file, fixture) {
+   const agent = fixture.agent ?? file.agent;
+   if (!agent)
+     throw new Error(`Fixture '${fixture.id}' has no agent. Add file-level 'agent:' or fixture-level 'agent:'.`);
+   if ((agent.type ?? "subprocess") !== "subprocess")
+     throw new Error(`Unsupported agent type: ${agent.type}`);
+   return agent;
+ }
+ function makeJudge(mode, model) {
+   if (mode === "off")
+     return;
+   const apiKey = process.env.OPENAI_API_KEY;
+   if (!apiKey) {
+     if (mode === "on")
+       throw new Error("Judge mode is 'on' but OPENAI_API_KEY is not set.");
+     return;
+   }
+   return createOpenAIJudge({ apiKey, model });
+ }
+ async function runFixtureFile(path, options = {}) {
+   const absolute = resolve(path);
+   const yamlText = await readFile(absolute, "utf8");
+   const file = loadFixtureFile(yamlText);
+   const judgeMode = options.judgeMode ?? "auto";
+   const judge = makeJudge(judgeMode, options.judgeModel);
+   const results = [];
+   for (const fixture of file.fixtures) {
+     try {
+       const agent = pickAgentSpec(file, fixture);
+       const messages = [...fixture.history ?? [], { role: "user", content: fixture.input }];
+       const message = await callSubprocessAgent(agent, {
+         messages,
+         tools: [],
+         tool_choice: "auto"
+       });
+       const expectation = await evaluateExpectations(fixture.expect, message, judge);
+       results.push({
+         id: fixture.id,
+         status: expectation.status,
+         reason: expectation.reason,
+         message
+       });
+     } catch (err) {
+       const reason = err instanceof Error ? err.message : String(err);
+       results.push({ id: fixture.id, status: "fail", reason });
+     }
+   }
+   return { path: absolute, results };
+ }
+
+ // src/cli.ts
+ function usage() {
+   return [
+     "Usage:",
+     " incantx <file-or-dir> [--judge auto|off|on] [--judge-model <model>]",
+     " incantx run <file-or-dir> [--judge auto|off|on] [--judge-model <model>]",
+     "",
+     "Examples:",
+     " incantx tests/fixtures/weather.yaml",
+     " incantx tests/fixtures --judge off"
+   ].join(`
+ `);
+ }
+ function parseArgs(argv) {
+   const [first, ...restAll] = argv;
+   const opts = {};
+   if (!first)
+     return { command: "help" };
+   if (first === "-h" || first === "--help")
+     return { command: "help" };
+   const target = first === "run" ? restAll[0] : first;
+   const rest = first === "run" ? restAll.slice(1) : restAll;
+   for (let i = 0;i < rest.length; i++) {
+     const a = rest[i];
+     if (a === "--judge")
+       opts.judge = rest[++i];
+     else if (a === "--judge-model")
+       opts.judgeModel = rest[++i];
+     else if (a === "-h" || a === "--help")
+       return { command: "help" };
+     else
+       throw new Error(`Unknown arg: ${a}`);
+   }
+   return { command: "run", target, opts };
+ }
+ async function listFixtureFiles(target) {
+   const abs = resolve2(target);
+   const s = await stat(abs);
+   if (s.isFile())
+     return [abs];
+   const glob = new Glob("**/*.{yaml,yml}");
+   const out = [];
+   for await (const rel of glob.scan({ cwd: abs, onlyFiles: true })) {
+     out.push(resolve2(abs, rel));
+   }
+   out.sort();
+   return out;
+ }
+ function formatStatus(status) {
+   if (status === "pass")
+     return "PASS";
+   if (status === "skip")
+     return "SKIP";
+   return "FAIL";
+ }
+ async function main() {
+   const parsed = parseArgs(process.argv.slice(2));
+   if (parsed.command === "help" || !("command" in parsed) || parsed.command !== "run") {
+     console.log(usage());
+     process.exit(0);
+   }
+   if (!parsed.target)
+     throw new Error("Missing <file-or-dir>.");
+   const files = await listFixtureFiles(parsed.target);
+   if (files.length === 0)
+     throw new Error(`No fixture files found under: ${parsed.target}`);
+   let pass = 0;
+   let fail = 0;
+   let skip = 0;
+   for (const file of files) {
+     if (![".yaml", ".yml"].includes(extname(file)))
+       continue;
+     const res = await runFixtureFile(file, {
+       judgeMode: parsed.opts.judge ?? "auto",
+       judgeModel: parsed.opts.judgeModel
+     });
+     console.log(res.path);
+     for (const r of res.results) {
+       console.log(` ${formatStatus(r.status)} ${r.id}${r.reason ? ` \u2014 ${r.reason}` : ""}`);
+       if (r.status === "pass")
+         pass++;
+       else if (r.status === "skip")
+         skip++;
+       else
+         fail++;
+     }
+   }
+   console.log(`
+ Summary: ${pass} passed, ${fail} failed, ${skip} skipped`);
+   process.exit(fail > 0 ? 1 : 0);
+ }
+ await main();
+
+ //# debugId=E9C824803FAB034D64756E2164756E21