npm - incantx - Versions diffs - 0.1.0 → 0.1.2 - Mend

incantx 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/README.md +53 -29
package/dist/cli.js +281 -22
package/dist/cli.js.map +6 -6
package/dist/index.js +281 -22
package/dist/index.js.map +6 -6
package/dist/src/fixture/types.d.ts +65 -4
package/dist/src/runner/expectations.d.ts +7 -2
package/dist/src/runner/runFixtureFile.d.ts +2 -1
package/dist/src/runner/subprocessAgent.d.ts +6 -2
package/package.json +1 -1

package/README.md CHANGED

@@ -50,7 +50,7 @@ incantx path/to/fixtures --judge off
 - A test framework for “agents” that behave like a **single Chat Completions call**: input is `{messages, tools}` and output is the **next assistant message** (which may include `tool_calls`).
 - A fixture format that supports:
-  - Starting tests **mid-conversation** via full message history.
+  - Starting tests **mid-conversation** via a full transcript.
   - Asserting on the **next assistant message**, including **tool call expectations**.
   - (Planned) executing tools and continuing until the agent returns a final message.
   - LLM-based assertions (“judge”) for fuzzy/semantic checks (implemented via OpenAI in `auto` mode; skipped if no `OPENAI_API_KEY`).
@@ -91,13 +91,19 @@ Fixture files are YAML and contain:
 - a file-level `agent` config (optional), and
 - a `fixtures[]` list, each with:
-  - optional `history[]` (OpenAI Chat Completions message format)
-  - `input` (the new user message to append)
-  - optional `expect` (assertions on the next assistant message)
+  - a `transcript[]` list of chat turns, ending with `next_assistant` (required)
+Transcript entries are one of:
+- `system: "..."`
+- `user: "..."`
+- `assistant: "..."` or `assistant: { content: "...", tool_calls: [...] }`
+- `tool: { name: "...", tool_call_id: "...", json: {...} }` (or `content: "..."`)
+- `next_assistant: { ...expectations... }` (required, must be last)
 ### Example fixture file
-See `tests/fixtures/weather.yaml` for a complete working example. The key idea is that `history` is verbatim OpenAI-style messages (including `assistant.tool_calls` and `role: tool` messages), so you can paste real traces.
+See `tests/fixtures/weather.yaml` for a complete working example. The key idea is that fixtures read like a chat transcript, and `next_assistant` describes what you expect the agent to return for the next turn.
 ```yaml
 agent:
@@ -106,42 +112,59 @@ agent:
 fixtures:
   - id: weather-requests-tool
-    history:
-      - role: system
-        content: You are a helpful assistant.
-    input: What's the weather in Dublin?
-    expect:
-      tool_calls_match: contains
-      tool_calls:
-        - name: get_weather
-          arguments:
-            location: Dublin
-            unit: c
+    transcript:
+      - system: You are a helpful assistant.
+      - user: What's the weather in Dublin?
+      - next_assistant:
+          content_match: exact
+          content: ""
+          tool_calls_match: contains
+          tool_calls:
+            - get_weather: { location: Dublin, unit: c }
+          tool_results_match: contains
+          tool_results:
+            - get_weather: { condition: Rain }
 ```
 ## Expectations
-All expectations apply to the **next assistant message** produced after the runner:
-1. takes `history` (if any),
-2. appends `{ role: "user", content: input }`,
-3. calls the agent once.
+All expectations apply to the **next assistant message** produced after the runner sends the transcript’s messages (everything before `next_assistant`) to the agent once.
 ### Expecting tool calls
-Use `expect.tool_calls` to assert that the next assistant message includes tool calls.
+Use `next_assistant.tool_calls` to assert that the next assistant message includes tool calls.
-- `expect.tool_calls_match: contains` (default): each expected tool call must appear somewhere in the returned `tool_calls`; extra tool calls are allowed; order is ignored.
-- `expect.tool_calls_match: exact`: the returned `tool_calls` must match exactly (same length/order and matching entries).
+- `next_assistant.tool_calls_match: contains` (default): each expected tool call must appear somewhere in the returned `tool_calls`; extra tool calls are allowed; order is ignored.
+- `next_assistant.tool_calls_match: exact`: the returned `tool_calls` must match exactly (same length/order and matching entries).
 Tool call matching details:
 - `name` matches `tool_calls[].function.name`.
 - `arguments` is a **subset match** against the parsed JSON from `tool_calls[].function.arguments` (which is a JSON string in OpenAI format).
+Fixture sugar:
+- `tool_calls` entries can be either `{ name: get_weather, args: {...}, id?: ... }` or `{ get_weather: {...args} }`.
+- `tool_results` entries can be either `{ name: ..., tool_call_id?: ..., content?: ..., content_json?: ... }` or `{ get_weather: {...jsonSubset} }`.
+### Expecting tool results (agent-provided)
+If your agent returns `tool_messages` alongside `message` (same JSONL response), use `next_assistant.tool_results` to assert on those `role: "tool"` messages.
+Note: `tool_messages` are only used for expectations today; incantx does not yet append them into the next call’s `messages` automatically (that’s part of the planned tool-execution loop).
+Tool result matching details:
+- Match by `tool_call_id` and/or `name`.
+- `content_json` is a subset match against parsed JSON from `tool_messages[].content`.
+### Expecting assistant content
+Use `next_assistant.content` with `next_assistant.content_match: contains|exact` for deterministic checks.
 ### Expecting assistant content (LLM-judged)
-Use `expect.assistant.llm` to express the intended outcome in natural language.
+Use `next_assistant.llm` to express the intended outcome in natural language.
 The CLI can grade this using an LLM judge. By default (`--judge auto`), it will only run if `OPENAI_API_KEY` is set; otherwise these checks are marked `SKIP`.
@@ -166,11 +189,11 @@ For local, language-agnostic agents, the runner will spawn a subprocess and comm
 }
 ```
-#### How history is passed
+#### How messages are passed
-History is passed by sending the full conversation so far in `messages` **on every call** (Chat Completions style). This is what makes “start tests mid-conversation” possible: the runner simply begins `messages` with whatever prior turns you want.
+The runner sends the full conversation so far in `messages` **on every call** (Chat Completions style). This is what makes “start tests mid-conversation” possible: fixtures provide a transcript that becomes the `messages` array.
-When tools are involved, history typically includes:
+When tools are involved, `messages` typically includes:
 1. an assistant message containing `tool_calls`, then
 2. one or more `role: "tool"` messages containing tool results (each with `tool_call_id`), then
@@ -209,7 +232,8 @@ Example (abridged):
 ```json
 {
-  "message": { "role": "assistant", "content": "hello", "tool_calls": [] }
+  "message": { "role": "assistant", "content": "hello", "tool_calls": [] },
+  "tool_messages": []
 }
 ```
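
For fixtures written against 0.1.0, the README changes above imply a mechanical mapping: `history[]` plus `input` become transcript entries, and `expect` becomes the trailing `next_assistant`. The sketch below is a hypothetical helper, not something incantx ships. The name `migrateFixture` and its edge-case handling are assumptions based only on the two formats described in this diff.

```javascript
// Hypothetical migration helper; not part of incantx.
// Converts a 0.1.0 fixture ({ history, input, expect }) to the 0.1.2 shape.
function migrateFixture(old) {
  const transcript = [];
  for (const msg of old.history ?? []) {
    if (msg.role === "system" || msg.role === "user") {
      transcript.push({ [msg.role]: String(msg.content ?? "") });
    } else if (msg.role === "assistant") {
      // OpenAI-style tool calls carry their arguments as a JSON string.
      const toolCalls = (msg.tool_calls ?? []).map((tc) => ({
        id: tc.id,
        name: tc.function.name,
        args: JSON.parse(tc.function.arguments || "{}"),
      }));
      transcript.push(
        toolCalls.length > 0
          ? { assistant: { content: String(msg.content ?? ""), tool_calls: toolCalls } }
          : { assistant: String(msg.content ?? "") }
      );
    } else if (msg.role === "tool") {
      // Assumes the old message carried `name`; pasted traces may omit it.
      transcript.push({
        tool: { name: msg.name, tool_call_id: msg.tool_call_id, content: String(msg.content ?? "") },
      });
    }
  }
  // In 0.1.0 the runner appended `input` as the final user message.
  transcript.push({ user: String(old.input) });

  const expect = old.expect ?? {};
  const next = {};
  if (expect.tool_calls_match !== undefined) next.tool_calls_match = expect.tool_calls_match;
  if (expect.tool_calls) {
    // 0.1.0 used `arguments`; the 0.1.2 long form uses `args`.
    next.tool_calls = expect.tool_calls.map((tc) =>
      tc.arguments !== undefined ? { name: tc.name, args: tc.arguments } : { name: tc.name }
    );
  }
  if (expect.assistant?.llm !== undefined) next.llm = expect.assistant.llm;
  transcript.push({ next_assistant: next });

  const out = { id: old.id, transcript };
  if (old.agent) out.agent = old.agent;
  return out;
}

// The 0.1.0 README example, expressed as a plain object.
const migrated = migrateFixture({
  id: "weather-requests-tool",
  history: [{ role: "system", content: "You are a helpful assistant." }],
  input: "What's the weather in Dublin?",
  expect: {
    tool_calls_match: "contains",
    tool_calls: [{ name: "get_weather", arguments: { location: "Dublin", unit: "c" } }],
  },
});
console.log(JSON.stringify(migrated, null, 2));
```

The result still has to be serialized back to YAML. Any `tool` entry whose original message had no `name` needs manual attention, since the new loader stringifies whatever it finds in that field.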

package/dist/cli.js CHANGED

@@ -44,24 +44,207 @@ function loadFixtureFile(yamlText, env = process.env) {
   if (!Array.isArray(file.fixtures))
     throw new Error("Fixture file must contain `fixtures: [...]`.");
   const normalized = {
-    fixtures: file.fixtures
+    fixtures: []
   };
   if (file.agent)
     normalized.agent = normalizeAgentSpec(file.agent, env);
-  normalized.fixtures = normalized.fixtures.map((fixture, index) => {
+  function normalizeToolCallSource(entry, where) {
+    if (!entry || typeof entry !== "object")
+      throw new Error(`${where} must be an object.`);
+    if ("name" in entry) {
+      const e = entry;
+      return {
+        id: e.id !== undefined ? String(e.id) : undefined,
+        name: String(e.name),
+        args: e.args !== undefined ? e.args : undefined
+      };
+    }
+    const keys = Object.keys(entry);
+    if (keys.length !== 1)
+      throw new Error(`${where} must be { name: ..., args: ... } or { toolName: { ...args } }.`);
+    const name = keys[0];
+    const args = entry[name];
+    if (!args || typeof args !== "object")
+      throw new Error(`${where}.${name} must be an object of args.`);
+    return { name, args };
+  }
+  function normalizeToolCallsForMessage(toolCalls, callId) {
+    if (!toolCalls || toolCalls.length === 0)
+      return;
+    const out = [];
+    for (let i = 0;i < toolCalls.length; i++) {
+      const tc = normalizeToolCallSource(toolCalls[i], `tool_calls[${i}]`);
+      out.push({
+        id: callId(tc.id),
+        type: "function",
+        function: {
+          name: tc.name,
+          arguments: JSON.stringify(tc.args ?? {})
+        }
+      });
+    }
+    return out;
+  }
+  function normalizeTranscriptTool(entry, where, inferToolCallId) {
+    if (!entry || typeof entry !== "object")
+      throw new Error(`${where} must be an object.`);
+    const name = String(entry.name);
+    let tool_call_id = entry.tool_call_id !== undefined ? String(entry.tool_call_id) : undefined;
+    if (!tool_call_id)
+      tool_call_id = inferToolCallId();
+    if ("json" in entry && entry.json !== undefined) {
+      return { role: "tool", tool_call_id, name, content: JSON.stringify(entry.json) };
+    }
+    if ("content" in entry && entry.content !== undefined) {
+      return { role: "tool", tool_call_id, name, content: String(entry.content) };
+    }
+    throw new Error(`${where} must include either 'json' or 'content'.`);
+  }
+  function normalizeNextAssistantExpect(expect, where) {
+    if (!expect || typeof expect !== "object")
+      throw new Error(`${where} must be an object.`);
+    const tool_calls = expect.tool_calls;
+    const tool_results = expect.tool_results;
+    const out = {};
+    const assistant = {};
+    if (expect.content !== undefined)
+      assistant.content = String(expect.content);
+    if (expect.content_match !== undefined)
+      assistant.content_match = expect.content_match;
+    if (expect.llm !== undefined)
+      assistant.llm = String(expect.llm);
+    if (assistant.content !== undefined || assistant.llm !== undefined)
+      out.assistant = assistant;
+    if (expect.tool_calls_match !== undefined)
+      out.tool_calls_match = expect.tool_calls_match;
+    if (tool_calls) {
+      out.tool_calls = tool_calls.map((tc, i) => {
+        const normalized2 = normalizeToolCallSource(tc, `${where}.tool_calls[${i}]`);
+        const entry = { name: normalized2.name };
+        if (normalized2.args !== undefined)
+          entry.arguments = normalized2.args;
+        return entry;
+      });
+      if (out.tool_calls_match === undefined)
+        out.tool_calls_match = "contains";
+    }
+    if (expect.tool_results_match !== undefined)
+      out.tool_results_match = expect.tool_results_match;
+    if (tool_results) {
+      out.tool_results = tool_results.map((tr, i) => {
+        const whereItem = `${where}.tool_results[${i}]`;
+        if (!tr || typeof tr !== "object")
+          throw new Error(`${whereItem} must be an object.`);
+        if ("name" in tr || "tool_call_id" in tr || "content" in tr || "content_json" in tr) {
+          const e = tr;
+          const entry = {};
+          if (e.name !== undefined)
+            entry.name = String(e.name);
+          if (e.tool_call_id !== undefined)
+            entry.tool_call_id = String(e.tool_call_id);
+          if (e.content !== undefined)
+            entry.content = String(e.content);
+          if (e.content_match !== undefined)
+            entry.content_match = e.content_match;
+          if (e.content_json !== undefined)
+            entry.content_json = e.content_json;
+          return entry;
+        }
+        const keys = Object.keys(tr);
+        if (keys.length !== 1)
+          throw new Error(`${whereItem} must be { name: ..., ... } or { toolName: { ...jsonSubset } }.`);
+        const name = keys[0];
+        const content_json = tr[name];
+        if (!content_json || typeof content_json !== "object")
+          throw new Error(`${whereItem}.${name} must be an object.`);
+        return { name, content_json };
+      });
+      if (out.tool_results_match === undefined)
+        out.tool_results_match = "contains";
+    }
+    return out;
+  }
+  function normalizeTranscript(entries, where) {
+    if (!Array.isArray(entries))
+      throw new Error(`${where} must be an array.`);
+    if (entries.length === 0)
+      throw new Error(`${where} must not be empty.`);
+    const last = entries[entries.length - 1];
+    if (!last || typeof last !== "object" || !("next_assistant" in last)) {
+      throw new Error(`${where} must end with { next_assistant: ... }.`);
+    }
+    let callCounter = 0;
+    const seenToolCallIds = new Set;
+    const nextCallId = (preferred) => {
+      const id = preferred ?? `call_${++callCounter}`;
+      if (!seenToolCallIds.has(id)) {
+        seenToolCallIds.add(id);
+        return id;
+      }
+      if (preferred)
+        return preferred;
+      throw new Error(`${where}: generated duplicate tool call id: ${id}`);
+    };
+    const inferSingleToolCallId = () => {
+      if (seenToolCallIds.size !== 1) {
+        throw new Error(`${where}: tool entry is missing tool_call_id and it cannot be inferred.`);
+      }
+      return [...seenToolCallIds][0];
+    };
+    const messages = [];
+    for (let i = 0;i < entries.length - 1; i++) {
+      const entry = entries[i];
+      const whereEntry = `${where}[${i}]`;
+      if (!entry || typeof entry !== "object")
+        throw new Error(`${whereEntry} must be an object.`);
+      if ("system" in entry) {
+        messages.push({ role: "system", content: String(entry.system) });
+        continue;
+      }
+      if ("user" in entry) {
+        messages.push({ role: "user", content: String(entry.user) });
+        continue;
+      }
+      if ("assistant" in entry) {
+        const a = entry.assistant;
+        if (typeof a === "string") {
+          messages.push({ role: "assistant", content: a });
+          continue;
+        }
+        if (!a || typeof a !== "object")
+          throw new Error(`${whereEntry}.assistant must be a string or object.`);
+        const tool_calls = normalizeToolCallsForMessage(a.tool_calls, nextCallId);
+        const msg = {
+          role: "assistant",
+          content: a.content !== undefined ? String(a.content) : "",
+          tool_calls
+        };
+        messages.push(msg);
+        continue;
+      }
+      if ("tool" in entry) {
+        const tool = normalizeTranscriptTool(entry.tool, `${whereEntry}.tool`, inferSingleToolCallId);
+        messages.push(tool);
+        continue;
+      }
+      throw new Error(`${whereEntry} must be one of: system, user, assistant, tool.`);
+    }
+    const expect = normalizeNextAssistantExpect(last.next_assistant, `${where}[${entries.length - 1}].next_assistant`);
+    return { messages, expect };
+  }
+  normalized.fixtures = file.fixtures.map((fixture, index) => {
     if (!fixture || typeof fixture !== "object")
       throw new Error(`fixtures[${index}] must be an object.`);
     if (!("id" in fixture))
       throw new Error(`fixtures[${index}].id is required.`);
-    if (!("input" in fixture))
-      throw new Error(`fixtures[${index}].input is required.`);
-    const out = { ...fixture };
-    out.id = String(out.id);
-    out.input = String(out.input);
-    if (out.agent)
-      out.agent = normalizeAgentSpec(out.agent, env);
-    if (out.expect?.tool_calls_match === undefined && out.expect?.tool_calls)
-      out.expect.tool_calls_match = "contains";
+    if (!("transcript" in fixture))
+      throw new Error(`fixtures[${index}].transcript is required.`);
+    const id = String(fixture.id);
+    const transcript = fixture.transcript;
+    const { messages, expect } = normalizeTranscript(transcript, `fixtures[${index}].transcript`);
+    const out = { id, messages, expect };
+    if (fixture.agent)
+      out.agent = normalizeAgentSpec(fixture.agent, env);
     return out;
   });
   return normalized;
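
The id handling in `normalizeTranscript` has a consequence worth spelling out: a `tool` entry without `tool_call_id` can only be resolved while exactly one tool call id has been seen in the transcript. The sketch below re-creates the two closures in isolation to show when inference succeeds and when it throws. The factory `makeIdTracker` is a hypothetical wrapper added for illustration only.

```javascript
// Standalone re-creation of nextCallId / inferSingleToolCallId from the hunk above.
// makeIdTracker is a hypothetical wrapper, not an incantx export.
function makeIdTracker(where) {
  let callCounter = 0;
  const seen = new Set();
  const nextCallId = (preferred) => {
    const id = preferred ?? `call_${++callCounter}`;
    if (!seen.has(id)) {
      seen.add(id);
      return id;
    }
    if (preferred) return preferred; // an explicit duplicate id is passed through
    throw new Error(`${where}: generated duplicate tool call id: ${id}`);
  };
  const inferSingleToolCallId = () => {
    if (seen.size !== 1) {
      throw new Error(`${where}: tool entry is missing tool_call_id and it cannot be inferred.`);
    }
    return [...seen][0];
  };
  return { nextCallId, inferSingleToolCallId };
}

// One prior tool call: the tool entry's id can be inferred.
const one = makeIdTracker("fixtures[0].transcript");
const firstId = one.nextCallId(undefined);
const inferred = one.inferSingleToolCallId();

// Two prior tool calls: inference is ambiguous and throws.
const two = makeIdTracker("fixtures[1].transcript");
two.nextCallId(undefined);
two.nextCallId(undefined);
let inferenceError = null;
try {
  two.inferSingleToolCallId();
} catch (err) {
  inferenceError = err.message;
}
console.log(firstId, inferred, inferenceError);
```

Because the set of seen ids is never cleared, a transcript with more than one tool call anywhere before a `tool` entry appears to need explicit `tool_call_id` values, even when the calls belong to different assistant turns.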
@@ -202,9 +385,9 @@ ${stderrText}` : ""));
     if (isError(payload))
       throw new Error(payload.error.message);
     if (!isSuccess(payload)) {
-      throw new Error(`Agent response must be { "message": { ... } } or { "error": { "message": ... } }.`);
+      throw new Error(`Agent response must be { "message": { ... }, "tool_messages"?: [...] } or { "error": { "message": ... } }.`);
     }
-    return payload.message;
+    return payload;
   } finally {
     proc.kill();
     await Promise.allSettled([proc.exited, stderrPromise]);
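
The updated error message describes the accepted response shape: `message` plus an optional `tool_messages` array. As a rough illustration, the function below builds a response that should satisfy the README's `weather-requests-tool` fixture. `handleRequest` is a hypothetical name and the tool result values are made up. The stdin/stdout wiring is left out because this diff does not show the subprocess protocol in full.

```javascript
// Hypothetical agent logic; only the returned shape is taken from the diff.
function handleRequest(request) {
  const lastUser = [...request.messages].reverse().find((m) => m.role === "user");
  const asksWeather = /weather/i.test(lastUser?.content ?? "");
  if (!asksWeather) {
    return {
      message: { role: "assistant", content: "hello", tool_calls: [] },
      tool_messages: [],
    };
  }
  const callId = "call_1";
  return {
    message: {
      role: "assistant",
      content: "", // the README fixture expects exactly empty content
      tool_calls: [
        {
          id: callId,
          type: "function",
          function: {
            name: "get_weather",
            // OpenAI format: arguments travel as a JSON string.
            arguments: JSON.stringify({ location: "Dublin", unit: "c" }),
          },
        },
      ],
    },
    // Agent-provided tool results; illustrative values only.
    tool_messages: [
      {
        role: "tool",
        tool_call_id: callId,
        name: "get_weather",
        content: JSON.stringify({ condition: "Rain", temp_c: 11 }),
      },
    ],
  };
}

const response = handleRequest({
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What's the weather in Dublin?" },
  ],
  tools: [],
  tool_choice: "auto",
});
console.log(JSON.stringify(response));
```

Since `callSubprocessAgent` now returns the whole payload rather than `payload.message`, both fields reach `evaluateExpectations`.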
@@ -299,16 +482,92 @@ function checkToolCalls(expect, message) {
   }
   return { status: "pass" };
 }
-async function evaluateExpectations(expect, message, judge) {
+function matchToolResult(expected, actual) {
+  if (expected.tool_call_id !== undefined && actual.tool_call_id !== expected.tool_call_id)
+    return false;
+  if (expected.name !== undefined && actual.name !== expected.name)
+    return false;
+  if (expected.content !== undefined) {
+    const mode = expected.content_match ?? "exact";
+    if (mode === "exact" && actual.content !== expected.content)
+      return false;
+    if (mode === "contains" && !actual.content.includes(expected.content))
+      return false;
+  }
+  if (expected.content_json !== undefined) {
+    const actualJson = parseJsonOrUndefined(actual.content);
+    if (actualJson === undefined)
+      return false;
+    if (!deepPartialMatch(expected.content_json, actualJson))
+      return false;
+  }
+  return true;
+}
+function checkToolResults(expect, toolMessages) {
+  const expectedResults = expect.tool_results ?? [];
+  if (expectedResults.length === 0)
+    return { status: "pass" };
+  const mode = expect.tool_results_match ?? "contains";
+  if (mode === "contains") {
+    for (const expected of expectedResults) {
+      if (!expected.name && !expected.tool_call_id) {
+        return { status: "fail", reason: "tool_results entries must include at least 'name' or 'tool_call_id'." };
+      }
+      const ok = toolMessages.some((actual) => matchToolResult(expected, actual));
+      if (!ok) {
+        return {
+          status: "fail",
+          reason: `Expected tool result not found${expected.name ? `: ${expected.name}` : ""}.`
+        };
+      }
+    }
+    return { status: "pass" };
+  }
+  if (toolMessages.length !== expectedResults.length) {
+    return {
+      status: "fail",
+      reason: `Expected exactly ${expectedResults.length} tool result(s), got ${toolMessages.length}.`
+    };
+  }
+  for (let i = 0;i < expectedResults.length; i++) {
+    const expected = expectedResults[i];
+    const actual = toolMessages[i];
+    if (!expected || !actual || !matchToolResult(expected, actual)) {
+      return { status: "fail", reason: `Tool result mismatch at index ${i}.` };
+    }
+  }
+  return { status: "pass" };
+}
+function checkAssistantContent(expect, message) {
+  const expected = expect.assistant?.content;
+  if (expected === undefined)
+    return { status: "pass" };
+  const mode = expect.assistant?.content_match ?? "contains";
+  if (mode === "exact") {
+    if (message.content !== expected)
+      return { status: "fail", reason: "Assistant content did not match exactly." };
+    return { status: "pass" };
+  }
+  if (!message.content.includes(expected))
+    return { status: "fail", reason: "Assistant content did not contain expected text." };
+  return { status: "pass" };
+}
+async function evaluateExpectations(expect, turn, judge) {
   if (!expect)
     return { status: "pass" };
-  const toolRes = checkToolCalls(expect, message);
+  const toolRes = checkToolCalls(expect, turn.message);
   if (toolRes.status !== "pass")
     return toolRes;
+  const toolResultsRes = checkToolResults(expect, turn.tool_messages ?? []);
+  if (toolResultsRes.status !== "pass")
+    return toolResultsRes;
+  const contentRes = checkAssistantContent(expect, turn.message);
+  if (contentRes.status !== "pass")
+    return contentRes;
   if (expect.assistant?.llm) {
     if (!judge)
       return { status: "skip", reason: "LLM judge not configured." };
-    return await judge({ expectation: expect.assistant.llm, message });
+    return await judge({ expectation: expect.assistant.llm, message: turn.message });
   }
   return { status: "pass" };
 }
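
Two details in this hunk are easy to miss. First, the defaults differ: `content_match` falls back to `exact` for tool results but to `contains` for assistant content. Second, `content_json` relies on `deepPartialMatch`, whose body is not part of this diff. The sketch below shows one plausible subset-match implementation. `subsetMatch` is a hypothetical stand-in, and the real helper may treat arrays or nulls differently.

```javascript
// Hypothetical stand-in for deepPartialMatch (its real body is not shown in the diff).
// Returns true when every key/value in `expected` is present in `actual`.
function subsetMatch(expected, actual) {
  // Primitives and null compare by strict equality.
  if (expected === null || typeof expected !== "object") return expected === actual;
  if (actual === null || typeof actual !== "object") return false;
  if (Array.isArray(expected)) {
    // Assumption: arrays must have equal length and match element by element.
    if (!Array.isArray(actual) || actual.length !== expected.length) return false;
    return expected.every((item, i) => subsetMatch(item, actual[i]));
  }
  if (Array.isArray(actual)) return false;
  // Objects: extra keys in `actual` are ignored.
  return Object.keys(expected).every(
    (key) => key in actual && subsetMatch(expected[key], actual[key])
  );
}

// A tool message's content is a JSON string, so it is parsed before matching.
const actualContent = JSON.stringify({ condition: "Rain", temp_c: 11 });
const matches = subsetMatch({ condition: "Rain" }, JSON.parse(actualContent));
const misses = subsetMatch({ condition: "Sun" }, JSON.parse(actualContent));
console.log(matches, misses);
```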
@@ -343,18 +602,18 @@ async function runFixtureFile(path, options = {}) {
   for (const fixture of file.fixtures) {
     try {
       const agent = pickAgentSpec(file, fixture);
-      const messages = [...fixture.history ?? [], { role: "user", content: fixture.input }];
-      const message = await callSubprocessAgent(agent, {
-        messages,
+      const res = await callSubprocessAgent(agent, {
+        messages: fixture.messages,
         tools: [],
         tool_choice: "auto"
       });
-      const expectation = await evaluateExpectations(fixture.expect, message, judge);
+      const expectation = await evaluateExpectations(fixture.expect, res, judge);
       results.push({
         id: fixture.id,
         status: expectation.status,
         reason: expectation.reason,
-        message
+        message: res.message,
+        tool_messages: res.tool_messages
       });
     } catch (err) {
       const reason = err instanceof Error ? err.message : String(err);
@@ -457,4 +716,4 @@ Summary: ${pass} passed, ${fail} failed, ${skip} skipped`);
 }
 await main();
-//# debugId=E9C824803FAB034D64756E2164756E21
+//# debugId=8780BA705018958964756E2164756E21