@huydao/karrot 0.1.6 → 0.1.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/README.md +496 -243
package/dist/executors/adapters/ag-ui-post.js +87 -12
package/dist/executors/adapters/ag-ui.js +5 -3
package/dist/executors/executor.js +2 -1
package/dist/executors/run-result.d.ts +3 -0
package/dist/reports/report.js +20 -0
package/dist/scenarios/scenario.d.ts +1 -0
package/package.json +5 -2
package/site/assets/app.js +201 -0
package/site/assets/karrot-mark.svg +10 -0
package/site/assets/styles.css +698 -0
package/site/check.js +43 -0
package/site/docs/index.html +505 -0
package/site/index.html +162 -0
package/site/serve.js +50 -0

package/README.md CHANGED

@@ -1,73 +1,59 @@
 # karrot
-`karrot` is a reusable AI test runner for multi-turn assistant scenarios.
+`karrot` is a reusable AI scenario runner for testing assistants through multi-turn conversations.
-It gives you:
-- scenario execution
-- AG-UI transport integration
-- string and AI-based assertions
-- turn evaluation with OpenAI
-- JSON and HTML reports
+Use it when you want to:
+- send one or more user turns to an agent
+- keep the same conversation thread across turns
+- assert required behavior with deterministic or semantic checks
+- score response quality with eval dimensions
+- write JSON and HTML reports for each run
-This package is designed to be published independently and reused across projects.
+Karrot is product-neutral. It does not know how your product logs in, where your project ID comes from, or how your agent runtime is discovered. The consuming project prepares those values and passes them to Karrot through a YAML config and `execute()`.
-## What Karrot Owns
+## Static Docs Site
-`karrot` is responsible for the AI-test layer:
-- load config
-- resolve `${VARIABLE}` templates
-- load scenario modules
-- execute turns
-- run assertions and evals
-- write artifacts and reports
+Karrot also includes a static documentation site:
-`karrot` does not own product-specific runtime discovery. The consumer project should prepare data such as:
-- `PROJECT_ID`
-- `JWT`
-- `ACCOUNT_ID`
-- `WS_URL`
-- `WS_TOPIC`
-- any transport-specific headers or IDs
+- Landing page: [site/index.html](./site/index.html)
+- Guidance page: [site/docs/index.html](./site/docs/index.html)
-## Core Entry Point
+Run it locally:
-The main high-level API is `execute()`.
+```bash
+npm run site:serve
+```
-```ts
-import { execute } from '@huydao/karrot';
-await execute('./karrot.config.yml', {
-  variables: {
-    PROJECT_ID: process.env.PROJECT_ID,
-    JWT: process.env.JWT,
-    ACCOUNT_ID: process.env.ACCOUNT_ID,
-    WS_URL: process.env.WS_URL,
-    WS_TOPIC: process.env.WS_TOPIC,
-  },
-  scenario: {
-    file: './src/scenarios/basic-two-turn-demo.ts',
-  },
-});
+## Quick Setup
+The normal setup has three files:
+- `karrot.config.yml`: transport, artifacts, eval prompts, report metadata, and scenario context
+- `src/run-karrot.ts`: a small script that collects runtime variables and calls `execute()`
+- `src/scenarios/basic.ts`: one or more scenarios exported as an `AiScenarioSet`
+Install the package in your project:
+```bash
+npm install @huydao/karrot
+npm install -D tsx typescript
+```
+Set the OpenAI key when you use `aiAssert`, `eval`, or generated user messages:
+```bash
+export OPENAI_API_KEY=<your-openai-api-key>
 ```
-`execute()` will:
-1. load YAML or JSON config
-2. resolve `${...}` variables
-3. create `artifacts/<timestamp>`
-4. load the scenario module
-5. run selected scenarios
-6. write JSON and HTML reports
+## 1. YAML Config File
-## Recommended Setup Flow
+Create a config file in the project that will run the tests. The file can live anywhere, but `./karrot.config.yml` or `./scripts/<flow>.karrot.yml` is easiest for agents and humans to find.
-The normal setup path is:
-1. create a YAML config file for the WSS transport
-2. create a scenario module that exports `scenarioSet` and `buildScenarioContext`
-3. create a small run script that calls `execute()`
+Karrot resolves `${VARIABLE}` placeholders from the `variables` object passed to `execute()`, then from `process.env`.
-### 1. WSS config in YAML
+### Generic AG-UI WSS agent
-Use one config file to describe transport, evaluation prompt settings, artifacts, and reporting.
+Use `ag-ui-wss` when the target agent speaks AG-UI over WebSocket/STOMP.
 ```yml
 version: 1
@@ -75,34 +61,34 @@ version: 1
 transport:
   type: ag-ui-wss
   env:
-    JWT: ${JWT}
-    ACCOUNT_ID: ${ACCOUNT_ID}
-    PROJECT_ID: ${PROJECT_ID}
     AGENT_URL: ${AGENT_URL}
     AGENT_ID: ${AGENT_ID}
     WS_URL: ${WS_URL}
     WS_TOPIC: ${WS_TOPIC}
-    WS_STOMP_HEADERS: Authorization:${JWT}
-    WS_HEADERS: Origin:${WS_ORIGIN},User-Agent:Mozilla/5.0
+    AUTH_TOKEN: ${AUTH_TOKEN}
+    WS_STOMP_HEADERS: Authorization:${AUTH_TOKEN}
+    WS_HEADERS: Origin:${APP_BASE_URL},User-Agent:Mozilla/5.0
   processTimeoutMs: 120000
 artifacts:
-  directory: ./artifacts
+  directory: ./artifacts/karrot
 execution:
   stopOnFailure: false
+  concurrency: 1
+context:
+  appBaseUrl: ${APP_BASE_URL}
+  projectId: ${PROJECT_ID}
 evaluation:
   systemPromptPath: ./prompts/turn-eval-system-prompt.md
   promptDirectory: ./prompts/eval
-context:
-  projectId: ${PROJECT_ID}
 report:
   enabled: true
-  environment: prod
-  projectName: Demo Project
+  environment: ${TEST_ENV}
+  projectName: ${PROJECT_NAME}
   runtime:
     agentUrl: ${AGENT_URL}
     agentId: ${AGENT_ID}
@@ -113,190 +99,316 @@ report:
     appBaseUrl: ${APP_BASE_URL}
 ```
-What this does:
-- `transport`: tells Karrot how to talk to the assistant
-- `evaluation`: points to the turn-eval rubric and any extra project-specific dimension prompts
-- `context`: makes resolved values available to scenarios
-- `report`: controls run metadata written into reports
+How to get the values:
-### 2. Scenario module
+- `AGENT_URL`, `AGENT_ID`, `WS_URL`, `WS_TOPIC`: from your agent platform, runtime discovery API, or test environment configuration
+- `AUTH_TOKEN`, `ACCOUNT_ID`, `PROJECT_ID`: from your product login/auth setup or CI secrets
+- `APP_BASE_URL`, `TEST_ENV`, `PROJECT_NAME`: from your test environment
+- `OPENAI_API_KEY`: from the environment, only needed for `aiAssert`, `eval`, and `aiGen`
-A scenario module defines the multi-turn tests that Karrot will run.
+Put secrets in environment variables or CI secrets, not in the YAML file.
-### 3. Run script
+### Generic HTTP agent
-Use a small script to resolve variables and point Karrot at the scenario file.
+Use `ag-ui-post` when the agent is triggered by HTTP and optionally observed by polling.
-```ts
-import { execute } from '@huydao/karrot';
-await execute('./karrot.config.yml', {
-  variables: {
-    PROJECT_ID: process.env.PROJECT_ID,
-    JWT: process.env.JWT,
-    ACCOUNT_ID: process.env.ACCOUNT_ID,
-    AGENT_URL: process.env.AGENT_URL,
-    AGENT_ID: process.env.AGENT_ID,
-    WS_URL: process.env.WS_URL,
-    WS_TOPIC: process.env.WS_TOPIC,
-    WS_ORIGIN: process.env.WS_ORIGIN,
-    APP_BASE_URL: process.env.APP_BASE_URL,
-  },
-  scenario: {
-    file: './src/scenarios/basic-two-turn-demo.ts',
-    ids: ['BASIC-2T'],
-  },
-});
-```
+```yml
+version: 1
-## Scenario Structure
+transport:
+  type: ag-ui-post
+  injectMessage: true
+  run:
+    url: ${AGENT_RUN_URL}
+    headers:
+      Authorization: Bearer ${AUTH_TOKEN}
+      Content-Type: application/json
+    payload:
+      body:
+        threadId: ${THREAD_ID}
+        messages: []
+  processTimeoutMs: 120000
-A scenario module exports:
-- `scenarioSet`
-- `buildScenarioContext(baseContext)`
+artifacts:
+  directory: ./artifacts/karrot
-Minimal example:
+context:
+  appBaseUrl: ${APP_BASE_URL}
+report:
+  enabled: true
+  environment: ${TEST_ENV}
+  projectName: ${PROJECT_NAME}
+  runtime:
+    agentUrl: ${AGENT_RUN_URL}
+    agentId: ${AGENT_ID}
+    wsUrl: ""
+    wsTopic: ""
+    accountId: ${ACCOUNT_ID}
+    projectId: ${PROJECT_ID}
+    appBaseUrl: ${APP_BASE_URL}
+```
+## 2. Run Script
+The run script is the boundary between your product and Karrot. It should collect runtime values, pass them as variables, select the scenario file, and set a non-zero exit code on failure.
 ```ts
-import { AiScenarioSet, type AiScenario, type BaseAiScenarioContext } from '@huydao/karrot';
+import { execute, getScenarioRunStatus } from '@huydao/karrot';
-const scenarios: AiScenario<BaseAiScenarioContext>[] = [
-  {
-    id: 'BASIC-2T',
-    name: 'Basic Two-Turn Demo',
-    turns: [
-      {
-        label: 'Turn 1',
-        message: () => 'Hello. What can you help me with in Katalon AI?',
-      },
-      {
-        label: 'Turn 2',
-        message: () => 'Give me 3 short example prompts I can ask next.',
-      },
-    ],
-  },
-];
+function required(name: string): string {
+  const value = process.env[name]?.trim();
-export const scenarioSet = new AiScenarioSet(scenarios);
+  if (!value) {
+    throw new Error(`Missing required environment variable: ${name}`);
+  }
-export function buildScenarioContext(baseContext: BaseAiScenarioContext): BaseAiScenarioContext {
-  return { ...baseContext };
+  return value;
 }
-```
-### Scenario shape
-Each scenario typically contains:
-- `id`: stable scenario identifier
-- `name`: human-readable scenario name
-- `turns`: ordered list of user turns to execute
+async function main(): Promise<void> {
+  const execution = await execute('./karrot.config.yml', {
+    variables: {
+      TEST_ENV: process.env.TEST_ENV ?? 'local',
+      PROJECT_NAME: process.env.PROJECT_NAME ?? 'Demo Agent',
+      APP_BASE_URL: required('APP_BASE_URL'),
+      AGENT_URL: required('AGENT_URL'),
+      AGENT_ID: required('AGENT_ID'),
+      WS_URL: required('WS_URL'),
+      WS_TOPIC: required('WS_TOPIC'),
+      AUTH_TOKEN: required('AUTH_TOKEN'),
+      ACCOUNT_ID: process.env.ACCOUNT_ID ?? '',
+      PROJECT_ID: process.env.PROJECT_ID ?? '',
+    },
+    scenario: {
+      file: './src/scenarios/basic.ts',
+      ids: process.env.SCENARIO_IDS?.split(',').map((id) => id.trim()).filter(Boolean),
+    },
+  });
+  const status = getScenarioRunStatus(execution.results);
+  console.log(
+    [
+      `Status: ${status}`,
+      `Artifacts: ${execution.outputDirectory}`,
+      `JSON report: ${execution.reportPaths?.jsonPath ?? '-'}`,
+      `HTML report: ${execution.reportPaths?.htmlPath ?? '-'}`,
+    ].join('\n'),
+  );
+  if (status === 'FAIL') {
+    process.exitCode = 1;
+  }
+}
-Each turn supports:
-- `label`: display label in reports
-- `message`: the user message to send
-- `idleTimeoutMs`: optional wait limit for message inactivity
-- `processTimeoutMs`: optional hard timeout for the turn
-- `assertions`: pass/fail checks for the turn output
-- `eval`: quality scoring dimensions for the turn output
-- `onComplete`: optional callback for turn-level post-processing
+main().catch((error) => {
+  console.error(error instanceof Error ? error.message : error);
+  process.exitCode = 1;
+});
+```
-### Message options
+Run it:
-`message` can be:
-- a function `(context) => string`
-- `aiGen.fromPreviousContext()`
-- `aiGen.fromGuidance(guidance)`
-- `aiGen.fromContent(content)`
+```bash
+TEST_ENV=qa npx tsx ./src/run-karrot.ts
+```
-This gives you a few common scenario authoring patterns:
-- fixed prompts for deterministic tests
-- context-aware prompts that use scenario data
-- generated user prompts for more adaptive multi-turn flows
+## 3. Basic Scenario
-Example with assertions and eval on a turn:
+A scenario module must export `scenarioSet`. Export `buildScenarioContext(baseContext)` when the scenario needs typed or derived context.
 ```ts
-import { AiScenarioSet, aiGen, type AiScenario, type BaseAiScenarioContext } from '@huydao/karrot';
+import {
+  aiGen,
+  AiScenarioSet,
+  type AiScenario,
+  type BaseAiScenarioContext,
+} from '@huydao/karrot';
+type DemoContext = BaseAiScenarioContext & {
+  appBaseUrl: string;
+};
+export function buildScenarioContext(baseContext: BaseAiScenarioContext): DemoContext {
+  return {
+    ...baseContext,
+    appBaseUrl: String(baseContext.appBaseUrl ?? ''),
+  };
+}
-const scenarios: AiScenario<BaseAiScenarioContext>[] = [
+const scenarios: AiScenario<DemoContext>[] = [
   {
-    id: 'FOLLOW-UP-1',
-    name: 'Follow-up prompt generation',
+    id: 'BASIC-CHAT-01',
+    name: 'Agent answers and suggests next steps',
     turns: [
       {
-        label: 'Ask for next prompts',
+        label: 'Ask what the agent can do',
+        message: () => 'What can you help me do in this product?',
+        assertions: [
+          {
+            assert: { hasText: 'help' },
+            description: 'The response should explain useful capabilities',
+          },
+          {
+            aiAssert: {
+              hasContent: 'The answer names at least one concrete task the user can perform.',
+            },
+            description: 'The response should be actionable',
+          },
+        ],
+        eval: ['correctness', 'helpfulness', 'clarity'],
+      },
+      {
+        label: 'Ask for follow-up prompts',
         message: aiGen.fromGuidance(
-          'Ask for 3 concise follow-up prompts the user can send next based on the previous answer.',
+          'Ask for three short follow-up prompts based on the previous answer.',
         ),
         assertions: [
-          { assert: { hasText: 'prompt' } },
+          {
+            assert: { toolcall: [] },
+            description: 'The answer should not call tools for this simple prompt request',
+          },
+          {
+            aiAssert: {
+              hasContent: 'The answer provides three concise follow-up prompt ideas.',
+            },
+          },
+        ],
+        eval: [
+          'relevance',
+          {
+            dimension: 'nextStepQuality',
+            guidance: 'Score whether the suggested next prompts are specific and usable.',
+          },
         ],
-        eval: ['correctness', 'helpfulness', 'relevance'],
       },
     ],
   },
 ];
 export const scenarioSet = new AiScenarioSet(scenarios);
-export function buildScenarioContext(baseContext: BaseAiScenarioContext): BaseAiScenarioContext {
-  return { ...baseContext };
-}
 ```
-## Assertions
+## Scenario Details
+### Scenario elements
+Each `AiScenario` has:
+- `id`: stable ID used by CLI filters, CI, and reports
+- `name`: readable scenario name shown in reports
+- `turns`: ordered user messages sent to the same conversation thread
+- `continueOnAssertionFailure`: optional scenario-level flag that keeps later turns running after an assertion failure
-Karrot supports two assertion styles.
+Keep scenario IDs stable. Reports and CI filters depend on them.
-Use assertions for pass/fail requirements. If a turn must contain or avoid something specific, assertions are the right tool.
+### Turn elements
-Direct assertions:
+Each turn has:
+- `label`: readable turn name shown in reports
+- `message`: user input to send to the agent
+- `idleTimeoutMs`: optional timeout for waiting on assistant activity
+- `processTimeoutMs`: optional hard timeout for the turn
+- `assertions`: pass/fail checks for required behavior
+- `eval`: quality scoring dimensions for the assistant response
+- `continueOnAssertionFailure`: optional turn-level override
+- `onComplete`: optional callback after the turn returns output
+`message` can be:
+- `(context) => string`: deterministic message with access to scenario context
+- `aiGen.fromPreviousContext()`: generate the next user message from conversation context
+- `aiGen.fromGuidance(guidance)`: generate a user message from instructions
+- `aiGen.fromContent(content)`: generate a user message from supplied source content
+Use deterministic messages for regression tests. Use `aiGen` when the next user turn should adapt to the previous answer.
+## Turn Assertions
+Assertions decide whether a turn passes or fails. Use them for required behavior.
+### `assert`
+Use `assert` when the expected result can be checked directly.
 ```ts
 assertions: [
-  { assert: { hasText: 'Katalon AI' } },
+  { assert: { hasText: 'created successfully' } },
+  { assert: { toolcall: ['create_test_case'] } },
   { assert: { toolcall: [] } },
+  {
+    assert: {
+      toolcallWithContent: {
+        name: 'create_test_case',
+        hasText: ['login', 'password'],
+        hasProperties: {
+          priority: 'High',
+        },
+      },
+    },
+  },
 ]
 ```
-AI assertions:
+Supported direct assertions:
+- `assert.hasText`: response text contains the expected string
+- `assert.toolcall`: exact expected tool-call names; use `[]` when no tool calls should happen
+- `assert.toolcallWithContent`: a named tool call exists and contains expected text or structured properties
+### `aiAssert`
+Use `aiAssert` when the requirement is semantic and exact string matching would be brittle.
 ```ts
 assertions: [
-  { aiAssert: { hasContent: 'The answer explains what Katalon AI can do.' } },
-  { aiAssert: { notHasContent: 'The answer invents unsupported product features.' } },
+  {
+    aiAssert: {
+      hasContent: 'The answer explains the next action and why it is needed.',
+    },
+  },
+  {
+    aiAssert: {
+      notHasContent: 'The answer invents product capabilities not present in the prompt.',
+    },
+  },
 ]
 ```
-Assertion guidance:
-- Use direct assertions when the expected output is deterministic enough to check literally.
-- Use AI assertions when the requirement is semantic and cannot be captured safely with exact string matching.
-- Use assertions to decide whether the turn satisfied a contract, not to measure answer quality.
+Supported AI assertions:
+- `aiAssert.hasContent`: semantic requirement must be present
+- `aiAssert.notHasContent`: semantic problem must be absent
+`aiAssert` requires `OPENAI_API_KEY`.
-## Evaluations
+## Eval
-Turn evals score the assistant response for named dimensions.
-Karrot applies a CheckEval-inspired evaluation rubric: broad dimensions are decomposed into concrete checklist-style checks before assigning a final score, which improves consistency and makes explanations more traceable.
+Eval is separate from assertions.
+- Assertion: pass/fail requirement, for example "must call `create_test_case`"
+- Eval: quality score, for example "how helpful and complete was the answer"
 ```ts
 eval: ['correctness', 'coverage', 'helpfulness']
 ```
-Custom dimensions are also supported:
+You can add inline guidance:
 ```ts
 eval: [
   'correctness',
   {
     dimension: 'productFit',
-    guidance: 'Judge whether the answer is specifically useful for a Katalon AI user.',
+    guidance: 'Score whether the answer is specifically useful for users of this product.',
   },
 ]
 ```
-Use eval when you want a quality score rather than a hard pass/fail rule.
+Common dimensions:
-Built-in dimensions commonly used by Karrot:
 - `correctness`
 - `coverage`
 - `helpfulness`
@@ -309,99 +421,240 @@ Built-in dimensions commonly used by Karrot:
 - `consistency`
 - `safety`
-Project-level eval prompts can be configured through:
-- `evaluation.systemPromptPath`
-- `evaluation.promptDirectory`
+Project-level eval prompts can live in a directory:
-That lets the project define rubric files without repeating inline guidance in every scenario.
+```yml
+evaluation:
+  promptDirectory: ./prompts/eval
+```
-Use:
-- `systemPromptPath` when you want to replace the whole turn-eval rubric
-- `promptDirectory` when you want to add custom project-specific dimensions
+Then create files such as:
-Eval guidance:
-- Use assertions for required behavior.
-- Use eval for quality measurement across dimensions.
-- Prefer a small number of dimensions that reflect the goal of the turn.
-- Because Karrot applies CheckEval-style scoring, dimensions like `relevance` and `consistency` are judged through concrete sub-checks instead of a vague overall impression.
+```text
+prompts/eval/product-fit.md
+prompts/eval/next-step-quality.md
+```
-## AI-Generated User Messages
+Scenario authors can then use only the dimension names:
-Karrot can generate a user turn message before sending it to the target assistant.
+```ts
+eval: ['correctness', 'productFit', 'nextStepQuality']
+```
-Available helpers:
-- `aiGen.fromPreviousContext()`
-- `aiGen.fromGuidance(guidance)`
-- `aiGen.fromContent(content)`
+Use `evaluation.systemPromptPath` only when you need to replace the full turn-eval rubric.
-Example:
+## Config Reference
-```ts
-import { aiGen } from '@huydao/karrot';
+Top-level config keys:
-message: aiGen.fromGuidance(
-  'Ask for 3 concise follow-up prompts the user can send next based on the previous answer.',
-)
-```
+- `version`: currently `1`
+- `transport`: agent transport, currently `ag-ui-wss` or `ag-ui-post`
+- `artifacts.directory`: output directory for raw events and reports
+- `execution.stopOnFailure`: stop remaining scenarios after a failure
+- `execution.concurrency`: number of scenarios to run in parallel
+- `context`: values available to `buildScenarioContext`
+- `evaluation.systemPromptPath`: full eval prompt override
+- `evaluation.promptDirectory`: additional project-specific eval dimension prompts
+- `report.enabled`: set `false` to skip reports
+- `report.environment`, `report.projectName`, `report.runtime`: metadata written to reports
+- `report.scenarioContext`: extra metadata written to reports
-This requires `OPENAI_API_KEY`.
+## Reports And Artifacts
-## Config Overview
+Each run creates an artifact directory under `artifacts/<timestamp>` or the configured `artifacts.directory`.
-Karrot config currently supports:
-- `transport`
-- `artifacts.directory`
-- `execution.stopOnFailure`
-- `evaluation.systemPromptPath`
-- `evaluation.promptDirectory`
-- `context`
-- `report`
+Typical outputs:
-Important design choice:
-- config and scenario are separate
-- one transport config can be reused across many scenario files
+- raw transport logs, such as `.jsonl` or `.sse`
+- generated-message traces
+- AI assertion traces
+- JSON report
+- HTML report
-## Reports and Artifacts
+## How To Add Karrot To An Existing Playwright Framework
-Each `execute()` run creates:
-- a run artifact directory under `artifacts/<timestamp>`
-- raw transport logs such as `.jsonl` or `.sse`
-- a JSON run report
-- an HTML run report
+1. Install `@huydao/karrot` in the Playwright project.
+2. Add a config file, for example `scripts/ai-full-flow.karrot.yml`.
+3. Add a runner script, for example `scripts/run-ai-full-flow.ts`.
+4. Put scenario files under a stable folder, for example `data/ai-scenarios`.
+5. Reuse Playwright auth helpers to discover runtime values, then pass those values to `execute()`.
+6. Add npm scripts that run the Karrot runner with `tsx`.
+7. Store reports under a predictable artifact directory for CI upload.
-## Environment Variables
+Minimal `package.json` script:
-Common variables:
-- `OPENAI_API_KEY`
-- `OPENAI_BASE_URL`
-- `OPENAI_EVAL_MODEL`
-- `OPENAI_MESSAGE_GEN_MODEL`
+```json
+{
+  "scripts": {
+    "ai:full-flow": "tsx ./scripts/run-ai-full-flow.ts"
+  }
+}
+```
-Transport-specific variables depend on the integration project.
+Example command:
-## Package Structure
+```bash
+TEST_ENV=qa npm run ai:full-flow -- --scenario-file data/ai-scenarios/basic.ts
+```
-- `assertions/`: direct assertions and turn evaluation
-- `executors/`: transport runners and scenario execution
-- `reports/`: JSON and HTML reporting
-- `scenarios/`: scenario types, loaders, generated-message helpers
-- `utils/`: config loading, artifacts, OpenAI helpers
-- `prompts/`: built-in prompt files used by the package
+### Using Karrot inside a Playwright test
-## AI-Friendly Guide
+Use `execute()` when the scenario file owns the whole conversation. Karrot automatically starts a thread on the first turn and reuses that thread for the following turns in the same scenario.
-For a fuller operational guide intended for both humans and AI agents, read [GUIDE.md](./GUIDE.md).
+```ts
+import path from 'node:path';
+import { test, expect } from '@playwright/test';
+import { execute, getScenarioRunStatus } from '@huydao/karrot';
+test('agent completes the basic flow', async () => {
+  const execution = await execute(path.resolve(__dirname, '../karrot.config.yml'), {
+    variables: {
+      TEST_ENV: process.env.TEST_ENV ?? 'qa',
+      PROJECT_NAME: 'Demo Agent',
+      APP_BASE_URL: process.env.APP_BASE_URL,
+      AGENT_URL: process.env.AGENT_URL,
+      AGENT_ID: process.env.AGENT_ID,
+      WS_URL: process.env.WS_URL,
+      WS_TOPIC: process.env.WS_TOPIC,
+      AUTH_TOKEN: process.env.AUTH_TOKEN,
+      ACCOUNT_ID: process.env.ACCOUNT_ID,
+      PROJECT_ID: process.env.PROJECT_ID,
+    },
+    scenario: {
+      file: path.resolve(__dirname, '../src/scenarios/basic.ts'),
+      ids: ['BASIC-CHAT-01'],
+    },
+  });
+  expect(getScenarioRunStatus(execution.results)).toBe('PASS');
+});
+```
-## Build
+Use `runScenario()` when the Playwright test needs to create or recall an existing agent session itself. Pass the known thread/conversation ID as `initialThreadId`. Keep `concurrency: 1`; a single existing session cannot be shared safely across parallel scenarios.
-```bash
-cd karrot
-npx tsc -p tsconfig.json
+```ts
+import { test, expect } from '@playwright/test';
+import {
+  AiScenarioSet,
+  createRunArtifactDirectory,
+  runScenario,
+  type AiScenario,
+  type BaseAiScenarioContext,
+} from '@huydao/karrot';
+import { runAgUiMessage } from '@huydao/karrot/adapters/ag-ui';
+type SessionContext = BaseAiScenarioContext & {
+  projectId: string;
+};
+const scenarios: AiScenario<SessionContext>[] = [
+  {
+    id: 'RESUME-SESSION-01',
+    name: 'Continue an existing assistant session',
+    turns: [
+      {
+        label: 'Recall current context',
+        message: ({ projectId }) =>
+          `Continue in project ${projectId}. Summarize what we have already discussed and suggest the next action.`,
+        assertions: [
+          {
+            aiAssert: {
+              hasContent: 'The answer uses the existing conversation context.',
+            },
+          },
+        ],
+        eval: ['relevance', 'helpfulness'],
+      },
+    ],
+  },
+];
+test('continues an existing agent session', async () => {
+  const initialThreadId = process.env.KARROT_THREAD_ID;
+  if (!initialThreadId) {
+    throw new Error('KARROT_THREAD_ID is required to resume an existing session.');
+  }
+  const outputDirectory = await createRunArtifactDirectory('./artifacts/karrot-playwright');
+  const env = {
+    ...process.env,
+    AGENT_URL: process.env.AGENT_URL ?? '',
+    AGENT_ID: process.env.AGENT_ID ?? '',
+    WS_URL: process.env.WS_URL ?? '',
+    WS_TOPIC: process.env.WS_TOPIC ?? '',
+    AUTH_TOKEN: process.env.AUTH_TOKEN ?? '',
+    WS_STOMP_HEADERS: `Authorization:${process.env.AUTH_TOKEN ?? ''}`,
+    WS_HEADERS: `Origin:${process.env.APP_BASE_URL ?? ''},User-Agent:Mozilla/5.0`,
+  };
+  const scenarioSet = new AiScenarioSet(scenarios);
+  const [result] = await runScenario(scenarioSet.select(['RESUME-SESSION-01']), {
+    context: {
+      projectId: process.env.PROJECT_ID ?? '',
+    },
+    env,
+    outputDirectory,
+    initialThreadId,
+    concurrency: 1,
+    messageRunner: async ({ message, outputDirectory, threadId, processTimeoutMs }) =>
+      await runAgUiMessage({
+        message,
+        env,
+        outputDirectory,
+        threadId,
+        processTimeoutMs,
+      }),
+  });
+  expect(result.status).toBe('PASS');
+  expect(result.threadId).toBe(initialThreadId);
+});
+```
+The `threadId` returned in `execution.results[*].threadId` or `result.threadId` is the value to store when a later Playwright test needs to continue the same assistant session.
+For an init-then-recall flow in the same Playwright script, run once without `initialThreadId`, then pass the returned thread into the next `runScenario()` call:
+```ts
+const [createdSession] = await runScenario(initialScenarioSet.select(['INIT-SESSION-01']), {
+  context,
+  env,
+  outputDirectory,
+  concurrency: 1,
+  messageRunner,
+});
+const threadId = createdSession.threadId;
+if (!threadId) {
+  throw new Error('Initial scenario did not return a threadId.');
+}
+const [continuedSession] = await runScenario(recallScenarioSet.select(['RECALL-SESSION-01']), {
+  context,
+  env,
+  outputDirectory,
+  initialThreadId: threadId,
+  concurrency: 1,
+  messageRunner,
+});
 ```
-## Publish
+## Package Structure
+- `assertions/`: direct assertions and AI assertions
+- `executors/`: scenario execution and transport runners
+- `reports/`: JSON and HTML reporting
+- `scenarios/`: scenario types, loaders, and generated-message helpers
+- `utils/`: config loading, variable resolution, artifacts, and OpenAI helpers
+- `prompts/`: built-in prompts used by the package
+## Build
 ```bash
 cd karrot
-npm publish
+npm run build
 ```
+For a fuller operational reference, read [GUIDE.md](./GUIDE.md).
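
Review note on the README change above: the new text says Karrot resolves `${VARIABLE}` placeholders from the `variables` object passed to `execute()` first, then from `process.env`. The sketch below illustrates only that lookup order. `resolvePlaceholders` is a hypothetical helper written for this note, not a function exported by `@huydao/karrot`, and leaving an unresolved placeholder untouched is an assumption; the diff does not show how the package handles missing values.

```typescript
// Hypothetical helper (not part of @huydao/karrot): shows the documented
// lookup order, explicit variables first and the environment second.
type Variables = Record<string, string | undefined>;

function resolvePlaceholders(template: string, variables: Variables, env: Variables): string {
  // Match ${NAME} where NAME is an identifier-like token.
  return template.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (match: string, name: string) => {
    // An explicit variable wins; the environment is only a fallback.
    const value = variables[name] ?? env[name];
    // Assumption: unresolved placeholders are left as-is rather than throwing.
    return value ?? match;
  });
}

// Example with literal maps; a real runner would pass process.env as `env`.
const resolved = resolvePlaceholders(
  'Authorization:${AUTH_TOKEN} Origin:${APP_BASE_URL} ${MISSING}',
  { AUTH_TOKEN: 'from-variables' },
  { AUTH_TOKEN: 'from-env', APP_BASE_URL: 'https://example.test' },
);
console.log(resolved);
// prints "Authorization:from-variables Origin:https://example.test ${MISSING}"
```

Under this ordering, a run script can override a CI-provided environment value for a single run by passing it in `variables`, which matches how the new run script in the README builds its `variables` object.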