npm - promptfoo - Versions diffs - 0.6.0 → 0.8.0 - Mend

promptfoo 0.6.0 → 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (55)

package/README.md +137 -74
package/dist/assertions.d.ts +4 -10
package/dist/assertions.d.ts.map +1 -1
package/dist/assertions.js +126 -20
package/dist/assertions.js.map +1 -1
package/dist/cache.d.ts +8 -0
package/dist/cache.d.ts.map +1 -0
package/dist/cache.js +78 -0
package/dist/cache.js.map +1 -0
package/dist/evaluator.d.ts +2 -2
package/dist/evaluator.d.ts.map +1 -1
package/dist/evaluator.js +73 -40
package/dist/evaluator.js.map +1 -1
package/dist/index.d.ts +6 -4
package/dist/index.d.ts.map +1 -1
package/dist/index.js +8 -21
package/dist/index.js.map +1 -1
package/dist/main.js +92 -80
package/dist/main.js.map +1 -1
package/dist/onboarding.d.ts +4 -0
package/dist/onboarding.d.ts.map +1 -0
package/dist/onboarding.js +63 -0
package/dist/onboarding.js.map +1 -0
package/dist/providers/localai.d.ts.map +1 -1
package/dist/providers/localai.js +7 -9
package/dist/providers/localai.js.map +1 -1
package/dist/providers/openai.d.ts.map +1 -1
package/dist/providers/openai.js +31 -38
package/dist/providers/openai.js.map +1 -1
package/dist/providers.d.ts +1 -0
package/dist/providers.d.ts.map +1 -1
package/dist/providers.js +11 -1
package/dist/providers.js.map +1 -1
package/dist/types.d.ts +46 -13
package/dist/types.d.ts.map +1 -1
package/dist/util.d.ts +6 -3
package/dist/util.d.ts.map +1 -1
package/dist/util.js +73 -2
package/dist/util.js.map +1 -1
package/dist/web/server.d.ts.map +1 -1
package/dist/web/server.js +0 -11
package/dist/web/server.js.map +1 -1
package/package.json +6 -2
package/src/assertions.ts +141 -28
package/src/cache.ts +90 -0
package/src/evaluator.ts +89 -43
package/src/index.ts +14 -26
package/src/main.ts +117 -99
package/src/onboarding.ts +61 -0
package/src/providers/localai.ts +9 -11
package/src/providers/openai.ts +34 -42
package/src/providers.ts +9 -0
package/src/types.ts +95 -16
package/src/util.ts +90 -4
package/src/web/server.ts +0 -18

package/README.md CHANGED

@@ -9,31 +9,44 @@ With promptfoo, you can:
 - **Test multiple prompts** against predefined test cases
 - **Evaluate quality and catch regressions** by comparing LLM outputs side-by-side
-- **Speed up evaluations** by running tests concurrently
+- **Speed up evaluations** with caching and concurrent tests
 - **Flag bad outputs automatically** by setting "expectations"
 - Use as a command line tool, or integrate into your workflow as a library
 - Use OpenAI models, open-source models like Llama and Vicuna, or integrate custom API providers for any LLM API
+The goal: **test-driven prompt engineering**, rather than trial-and-error.
 # [» View full documentation «](https://promptfoo.dev/docs/intro)
-promptfoo produces matrix views that allow you to quickly review prompt outputs across many inputs. The goal: tune prompts systematically across all relevant test cases, instead of testing prompts by trial and error.
+promptfoo produces matrix views that let you quickly evaluate outputs across many prompts.
 Here's an example of a side-by-side comparison of multiple prompts and inputs:
 ![Prompt evaluation matrix - web viewer](https://github.com/typpo/promptfoo/assets/310310/ddcd77df-2783-425e-ade9-1a20dd0b6cd2)
 It works on the command line too:
 ![Prompt evaluation](https://user-images.githubusercontent.com/310310/235529431-f4d5c395-d569-448e-9697-cd637e0372a5.gif)
-## Usage (command line & web viewer)
+## Workflow
+Start by establishing a handful of test cases - core use cases and failure cases that you want to ensure your prompt can handle.
+As you explore modifications to the prompt, use `promptfoo eval` to rate all outputs. This ensures the prompt is actually improving overall.
+As you collect more examples and establish a user feedback loop, continue to build the pool of test cases.
-To get started, run the following command:
+<img width="772" alt="LLM ops" src="https://github.com/typpo/promptfoo/assets/310310/cf0461a7-2832-4362-9fbb-4ebd911d06ff">
+## Usage
+To get started, run this command:
 ```
 npx promptfoo init
 ```
-This will create some templates in your current directory: `prompts.txt`, `vars.csv`, and `promptfooconfig.js`.
+This will create some placeholders in your current directory: `prompts.txt` and `promptfooconfig.yaml`.
 After editing the prompts and variables to your liking, run the eval command to kick off an evaluation:
@@ -41,20 +54,75 @@ After editing the prompts and variables to your liking, run the eval command to
 npx promptfoo eval
 ```
-If you're looking to customize your usage, you have a wide set of parameters at your disposal. See the [Configuration docs](https://www.promptfoo.dev/docs/configuration/parameters) for more detail:
-| Option                              | Description                                                                                                                                                                                    |
-| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `-p, --prompts <paths...>`          | Paths to prompt files, directory, or glob                                                                                                                                                      |
-| `-r, --providers <name or path...>` | One of: openai:chat, openai:completion, openai:model-name, localai:chat:model-name, localai:completion:model-name. See [API providers](https://www.promptfoo.dev/docs/configuration/providers) |
-| `-o, --output <path>`               | Path to output file (csv, json, yaml, html)                                                                                                                                                    |
-| `-v, --vars <path>`                 | Path to file with prompt variables (csv, json, yaml)                                                                                                                                           |
-| `-c, --config <path>`               | Path to configuration file. `promptfooconfig.js[on]` is automatically loaded if present                                                                                                        |
-| `-j, --max-concurrency <number>`    | Maximum number of concurrent API calls                                                                                                                                                         |
-| `--table-cell-max-length <number>`  | Truncate console table cells to this length                                                                                                                                                    |
-| `--prompt-prefix <path>`            | This prefix is prepended to every prompt                                                                                                                                                       |
-| `--prompt-suffix <path>`            | This suffix is append to every prompt                                                                                                                                                          |
-| `--grader`                          | Provider that will grade outputs, if you are using [LLM grading](https://www.promptfoo.dev/docs/configuration/expected-outputs)                                                                |
+### Configuration
+The YAML configuration format runs each prompt through a series of example inputs (aka "test cases") and checks whether they meet requirements (aka "assert").
+See the [Configuration docs](https://www.promptfoo.dev/docs/configuration/guide) for a detailed guide.
+```yaml
+prompts: [prompts.txt]
+providers: [openai:gpt-3.5-turbo]
+tests:
+  - description: First test case - automatic review
+    vars:
+      var1: first variable's value
+      var2: another value
+      var3: some other value
+    assert:
+      - type: equality
+        value: expected LLM output goes here
+      - type: function
+        value: output.includes('some text')
+  - description: Second test case - manual review
+    # Test cases don't need assertions if you prefer to review the output yourself
+    vars:
+      var1: new value
+      var2: another value
+      var3: third value
+  - description: Third test case - other types of automatic review
+    vars:
+      var1: yet another value
+      var2: and another
+      var3: dear llm, please output your response in json format
+    assert:
+      - type: contains-json
+      - type: similarity
+        value: ensures that output is semantically similar to this text
+      - type: llm-rubric
+        value: ensure that output contains a reference to X
+```
+### Tests on spreadsheet
+Some people prefer to configure their LLM tests in a CSV. In that case, the config is pretty simple:
+```yaml
+prompts: [prompts.txt]
+providers: [openai:gpt-3.5-turbo]
+tests: tests.csv
+```
+See [example CSV](https://github.com/typpo/promptfoo/blob/main/examples/simple-test/tests.csv).
+### Command-line
+If you're looking to customize your usage, you have a wide set of parameters at your disposal.
+| Option                              | Description                                                                                                                                                                                                            |
+| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `-p, --prompts <paths...>`          | Paths to [prompt files](https://promptfoo.dev/docs/configuration/parameters#prompt-files), directory, or glob                                                                                                          |
+| `-r, --providers <name or path...>` | One of: openai:chat, openai:completion, openai:model-name, localai:chat:model-name, localai:completion:model-name. See [API providers](https://promptfoo.dev/docs/configuration/providers)                             |
+| `-o, --output <path>`               | Path to [output file](https://promptfoo.dev/docs/configuration/parameters#output-file) (csv, json, yaml, html)                                                                                                         |
+| `--tests <path>`                    | Path to [external test file](https://promptfoo.dev/docs/configurationexpected-outputsassertions#load-an-external-tests-file)                                                                                           |
+| `-c, --config <path>`               | Path to [configuration file](https://promptfoo.dev/docs/configuration/guide). `promptfooconfig.js/json/yaml` is automatically loaded if present                                                                        |
+| `-j, --max-concurrency <number>`    | Maximum number of concurrent API calls                                                                                                                                                                                 |
+| `--table-cell-max-length <number>`  | Truncate console table cells to this length                                                                                                                                                                            |
+| `--prompt-prefix <path>`            | This prefix is prepended to every prompt                                                                                                                                                                               |
+| `--prompt-suffix <path>`            | This suffix is appended to every prompt                                                                                                                                                                                |
+| `--grader`                          | [Provider](https://promptfoo.dev/docs/configuration/providers) that will conduct the evaluation, if you are [using LLM to grade your output](https://promptfoo.dev/docs/configuration/expected-outputs#llm-evaluation) |
 After running an eval, you may optionally use the `view` command to open the web viewer:
@@ -66,10 +134,10 @@ npx promptfoo view
 #### Prompt quality
-In this example, we evaluate whether adding adjectives to the personality of an assistant bot affects the responses:
+In [this example](https://github.com/typpo/promptfoo/tree/main/examples/assistant-cli), we evaluate whether adding adjectives to the personality of an assistant bot affects the responses:
 ```bash
-npx promptfoo eval -p prompts.txt -v vars.csv -r openai:gpt-3.5-turbo
+npx promptfoo eval -p prompts.txt -r openai:gpt-3.5-turbo -t tests.csv
 ```
 <!--
@@ -80,15 +148,13 @@ npx promptfoo eval -p prompts.txt -v vars.csv -r openai:gpt-3.5-turbo
 This command will evaluate the prompts in `prompts.txt`, substituting the variable values from `vars.csv`, and output results in your terminal.
-Have a look at the setup and full output [here](https://github.com/typpo/promptfoo/tree/main/examples/assistant-cli).
 You can also output a nice [spreadsheet](https://docs.google.com/spreadsheets/d/1nanoj3_TniWrDl1Sj-qYqIMD6jwm5FBy15xPFdUTsmI/edit?usp=sharing), [JSON](https://github.com/typpo/promptfoo/blob/main/examples/simple-cli/output.json), YAML, or an HTML file:
 ![Table output](https://user-images.githubusercontent.com/310310/235483444-4ddb832d-e103-4b9c-a862-b0d6cc11cdc0.png)
 #### Model quality
-In this example, we evaluate the difference between GPT 3 and GPT 4 outputs for a given prompt:
+In the [next example](https://github.com/typpo/promptfoo/tree/main/examples/gpt-3.5-vs-4), we evaluate the difference between GPT 3 and GPT 4 outputs for a given prompt:
 ```bash
 npx promptfoo eval -p prompts.txt -r openai:gpt-3.5-turbo openai:gpt-4 -o output.html
@@ -98,19 +164,46 @@ Produces this HTML table:
 ![Side-by-side evaluation of LLM model quality, gpt3 vs gpt4, html output](https://user-images.githubusercontent.com/310310/235490527-e0c31f40-00a0-493a-8afc-8ed6322bb5ca.png)
-Full setup and output [here](https://github.com/typpo/promptfoo/tree/main/examples/gpt-3.5-vs-4).
 ## Usage (node package)
 You can also use `promptfoo` as a library in your project by importing the `evaluate` function. The function takes the following parameters:
-- `providers`: a list of provider strings or `ApiProvider` objects, or just a single string or `ApiProvider`.
-- `options`: the prompts and variables you want to test:
+- `testSuite`: the JavaScript equivalent of `promptfooconfig.yaml`
   ```typescript
-  {
-    prompts: string[];
+  interface TestSuiteConfig {
+    providers: string[]; // Valid provider name (e.g. openai:gpt-3.5-turbo)
+    prompts: string[]; // List of prompts
+    tests: string | TestCase[]; // Path to a CSV file, or list of test cases
+    defaultTest?: Omit<TestCase, 'description'>; // Optional: default vars and assertions for every test case
+    outputPath?: string; // Optional: write results to file
+  }
+  interface TestCase {
+    description?: string;
     vars?: Record<string, string>;
+    assert?: Assertion[];
+    prompt?: PromptConfig;
+    grading?: GradingConfig;
+  }
+  interface Assertion {
+    type: 'equality' | 'is-json' | 'contains-json' | 'function' | 'similarity' | 'llm-rubric';
+    value?: string;
+    threshold?: number; // For similarity assertions
+    provider?: ApiProvider; // For assertions that require an LLM provider
+  }
+  ```
+- `options`: misc options related to how the tests are run
+  ```typescript
+  interface EvaluateOptions {
+    maxConcurrency?: number;
+    showProgressBar?: boolean;
+    generateSuggestions?: boolean;
   }
   ```
@@ -121,61 +214,31 @@ You can also use `promptfoo` as a library in your project by importing the `eval
 ```js
 import promptfoo from 'promptfoo';
-const options = {
+const results = await promptfoo.evaluate({
   prompts: ['Rephrase this in French: {{body}}', 'Rephrase this like a pirate: {{body}}'],
-  vars: [{ body: 'Hello world' }, { body: "I'm hungry" }],
-};
-(async () => {
-  const summary = await promptfoo.evaluate('openai:gpt-3.5-turbo', options);
-  console.log(summary);
-})();
-```
-This code imports the `promptfoo` library, defines the evaluation options, and then calls the `evaluate` function with these options. The results are logged to the console:
-```js
-{
-  "results": [
+  providers: ['openai:gpt-3.5-turbo'],
+  tests: [
     {
-      "prompt": {
-        "raw": "Rephrase this in French: Hello world",
-        "display": "Rephrase this in French: {{body}}"
+      vars: {
+        body: 'Hello world',
       },
-      "vars": {
-        "body": "Hello world"
+    },
+    {
+      vars: {
+        body: "I'm hungry",
       },
-      "response": {
-        "output": "Bonjour le monde",
-        "tokenUsage": {
-          "total": 19,
-          "prompt": 16,
-          "completion": 3
-        }
-      }
     },
-    // ...
   ],
-  "stats": {
-    "successes": 4,
-    "failures": 0,
-    "tokenUsage": {
-      "total": 120,
-      "prompt": 72,
-      "completion": 48
-    }
-  },
-  "table": [
-    // ...
-  ]
-}
+});
 ```
-[See full example here](https://github.com/typpo/promptfoo/tree/main/examples/simple-import)
+This code imports the `promptfoo` library, defines the evaluation options, and then calls the `evaluate` function with these options.
+See the full example [here](https://github.com/typpo/promptfoo/tree/main/examples/simple-import), which includes an example results object.
 ## Configuration
-- **[Setting up an eval](https://promptfoo.dev/docs/configuration/parameters)**: Learn more about how to set up prompt files, vars file, output, etc.
+- **[Main guide](https://promptfoo.dev/docs/configuration/guide)**: Learn about how to configure your YAML file, setup prompt files, etc.
 - **[Configuring test cases](https://promptfoo.dev/docs/configuration/expected-outputs)**: Learn more about how to configure expected outputs and test assertions.
 ## Installation
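The README hunk above replaces the 0.6.0 call `promptfoo.evaluate('openai:gpt-3.5-turbo', options)`, where `options` held `prompts` and a `vars` array, with a single test-suite object in which each former `vars` entry becomes `{ vars }` inside `tests`. A minimal sketch of that mapping, assuming only the field names shown in the diff; `LegacyOptions`, `TestSuiteSketch`, and `toTestSuite` are hypothetical names for illustration, not part of promptfoo's API.

```typescript
// Hypothetical types mirroring the fields shown in the README diff.
interface LegacyOptions {
  prompts: string[];
  vars?: Record<string, string>[];
}

interface TestCaseSketch {
  description?: string;
  vars?: Record<string, string>;
}

interface TestSuiteSketch {
  providers: string[];
  prompts: string[];
  tests: TestCaseSketch[];
}

// Convert 0.6.0-style (providers, options) arguments into the 0.8.0-style
// single object: one test case per entry of the old `vars` array.
function toTestSuite(providers: string | string[], options: LegacyOptions): TestSuiteSketch {
  return {
    providers: Array.isArray(providers) ? providers : [providers],
    prompts: options.prompts,
    tests: (options.vars ?? []).map((vars) => ({ vars })),
  };
}

const suite = toTestSuite('openai:gpt-3.5-turbo', {
  prompts: ['Rephrase this in French: {{body}}', 'Rephrase this like a pirate: {{body}}'],
  vars: [{ body: 'Hello world' }, { body: "I'm hungry" }],
});
console.log(JSON.stringify(suite, null, 2));
```

The resulting object has the same shape as the argument passed to `promptfoo.evaluate` in the 0.8.0 README example. The 0.6.0 README also allowed `ApiProvider` objects in place of provider strings; this sketch handles strings only.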

package/dist/assertions.d.ts CHANGED

@@ -1,15 +1,9 @@
-import type { EvaluateOptions, GradingConfig, TokenUsage } from './types.js';
-interface GradingResult {
-    pass: boolean;
-    reason: string;
-    tokensUsed: TokenUsage;
-}
-export declare function matchesExpectedValue(expected: string, output: string, options: EvaluateOptions): Promise<{
-    pass: boolean;
-    reason?: string;
-}>;
+import type { Assertion, GradingConfig, TestCase, GradingResult } from './types.js';
+export declare function runAssertions(test: TestCase, output: string): Promise<GradingResult>;
+export declare function runAssertion(assertion: Assertion, test: TestCase, output: string): Promise<GradingResult>;
 export declare function matchesSimilarity(expected: string, output: string, threshold: number): Promise<GradingResult>;
 export declare function matchesLlmRubric(expected: string, output: string, options?: GradingConfig): Promise<GradingResult>;
+export declare function assertionFromString(expected: string): Assertion;
 declare const _default: {
     matchesSimilarity: typeof matchesSimilarity;
     matchesLlmRubric: typeof matchesLlmRubric;
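`assertionFromString`, declared above, converts the older string-prefix expectations (`similar(...)`, `fn:`/`eval:`, `grade:`, `is-json`, `contains-json`, or a literal) into the new assertion objects. The sketch below is TypeScript re-typed from the compiled `dist/assertions.js` in this diff, not promptfoo's own source; the local `AssertionSketch` type is an assumption that lists only the `type` strings this function emits. Those strings (`equals`, `javascript`, `similar`) are the ones `runAssertion` checks in the compiled code, whereas the README's `Assertion` interface lists `equality`, `function`, and `similarity`. Two edge cases follow directly from the code: `similar(0):` falls back to the 0.8 default because `0` is falsy, and `grade:` keeps any space after the colon.

```typescript
// Local stand-in for promptfoo's Assertion type; only the `type` values
// produced by assertionFromString in the 0.8.0 dist code are listed.
interface AssertionSketch {
  type: 'equals' | 'is-json' | 'contains-json' | 'javascript' | 'similar' | 'llm-rubric';
  value?: string;
  threshold?: number;
}

const SIMILAR_REGEX = /similar(?::|\((\d+(\.\d+)?)\):)/;
const DEFAULT_SEMANTIC_SIMILARITY_THRESHOLD = 0.8;

function assertionFromString(expected: string): AssertionSketch {
  const match = expected.match(SIMILAR_REGEX);
  if (match) {
    // A missing threshold parses to NaN, and NaN || 0.8 gives the default.
    // An explicit 0 is also falsy, so it falls back to 0.8 as well.
    const threshold = parseFloat(match[1] ?? '') || DEFAULT_SEMANTIC_SIMILARITY_THRESHOLD;
    const rest = expected.replace(SIMILAR_REGEX, '').trim();
    return { type: 'similar', value: rest, threshold };
  }
  if (expected.startsWith('fn:') || expected.startsWith('eval:')) {
    // 'eval:' is the legacy spelling of 'fn:'.
    const sliceLength = expected.startsWith('fn:') ? 'fn:'.length : 'eval:'.length;
    return { type: 'javascript', value: expected.slice(sliceLength) };
  }
  if (expected.startsWith('grade:')) {
    // Equivalent to expected.slice(6); no trimming is applied.
    return { type: 'llm-rubric', value: expected.slice('grade:'.length) };
  }
  if (expected === 'is-json' || expected === 'contains-json') {
    return { type: expected };
  }
  // Anything else is an exact-match expectation.
  return { type: 'equals', value: expected };
}

console.log(assertionFromString('similar(0.9): Hello there')); // { type: 'similar', value: 'Hello there', threshold: 0.9 }
console.log(assertionFromString('similar(0): hi').threshold); // 0.8
console.log(JSON.stringify(assertionFromString('grade: polite').value)); // " polite"
```

Because `SIMILAR_REGEX` is not anchored, any expectation that merely contains `similar:` is also treated as a similarity check.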

package/dist/assertions.d.ts.map CHANGED

	@@ -1 +1 @@
1	- {"version":3,"file":"assertions.d.ts","sourceRoot":"","sources":["../src/assertions.ts"],"names":[],"mappings":"~~AAOA~~,OAAO,KAAK,EAAE,~~eAAe~~,EAAE,aAAa,EAAE,~~UAAU~~,EAAE,MAAM,YAAY,CAAC;~~AAE7E~~,~~UAAU~~,aAAa~~;IACrB~~,IAAI,EAAE,~~OAAO~~,~~CAAC;IACd~~,MAAM,EAAE,MAAM,~~CAAC;IACf~~,~~UAAU~~,~~EAAE~~,~~UAAU~~,CAAC;~~CACxB;AAMD~~,wBAAsB,~~oBAAoB~~,~~CACxC~~,~~QAAQ~~,EAAE,~~MAAM~~,~~EAChB~~,~~MAAM~~,EAAE,~~MAAM~~,EACd,~~OAAO~~,EAAE,~~eAAe~~,~~GACvB~~,OAAO,CAAC~~;IAAE~~,~~IAAI~~,~~EAAE,OAAO,~~CAAC~~;IAAC~~,~~MAAM,CAAC,EAAE,MAAM,CAAA~~;~~CAAE~~,~~CAAC,CAuB7C;AAED,~~wBAAsB,iBAAiB,CACrC,QAAQ,EAAE,MAAM,EAChB,MAAM,EAAE,MAAM,EACd,SAAS,EAAE,MAAM,GAChB,OAAO,CAAC,aAAa,CAAC,CA0CxB;AAED,wBAAsB,gBAAgB,CACpC,QAAQ,EAAE,MAAM,EAChB,MAAM,EAAE,MAAM,EACd,OAAO,CAAC,EAAE,aAAa,GACtB,OAAO,CAAC,aAAa,CAAC,CAgDxB;;;;;AAED,wBAGE"}
1	+ {"version":3,"file":"assertions.d.ts","sourceRoot":"","sources":["../src/assertions.ts"],"names":[],"mappings":"AAQA,OAAO,KAAK,EAAE,SAAS,EAAE,aAAa,EAAE,QAAQ,EAAE,aAAa,EAAE,MAAM,YAAY,CAAC;AAMpF,wBAAsB,aAAa,CAAC,IAAI,EAAE,QAAQ,EAAE,MAAM,EAAE,MAAM,GAAG,OAAO,CAAC,aAAa,CAAC,CAyB1F;AAED,wBAAsB,YAAY,CAChC,SAAS,EAAE,SAAS,EACpB,IAAI,EAAE,QAAQ,EACd,MAAM,EAAE,MAAM,GACb,OAAO,CAAC,aAAa,CAAC,CA2DxB;AAoBD,wBAAsB,iBAAiB,CACrC,QAAQ,EAAE,MAAM,EAChB,MAAM,EAAE,MAAM,EACd,SAAS,EAAE,MAAM,GAChB,OAAO,CAAC,aAAa,CAAC,CA0CxB;AAED,wBAAsB,gBAAgB,CACpC,QAAQ,EAAE,MAAM,EAChB,MAAM,EAAE,MAAM,EACd,OAAO,CAAC,EAAE,aAAa,GACtB,OAAO,CAAC,aAAa,CAAC,CAgDxB;AAED,wBAAgB,mBAAmB,CAAC,QAAQ,EAAE,MAAM,GAAG,SAAS,CAmC/D;;;;;AAED,wBAGE"}
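`runAssertions`, declared in the `assertions.d.ts` hunk above and implemented in `dist/assertions.js`, checks a test's assertions in order, stops at the first failure, and sums token usage from the assertions that pass. The sketch below reproduces that control flow with synchronous, hypothetical `Check` functions standing in for real assertions (the real function is async and may call an LLM or embedding API). It makes one consequence visible: on failure the function returns the failing assertion's own result, so tokens spent by earlier passing assertions are not included in the reported `tokensUsed`.

```typescript
interface TokenUsage {
  total: number;
  prompt: number;
  completion: number;
}

interface GradingResult {
  pass: boolean;
  reason: string;
  tokensUsed?: TokenUsage;
}

// Hypothetical stand-in for an assertion: maps an output to a result.
type Check = (output: string) => GradingResult;

// Same loop shape as runAssertions: the first failure wins, and each
// passing result adds its token usage to a running total.
function runChecks(checks: Check[] | undefined, output: string): GradingResult {
  const tokensUsed: TokenUsage = { total: 0, prompt: 0, completion: 0 };
  if (!checks) {
    return { pass: true, reason: 'No assertions', tokensUsed };
  }
  for (const check of checks) {
    const result = check(output);
    if (!result.pass) {
      // Early return: the running total is discarded here.
      return result;
    }
    if (result.tokensUsed) {
      tokensUsed.total += result.tokensUsed.total;
      tokensUsed.prompt += result.tokensUsed.prompt;
      tokensUsed.completion += result.tokensUsed.completion;
    }
  }
  return { pass: true, reason: 'All assertions passed', tokensUsed };
}

// Pretend an LLM-graded check passed and used 10 tokens.
const graded: Check = () => ({
  pass: true,
  reason: 'ok',
  tokensUsed: { total: 10, prompt: 7, completion: 3 },
});

// Same idea as the is-json assertion: pass iff the whole output parses.
const isJson: Check = (output) => {
  try {
    JSON.parse(output);
    return { pass: true, reason: 'Assertion passed' };
  } catch {
    return { pass: false, reason: 'Expected output to be valid JSON' };
  }
};

console.log(runChecks([graded, isJson], '{"a": 1}').tokensUsed); // { total: 10, prompt: 7, completion: 3 }
console.log(runChecks([graded, isJson], 'not json').tokensUsed); // undefined
```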

package/dist/assertions.js CHANGED

@@ -3,7 +3,8 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
     return (mod && mod.__esModule) ? mod : { "default": mod };
 };
 Object.defineProperty(exports, "__esModule", { value: true });
-exports.matchesLlmRubric = exports.matchesSimilarity = exports.matchesExpectedValue = void 0;
+exports.assertionFromString = exports.matchesLlmRubric = exports.matchesSimilarity = exports.runAssertion = exports.runAssertions = void 0;
+const tiny_invariant_1 = __importDefault(require("tiny-invariant"));
 const nunjucks_1 = __importDefault(require("nunjucks"));
 const openai_js_1 = require("./providers/openai.js");
 const util_js_1 = require("./util.js");
@@ -11,32 +12,100 @@ const providers_js_1 = require("./providers.js");
 const prompts_js_1 = require("./prompts.js");
 const SIMILAR_REGEX = /similar(?::|\((\d+(\.\d+)?)\):)/;
 const DEFAULT_SEMANTIC_SIMILARITY_THRESHOLD = 0.8;
-async function matchesExpectedValue(expected, output, options) {
-    const match = expected.match(SIMILAR_REGEX);
-    if (match) {
-        const threshold = parseFloat(match[1]) || DEFAULT_SEMANTIC_SIMILARITY_THRESHOLD;
-        const rest = expected.replace(SIMILAR_REGEX, '').trim();
-        return matchesSimilarity(rest, output, threshold);
+async function runAssertions(test, output) {
+    const tokensUsed = {
+        total: 0,
+        prompt: 0,
+        completion: 0,
+    };
+    if (!test.assert) {
+        return { pass: true, reason: 'No assertions', tokensUsed };
     }
-    else if (expected.startsWith('fn:') || expected.startsWith('eval:')) {
-        // TODO(1.0): delete eval: legacy option
-        const sliceLength = expected.startsWith('fn:') ? 'fn:'.length : 'eval:'.length;
-        const functionBody = expected.slice(sliceLength);
-        const customFunction = new Function('output', `return ${functionBody}`);
-        return { pass: customFunction(output) };
+    for (const assertion of test.assert) {
+        const result = await runAssertion(assertion, test, output);
+        if (!result.pass) {
+            return result;
+        }
+        if (result.tokensUsed) {
+            tokensUsed.total += result.tokensUsed.total;
+            tokensUsed.prompt += result.tokensUsed.prompt;
+            tokensUsed.completion += result.tokensUsed.completion;
+        }
+    }
+    return { pass: true, reason: 'All assertions passed', tokensUsed };
+}
+exports.runAssertions = runAssertions;
+async function runAssertion(assertion, test, output) {
+    let pass = false;
+    if (assertion.type === 'equals') {
+        pass = assertion.value === output;
+        return {
+            pass,
+            reason: pass ? 'Assertion passed' : `Expected output "${assertion.value}"`,
+        };
+    }
+    if (assertion.type === 'is-json') {
+        try {
+            JSON.parse(output);
+            return { pass: true, reason: 'Assertion passed' };
+        }
+        catch (err) {
+            return {
+                pass: false,
+                reason: `Expected output to be valid JSON, but it isn't.\nError: ${err}`,
+            };
+        }
     }
-    else if (expected.startsWith('grade:')) {
-        return matchesLlmRubric(expected.slice(6), output, options.grading);
+    if (assertion.type === 'contains-json') {
+        const pass = containsJSON(output);
+        return {
+            pass,
+            reason: pass ? 'Assertion passed' : 'Expected output to contain valid JSON',
+        };
     }
-    else {
-        const pass = expected === output;
+    if (assertion.type === 'javascript') {
+        try {
+            const customFunction = new Function('output', `return ${assertion.value}`);
+            pass = customFunction(output);
+        }
+        catch (err) {
+            return {
+                pass: false,
+                reason: `Custom function threw error: ${err.message}`,
+            };
+        }
         return {
             pass,
-            reason: pass ? undefined : `Expected: ${expected}, Output: ${output}`,
+            reason: pass ? 'Assertion passed' : `Custom function returned false`,
         };
     }
+    if (assertion.type === 'similar') {
+        (0, tiny_invariant_1.default)(assertion.value, 'Similarity assertion must have a string value');
+        (0, tiny_invariant_1.default)(assertion.threshold, 'Similarity assertion must have a threshold');
+        return matchesSimilarity(assertion.value, output, assertion.threshold);
+    }
+    if (assertion.type === 'llm-rubric') {
+        (0, tiny_invariant_1.default)(assertion.value, 'Similarity assertion must have a string value');
+        return matchesLlmRubric(assertion.value, output, test.options);
+    }
+    throw new Error('Unknown assertion type: ' + assertion.type);
+}
+exports.runAssertion = runAssertion;
+function containsJSON(str) {
+    // Regular expression to check for JSON-like pattern
+    const jsonPattern = /({[\s\S]*}|\[[\s\S]*])/;
+    const match = str.match(jsonPattern);
+    if (!match) {
+        return false;
+    }
+    try {
+        JSON.parse(match[0]);
+        return true;
+    }
+    catch (error) {
+        return false;
+    }
 }
-exports.matchesExpectedValue = matchesExpectedValue;
 async function matchesSimilarity(expected, output, threshold) {
     const expectedEmbedding = await openai_js_1.DefaultEmbeddingProvider.callEmbeddingApi(expected);
     const outputEmbedding = await openai_js_1.DefaultEmbeddingProvider.callEmbeddingApi(output);
@@ -79,7 +148,7 @@ async function matchesLlmRubric(expected, output, options) {
     if (!options) {
         throw new Error('Cannot grade output without grading config. Specify --grader option or grading config.');
     }
-    const prompt = nunjucks_1.default.renderString(options.prompt || prompts_js_1.DEFAULT_GRADING_PROMPT, {
+    const prompt = nunjucks_1.default.renderString(options.rubricPrompt || prompts_js_1.DEFAULT_GRADING_PROMPT, {
         content: output,
         rubric: expected,
     });
@@ -121,6 +190,43 @@ async function matchesLlmRubric(expected, output, options) {
     }
 }
 exports.matchesLlmRubric = matchesLlmRubric;
+function assertionFromString(expected) {
+    const match = expected.match(SIMILAR_REGEX);
+    if (match) {
+        const threshold = parseFloat(match[1]) || DEFAULT_SEMANTIC_SIMILARITY_THRESHOLD;
+        const rest = expected.replace(SIMILAR_REGEX, '').trim();
+        return {
+            type: 'similar',
+            value: rest,
+            threshold,
+        };
+    }
+    if (expected.startsWith('fn:') || expected.startsWith('eval:')) {
+        // TODO(1.0): delete eval: legacy option
+        const sliceLength = expected.startsWith('fn:') ? 'fn:'.length : 'eval:'.length;
+        const functionBody = expected.slice(sliceLength);
+        return {
+            type: 'javascript',
+            value: functionBody,
+        };
+    }
+    if (expected.startsWith('grade:')) {
+        return {
+            type: 'llm-rubric',
+            value: expected.slice(6),
+        };
+    }
+    if (expected === 'is-json' || expected === 'contains-json') {
+        return {
+            type: expected,
+        };
+    }
+    return {
+        type: 'equals',
+        value: expected,
+    };
+}
+exports.assertionFromString = assertionFromString;
 exports.default = {
     matchesSimilarity,
     matchesLlmRubric,
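`containsJSON`, used by the new `contains-json` assertion, extracts a single candidate with the pattern `/({[\s\S]*}|\[[\s\S]*])/` and parses only that span. Because `[\s\S]*` is greedy, the candidate runs from the first `{` or `[` to the last closer of the same kind, so an output with two separate JSON objects, or a bracketed word before the JSON, fails the check. The first function below transcribes that logic (same pattern, with braces escaped); `containsJsonAnywhere` is a hypothetical alternative, not promptfoo's code, that tries each opening brace or bracket against each later closer and accepts the first span that parses.

```typescript
// Transcription of containsJSON from dist/assertions.js (0.8.0).
function containsJSON(str: string): boolean {
  // Greedy: from the first '{' or '[' to the last '}' or ']' respectively.
  const jsonPattern = /(\{[\s\S]*\}|\[[\s\S]*\])/;
  const match = str.match(jsonPattern);
  if (!match) {
    return false;
  }
  try {
    JSON.parse(match[0]);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical alternative: true if any substring that starts with '{' or '['
// and ends with '}' or ']' parses as JSON. Worst case is cubic in the
// output length, which is acceptable for short LLM outputs.
function containsJsonAnywhere(str: string): boolean {
  for (let i = 0; i < str.length; i++) {
    if (str[i] !== '{' && str[i] !== '[') continue;
    for (let j = str.length - 1; j > i; j--) {
      if (str[j] !== '}' && str[j] !== ']') continue;
      try {
        JSON.parse(str.slice(i, j + 1));
        return true;
      } catch {
        // Not valid JSON; try a shorter span.
      }
    }
  }
  return false;
}

const twoObjects = 'Here: {"a": 1} and {"b": 2}';
const taggedObject = '[note] {"a": 1}';
console.log(containsJSON(twoObjects), containsJsonAnywhere(twoObjects)); // false true
console.log(containsJSON(taggedObject), containsJsonAnywhere(taggedObject)); // false true
console.log(containsJSON('Answer: [1, 2, 3]')); // true
```

Returning the first span that parses keeps the alternative's result a simple boolean, matching the signature of the original.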

package/dist/assertions.js.map CHANGED

	@@ -1 +1 @@
1	- {"version":3,"file":"assertions.js","sourceRoot":"","sources":["../src/assertions.ts"],"names":[],"mappings":";;;;;;AAAA,wDAAgC;AAEhC,qDAAyF;AACzF,uCAA6C;AAC7C,iDAAiD;AACjD,6CAAsD;~~AAUtD~~,MAAM,aAAa,GAAG,iCAAiC,CAAC;AAExD,MAAM,qCAAqC,GAAG,GAAG,CAAC;AAE3C,KAAK,UAAU,~~oBAAoB~~,~~CACxC~~,~~QAAgB~~,~~EAChB~~,MAAc~~,EACd,OAAwB~~;~~IAExB~~,MAAM,~~KAAK~~,GAAG,~~QAAQ~~,CAAC,~~KAAK~~,CAAC,~~aAAa~~,CAAC,CAAC;~~IAE5C~~,IAAI,~~KAAK~~,EAAE;~~QACT~~,MAAM,~~SAAS~~,~~GAAG~~,UAAU,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,IAAI,~~qCAAqC~~,CAAC;~~QAChF~~,MAAM,IAAI,~~GAAG~~,~~QAAQ~~,CAAC,~~OAAO~~,CAAC,~~aAAa~~,~~EAAE~~,~~EAAE~~,CAAC,CAAC,IAAI,~~EAAE~~,CAAC;~~QACxD~~,~~OAAO~~,~~iBAAiB~~,CAAC,IAAI,EAAE,MAAM,EAAE,~~SAAS~~,CAAC,CAAC;~~KACnD~~;~~SAAM~~,IAAI,~~QAAQ~~,~~CAAC~~,~~UAAU~~,CAAC,~~KAAK~~,CAAC,IAAI,QAAQ,~~CAAC~~,~~UAAU~~,CAAC,~~OAAO~~,CAAC,~~EAAE~~;~~QACrE~~,~~wCAAwC~~;~~QACxC~~,MAAM,~~WAAW~~,~~GAAG~~,~~QAAQ~~,CAAC,~~UAAU~~,CAAC,~~KAAK~~,CAAC,CAAC,CAAC,CAAC,KAAK,CAAC,~~MAAM~~,CAAC,CAAC,CAAC,OAAO,~~CAAC~~,MAAM,CAAC;~~QAC/E~~,MAAM,~~YAAY~~,GAAG,~~QAAQ~~,CAAC,KAAK,CAAC,~~WAAW~~,CAAC,CAAC;~~QAEjD~~,MAAM,cAAc,GAAG,IAAI,QAAQ,CAAC,QAAQ,EAAE,UAAU,~~YAAY~~,EAAE,CAAC,CAAC;~~QACxE~~,OAAO,EAAE,IAAI,EAAE,~~cAAc~~,CAAC,~~MAAM~~,CAAC,EAAE,CAAC;~~KACzC~~;~~SAAM~~,IAAI,~~QAAQ~~,CAAC,~~UAAU~~,CAAC,~~QAAQ~~,CAAC,EAAE;~~QACxC~~,OAAO,~~gBAAgB~~,CAAC,~~QAAQ~~,CAAC,KAAK,CAAC,CAAC,CAAC,EAAE,~~MAAM~~,EAAE,OAAO,CAAC,OAAO,CAAC,CAAC;~~KACrE~~;~~SAAM;QACL~~,MAAM,IAAI,GAAG,~~QAAQ~~,~~KAAK~~,~~MAAM~~,CAAC;~~QACjC~~,~~OAAO~~;~~YACL~~,~~IAAI~~;~~YACJ~~,MAAM,EAAE,IAAI,CAAC,CAAC,CAAC,~~SAAS~~,CAAC,CAAC,CAAC,~~aAAa~~,~~QAAQ~~,~~aAAa~~,~~MAAM~~,EAAE;~~SACtE~~,CAAC;~~KACH~~;AACH,CAAC;~~AA3BD,oDA2BC;~~AAEM,KAAK,UAAU,iBAAiB,CACrC,QAAgB,EAChB,MAAc,EACd,SAAiB;IAEjB,MAAM,iBAAiB,GAAG,MAAM,oCAAwB,CAAC,gBAAgB,CAAC,QAAQ,CAAC,CAAC;IACpF,MAAM,eAAe,GAAG,MAAM,oCAAwB,CAAC,gBAAgB,CAAC,MAAM,CAAC,CAAC;IAEhF,MAAM,UAAU,GAAG;QACjB,KAAK,EAAE,CAAC,iBAAiB,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC,CAAC,GAAG,CAAC,eAAe,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC,CAAC;QAC5F,MAAM
,EAAE,CAAC,iBAAiB,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC,CAAC,GAAG,CAAC,eAAe,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC,CAAC;QAC/F,UAAU,EACR,CAAC,iBAAiB,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC,CAAC;YAC/C,CAAC,eAAe,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC,CAAC;KAChD,CAAC;IAEF,IAAI,iBAAiB,CAAC,KAAK,IAAI,eAAe,CAAC,KAAK,EAAE;QACpD,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EACJ,iBAAiB,CAAC,KAAK,IAAI,eAAe,CAAC,KAAK,IAAI,mCAAmC;YACzF,UAAU;SACX,CAAC;KACH;IAED,IAAI,CAAC,iBAAiB,CAAC,SAAS,IAAI,CAAC,eAAe,CAAC,SAAS,EAAE;QAC9D,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,qBAAqB;YAC7B,UAAU;SACX,CAAC;KACH;IAED,MAAM,UAAU,GAAG,IAAA,0BAAgB,EAAC,iBAAiB,CAAC,SAAS,EAAE,eAAe,CAAC,SAAS,CAAC,CAAC;IAC5F,IAAI,UAAU,GAAG,SAAS,EAAE;QAC1B,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,cAAc,UAAU,2BAA2B,SAAS,EAAE;YACtE,UAAU;SACX,CAAC;KACH;IACD,OAAO;QACL,IAAI,EAAE,IAAI;QACV,MAAM,EAAE,cAAc,UAAU,8BAA8B,SAAS,EAAE;QACzE,UAAU;KACX,CAAC;AACJ,CAAC;AA9CD,8CA8CC;AAEM,KAAK,UAAU,gBAAgB,CACpC,QAAgB,EAChB,MAAc,EACd,OAAuB;IAEvB,IAAI,CAAC,OAAO,EAAE;QACZ,MAAM,IAAI,KAAK,CACb,wFAAwF,CACzF,CAAC;KACH;IAED,MAAM,MAAM,GAAG,kBAAQ,CAAC,YAAY,CAAC,OAAO,CAAC,~~MAAM~~,IAAI,mCAAsB,EAAE;~~QAC7E~~,OAAO,EAAE,MAAM;QACf,MAAM,EAAE,QAAQ;KACjB,CAAC,CAAC;IAEH,IAAI,QAAQ,GAAG,OAAO,CAAC,QAAQ,IAAI,kCAAsB,CAAC;IAC1D,IAAI,OAAO,QAAQ,KAAK,QAAQ,EAAE;QAChC,QAAQ,GAAG,MAAM,IAAA,8BAAe,EAAC,QAAQ,CAAC,CAAC;KAC5C;IACD,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC;IAC5C,IAAI,IAAI,CAAC,KAAK,IAAI,CAAC,IAAI,CAAC,MAAM,EAAE;QAC9B,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,IAAI,CAAC,KAAK,IAAI,WAAW;YACjC,UAAU,EAAE;gBACV,KAAK,EAAE,IAAI,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC;gBAClC,MAAM,EAAE,IAAI,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC;gBACpC,UAAU,EAAE,IAAI,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC;aAC7C;SACF,CAAC;KACH;IAED,IAAI;QACF,MAAM,MAAM,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,MAAM,CAAkB,CAAC;QACxD,MAAM,CAAC,UAAU,GAAG;YAClB,KAAK,EAAE,IAAI,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC;YAClC,MAAM,EAAE,IAAI,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC;YACpC,UAAU,EAAE,IAAI,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC;SAC7C,CAAC;QACF,OAAO,MAAM,CAAC;KACf;IAAC,OAAO,GAAG,EAAE;QACZ,OAAO;YACL,IAAI,EAAE
,KAAK;YACX,MAAM,EAAE,6BAA6B,IAAI,CAAC,MAAM,EAAE;YAClD,UAAU,EAAE;gBACV,KAAK,EAAE,IAAI,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC;gBAClC,MAAM,EAAE,IAAI,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC;gBACpC,UAAU,EAAE,IAAI,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC;aAC7C;SACF,CAAC;KACH;AACH,CAAC;AApDD,4CAoDC;AAED,kBAAe;IACb,iBAAiB;IACjB,gBAAgB;CACjB,CAAC"}
1	+ {"version":3,"file":"assertions.js","sourceRoot":"","sources":["../src/assertions.ts"],"names":[],"mappings":";;;;;;AAAA,oEAAuC;AACvC,wDAAgC;AAEhC,qDAAyF;AACzF,uCAA6C;AAC7C,iDAAiD;AACjD,6CAAsD;AAItD,MAAM,aAAa,GAAG,iCAAiC,CAAC;AAExD,MAAM,qCAAqC,GAAG,GAAG,CAAC;AAE3C,KAAK,UAAU,aAAa,CAAC,IAAc,EAAE,MAAc;IAChE,MAAM,UAAU,GAAG;QACjB,KAAK,EAAE,CAAC;QACR,MAAM,EAAE,CAAC;QACT,UAAU,EAAE,CAAC;KACd,CAAC;IAEF,IAAI,CAAC,IAAI,CAAC,MAAM,EAAE;QAChB,OAAO,EAAE,IAAI,EAAE,IAAI,EAAE,MAAM,EAAE,eAAe,EAAE,UAAU,EAAE,CAAC;KAC5D;IAED,KAAK,MAAM,SAAS,IAAI,IAAI,CAAC,MAAM,EAAE;QACnC,MAAM,MAAM,GAAG,MAAM,YAAY,CAAC,SAAS,EAAE,IAAI,EAAE,MAAM,CAAC,CAAC;QAC3D,IAAI,CAAC,MAAM,CAAC,IAAI,EAAE;YAChB,OAAO,MAAM,CAAC;SACf;QAED,IAAI,MAAM,CAAC,UAAU,EAAE;YACrB,UAAU,CAAC,KAAK,IAAI,MAAM,CAAC,UAAU,CAAC,KAAK,CAAC;YAC5C,UAAU,CAAC,MAAM,IAAI,MAAM,CAAC,UAAU,CAAC,MAAM,CAAC;YAC9C,UAAU,CAAC,UAAU,IAAI,MAAM,CAAC,UAAU,CAAC,UAAU,CAAC;SACvD;KACF;IAED,OAAO,EAAE,IAAI,EAAE,IAAI,EAAE,MAAM,EAAE,uBAAuB,EAAE,UAAU,EAAE,CAAC;AACrE,CAAC;AAzBD,sCAyBC;AAEM,KAAK,UAAU,YAAY,CAChC,SAAoB,EACpB,IAAc,EACd,MAAc;IAEd,IAAI,IAAI,GAAY,KAAK,CAAC;IAE1B,IAAI,SAAS,CAAC,IAAI,KAAK,QAAQ,EAAE;QAC/B,IAAI,GAAG,SAAS,CAAC,KAAK,KAAK,MAAM,CAAC;QAClC,OAAO;YACL,IAAI;YACJ,MAAM,EAAE,IAAI,CAAC,CAAC,CAAC,kBAAkB,CAAC,CAAC,CAAC,oBAAoB,SAAS,CAAC,KAAK,GAAG;SAC3E,CAAC;KACH;IAED,IAAI,SAAS,CAAC,IAAI,KAAK,SAAS,EAAE;QAChC,IAAI;YACF,IAAI,CAAC,KAAK,CAAC,MAAM,CAAC,CAAC;YACnB,OAAO,EAAE,IAAI,EAAE,IAAI,EAAE,MAAM,EAAE,kBAAkB,EAAE,CAAC;SACnD;QAAC,OAAO,GAAG,EAAE;YACZ,OAAO;gBACL,IAAI,EAAE,KAAK;gBACX,MAAM,EAAE,2DAA2D,GAAG,EAAE;aACzE,CAAC;SACH;KACF;IAED,IAAI,SAAS,CAAC,IAAI,KAAK,eAAe,EAAE;QACtC,MAAM,IAAI,GAAG,YAAY,CAAC,MAAM,CAAC,CAAC;QAClC,OAAO;YACL,IAAI;YACJ,MAAM,EAAE,IAAI,CAAC,CAAC,CAAC,kBAAkB,CAAC,CAAC,CAAC,uCAAuC;SAC5E,CAAC;KACH;IAED,IAAI,SAAS,CAAC,IAAI,KAAK,YAAY,EAAE;QACnC,IAAI;YACF,MAAM,cAAc,GAAG,IAAI,QAAQ,CAAC,QAAQ,EAAE,UAAU,SAAS,CAAC,KAAK,EAAE,CAAC,CAAC;YAC3E,IAAI,GAAG,cAAc,CAAC,MAAM,CAAC,CAAC;SAC/B;QAAC,OAAO,GAAG,EAAE;YACZ,OAAO;gBACL,IAAI,EAAE,KAAK;gBACX,MAAM,EAAE,gCAAiC,GAAa,CAAC,OAAO
,EAAE;aACjE,CAAC;SACH;QACD,OAAO;YACL,IAAI;YACJ,MAAM,EAAE,IAAI,CAAC,CAAC,CAAC,kBAAkB,CAAC,CAAC,CAAC,gCAAgC;SACrE,CAAC;KACH;IAED,IAAI,SAAS,CAAC,IAAI,KAAK,SAAS,EAAE;QAChC,IAAA,wBAAS,EAAC,SAAS,CAAC,KAAK,EAAE,+CAA+C,CAAC,CAAC;QAC5E,IAAA,wBAAS,EAAC,SAAS,CAAC,SAAS,EAAE,4CAA4C,CAAC,CAAC;QAC7E,OAAO,iBAAiB,CAAC,SAAS,CAAC,KAAK,EAAE,MAAM,EAAE,SAAS,CAAC,SAAS,CAAC,CAAC;KACxE;IAED,IAAI,SAAS,CAAC,IAAI,KAAK,YAAY,EAAE;QACnC,IAAA,wBAAS,EAAC,SAAS,CAAC,KAAK,EAAE,+CAA+C,CAAC,CAAC;QAC5E,OAAO,gBAAgB,CAAC,SAAS,CAAC,KAAK,EAAE,MAAM,EAAE,IAAI,CAAC,OAAO,CAAC,CAAC;KAChE;IAED,MAAM,IAAI,KAAK,CAAC,0BAA0B,GAAG,SAAS,CAAC,IAAI,CAAC,CAAC;AAC/D,CAAC;AA/DD,oCA+DC;AAED,SAAS,YAAY,CAAC,GAAW;IAC/B,oDAAoD;IACpD,MAAM,WAAW,GAAG,wBAAwB,CAAC;IAE7C,MAAM,KAAK,GAAG,GAAG,CAAC,KAAK,CAAC,WAAW,CAAC,CAAC;IAErC,IAAI,CAAC,KAAK,EAAE;QACV,OAAO,KAAK,CAAC;KACd;IAED,IAAI;QACF,IAAI,CAAC,KAAK,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC;QACrB,OAAO,IAAI,CAAC;KACb;IAAC,OAAO,KAAK,EAAE;QACd,OAAO,KAAK,CAAC;KACd;AACH,CAAC;AAEM,KAAK,UAAU,iBAAiB,CACrC,QAAgB,EAChB,MAAc,EACd,SAAiB;IAEjB,MAAM,iBAAiB,GAAG,MAAM,oCAAwB,CAAC,gBAAgB,CAAC,QAAQ,CAAC,CAAC;IACpF,MAAM,eAAe,GAAG,MAAM,oCAAwB,CAAC,gBAAgB,CAAC,MAAM,CAAC,CAAC;IAEhF,MAAM,UAAU,GAAG;QACjB,KAAK,EAAE,CAAC,iBAAiB,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC,CAAC,GAAG,CAAC,eAAe,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC,CAAC;QAC5F,MAAM,EAAE,CAAC,iBAAiB,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC,CAAC,GAAG,CAAC,eAAe,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC,CAAC;QAC/F,UAAU,EACR,CAAC,iBAAiB,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC,CAAC;YAC/C,CAAC,eAAe,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC,CAAC;KAChD,CAAC;IAEF,IAAI,iBAAiB,CAAC,KAAK,IAAI,eAAe,CAAC,KAAK,EAAE;QACpD,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EACJ,iBAAiB,CAAC,KAAK,IAAI,eAAe,CAAC,KAAK,IAAI,mCAAmC;YACzF,UAAU;SACX,CAAC;KACH;IAED,IAAI,CAAC,iBAAiB,CAAC,SAAS,IAAI,CAAC,eAAe,CAAC,SAAS,EAAE;QAC9D,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,qBAAqB;YAC7B,UAAU;SACX,CAAC;KACH;IAED,MAAM,UAAU,GAAG,IAAA,0BAAgB,EAAC,iBAAiB,CAAC,SAAS,EAAE,eAAe,CAAC,SAAS,CAAC,CAAC;IAC5F,IAAI,UAAU,GAAG,SAAS,EAAE;QAC1B,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,cAAc,UAA
U,2BAA2B,SAAS,EAAE;YACtE,UAAU;SACX,CAAC;KACH;IACD,OAAO;QACL,IAAI,EAAE,IAAI;QACV,MAAM,EAAE,cAAc,UAAU,8BAA8B,SAAS,EAAE;QACzE,UAAU;KACX,CAAC;AACJ,CAAC;AA9CD,8CA8CC;AAEM,KAAK,UAAU,gBAAgB,CACpC,QAAgB,EAChB,MAAc,EACd,OAAuB;IAEvB,IAAI,CAAC,OAAO,EAAE;QACZ,MAAM,IAAI,KAAK,CACb,wFAAwF,CACzF,CAAC;KACH;IAED,MAAM,MAAM,GAAG,kBAAQ,CAAC,YAAY,CAAC,OAAO,CAAC,YAAY,IAAI,mCAAsB,EAAE;QACnF,OAAO,EAAE,MAAM;QACf,MAAM,EAAE,QAAQ;KACjB,CAAC,CAAC;IAEH,IAAI,QAAQ,GAAG,OAAO,CAAC,QAAQ,IAAI,kCAAsB,CAAC;IAC1D,IAAI,OAAO,QAAQ,KAAK,QAAQ,EAAE;QAChC,QAAQ,GAAG,MAAM,IAAA,8BAAe,EAAC,QAAQ,CAAC,CAAC;KAC5C;IACD,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC;IAC5C,IAAI,IAAI,CAAC,KAAK,IAAI,CAAC,IAAI,CAAC,MAAM,EAAE;QAC9B,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,IAAI,CAAC,KAAK,IAAI,WAAW;YACjC,UAAU,EAAE;gBACV,KAAK,EAAE,IAAI,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC;gBAClC,MAAM,EAAE,IAAI,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC;gBACpC,UAAU,EAAE,IAAI,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC;aAC7C;SACF,CAAC;KACH;IAED,IAAI;QACF,MAAM,MAAM,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,MAAM,CAAkB,CAAC;QACxD,MAAM,CAAC,UAAU,GAAG;YAClB,KAAK,EAAE,IAAI,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC;YAClC,MAAM,EAAE,IAAI,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC;YACpC,UAAU,EAAE,IAAI,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC;SAC7C,CAAC;QACF,OAAO,MAAM,CAAC;KACf;IAAC,OAAO,GAAG,EAAE;QACZ,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,6BAA6B,IAAI,CAAC,MAAM,EAAE;YAClD,UAAU,EAAE;gBACV,KAAK,EAAE,IAAI,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC;gBAClC,MAAM,EAAE,IAAI,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC;gBACpC,UAAU,EAAE,IAAI,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC;aAC7C;SACF,CAAC;KACH;AACH,CAAC;AApDD,4CAoDC;AAED,SAAgB,mBAAmB,CAAC,QAAgB;IAClD,MAAM,KAAK,GAAG,QAAQ,CAAC,KAAK,CAAC,aAAa,CAAC,CAAC;IAC5C,IAAI,KAAK,EAAE;QACT,MAAM,SAAS,GAAG,UAAU,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,IAAI,qCAAqC,CAAC;QAChF,MAAM,IAAI,GAAG,QAAQ,CAAC,OAAO,CAAC,aAAa,EAAE,EAAE,CAAC,CAAC,IAAI,EAAE,CAAC;QACxD,OAAO;YACL,IAAI,EAAE,SAAS;YACf,KAAK,EAAE,IAAI;YACX,SAAS;SACV,CAAC;KACH;IACD,IAAI,QAAQ,CAAC,UAAU,CAAC,KAAK,CAAC,IAAI,QAAQ,CAAC,UAAU,CAAC,OAAO,CAAC,EAAE;QAC9D,wCAAwC;QACxC,MAAM,WAAW,GAAG,QAAQ,CA
AC,UAAU,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,KAAK,CAAC,MAAM,CAAC,CAAC,CAAC,OAAO,CAAC,MAAM,CAAC;QAC/E,MAAM,YAAY,GAAG,QAAQ,CAAC,KAAK,CAAC,WAAW,CAAC,CAAC;QACjD,OAAO;YACL,IAAI,EAAE,YAAY;YAClB,KAAK,EAAE,YAAY;SACpB,CAAC;KACH;IACD,IAAI,QAAQ,CAAC,UAAU,CAAC,QAAQ,CAAC,EAAE;QACjC,OAAO;YACL,IAAI,EAAE,YAAY;YAClB,KAAK,EAAE,QAAQ,CAAC,KAAK,CAAC,CAAC,CAAC;SACzB,CAAC;KACH;IACD,IAAI,QAAQ,KAAK,SAAS,IAAI,QAAQ,KAAK,eAAe,EAAE;QAC1D,OAAO;YACL,IAAI,EAAE,QAAQ;SACf,CAAC;KACH;IACD,OAAO;QACL,IAAI,EAAE,QAAQ;QACd,KAAK,EAAE,QAAQ;KAChB,CAAC;AACJ,CAAC;AAnCD,kDAmCC;AAED,kBAAe;IACb,iBAAiB;IACjB,gBAAgB;CACjB,CAAC"}

package/dist/cache.d.ts ADDED

@@ -0,0 +1,8 @@
+import type { RequestInfo, RequestInit } from 'node-fetch';
+export declare function fetchJsonWithCache(url: RequestInfo, options: RequestInit | undefined, timeout: number): Promise<{
+    data: any;
+    cached: boolean;
+}>;
+export declare function enableCache(): void;
+export declare function disableCache(): void;
+//# sourceMappingURL=cache.d.ts.map
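The declarations above describe a read-through JSON cache. On a hit it returns `{ data, cached: true }`; on a miss it fetches, stores the result and returns `{ data, cached: false }`; and `enableCache`/`disableCache` toggle the behavior globally. Below is a minimal sketch of that contract. It assumes an in-memory `Map` in place of `cache-manager` and a hypothetical injected `fetchJson` callback in place of the `timeout`-based network call. Both substitutions are illustrative and are not promptfoo's code. Like the `cache.js` implementation in this diff, it removes `headers` before building the `fetch:${url}:${JSON.stringify(options)}` key.

```typescript
// Sketch of the contract declared in cache.d.ts. Assumptions: an in-memory Map
// stands in for cache-manager, and `fetchJson` (hypothetical) stands in for the
// real fetch-with-timeout call so the example runs without a network.
type Options = { headers?: Record<string, string>; [key: string]: unknown };
type FetchJson = (url: string, options: Options) => Promise<unknown>;

const store = new Map<string, string>();
let enabled = true;

function enableCache(): void {
  enabled = true;
}

function disableCache(): void {
  enabled = false;
}

async function fetchJsonWithCache(
  url: string,
  options: Options,
  fetchJson: FetchJson,
): Promise<{ data: unknown; cached: boolean }> {
  // Bypass the store entirely when caching is off.
  if (!enabled) {
    return { data: await fetchJson(url, options), cached: false };
  }
  // Mirror cache.js: headers are not part of the cache key.
  const copy: Options = { ...options };
  delete copy.headers;
  const key = `fetch:${url}:${JSON.stringify(copy)}`;

  const hit = store.get(key);
  if (hit !== undefined) {
    return { data: JSON.parse(hit), cached: true };
  }
  // Miss: fetch, store the serialized JSON, and report an uncached result.
  const data = await fetchJson(url, options);
  store.set(key, JSON.stringify(data));
  return { data, cached: false };
}
```

Because headers are excluded from the key, two requests that differ only in their headers (for example, different `Authorization` values) share one cache entry.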

package/dist/cache.d.ts.map ADDED

@@ -0,0 +1 @@
+{"version":3,"file":"cache.d.ts","sourceRoot":"","sources":["../src/cache.ts"],"names":[],"mappings":"AASA,OAAO,KAAK,EAAE,WAAW,EAAE,WAAW,EAAE,MAAM,YAAY,CAAC;AA4B3D,wBAAsB,kBAAkB,CACtC,GAAG,EAAE,WAAW,EAChB,OAAO,yBAAkB,EACzB,OAAO,EAAE,MAAM,GACd,OAAO,CAAC;IAAE,IAAI,EAAE,GAAG,CAAC;IAAC,MAAM,EAAE,OAAO,CAAA;CAAE,CAAC,CAuCzC;AAED,wBAAgB,WAAW,SAE1B;AAED,wBAAgB,YAAY,SAG3B"}

package/dist/cache.js ADDED

@@ -0,0 +1,78 @@
+"use strict";
+var __importDefault = (this && this.__importDefault) || function (mod) {
+    return (mod && mod.__esModule) ? mod : { "default": mod };
+};
+Object.defineProperty(exports, "__esModule", { value: true });
+exports.disableCache = exports.enableCache = exports.fetchJsonWithCache = void 0;
+const node_path_1 = __importDefault(require("node:path"));
+const cache_manager_1 = __importDefault(require("cache-manager"));
+const cache_manager_fs_hash_1 = __importDefault(require("cache-manager-fs-hash"));
+const logger_js_1 = __importDefault(require("./logger.js"));
+const util_js_1 = require("./util.js");
+let cacheInstance;
+let enabled = typeof process.env.PROMPTFOO_CACHE_ENABLED === 'undefined'
+    ? true
+    : Boolean(process.env.PROMPTFOO_CACHE_ENABLED);
+const cacheType = process.env.PROMPTFOO_CACHE_TYPE || (process.env.NODE_ENV === 'test' ? 'memory' : 'disk');
+function getCache() {
+    if (!cacheInstance) {
+        cacheInstance = cache_manager_1.default.caching({
+            store: cacheType === 'disk' ? cache_manager_fs_hash_1.default : 'memory',
+            options: {
+                max: process.env.PROMPTFOO_CACHE_MAX_FILE_COUNT || 10000,
+                path: process.env.PROMPTFOO_CACHE_PATH || node_path_1.default.join((0, util_js_1.getConfigDirectoryPath)(), 'cache'),
+                ttl: process.env.PROMPTFOO_CACHE_TTL || 60 * 60 * 24 * 14,
+                maxsize: process.env.PROMPTFOO_CACHE_MAX_SIZE || 1e7, // in bytes, 10mb
+                //zip: true, // whether to use gzip compression
+            },
+        });
+    }
+    return cacheInstance;
+}
+async function fetchJsonWithCache(url, options = {}, timeout) {
+    if (!enabled) {
+        const resp = await (0, util_js_1.fetchWithTimeout)(url, options, timeout);
+        return {
+            cached: false,
+            data: await resp.json(),
+        };
+    }
+    const cache = await getCache();
+    const copy = Object.assign({}, options);
+    delete copy.headers;
+    const cacheKey = `fetch:${url}:${JSON.stringify(copy)}`;
+    // Try to get the cached response
+    const cachedResponse = await cache.get(cacheKey);
+    if (cachedResponse) {
+        logger_js_1.default.debug(`Returning cached response for ${url}: ${cachedResponse}`);
+        return {
+            cached: true,
+            data: JSON.parse(cachedResponse),
+        };
+    }
+    // Fetch the actual data and store it in the cache
+    const response = await (0, util_js_1.fetchWithTimeout)(url, options, timeout);
+    try {
+        const data = await response.json();
+        logger_js_1.default.debug(`Storing ${url} response in cache: ${data}`);
+        await cache.set(cacheKey, JSON.stringify(data));
+        return {
+            cached: false,
+            data,
+        };
+    }
+    catch (err) {
+        throw new Error(`Error parsing response from ${url}: ${err}`);
+    }
+}
+exports.fetchJsonWithCache = fetchJsonWithCache;
+function enableCache() {
+    enabled = true;
+}
+exports.enableCache = enableCache;
+function disableCache() {
+    logger_js_1.default.info('Cache is disabled.');
+    enabled = false;
+}
+exports.disableCache = disableCache;
+//# sourceMappingURL=cache.js.map
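Two details of the environment handling in `cache.js` above are easy to trip over. First, `Boolean(process.env.PROMPTFOO_CACHE_ENABLED)` is `true` for every non-empty string, so `PROMPTFOO_CACHE_ENABLED=false` still leaves caching enabled unless `disableCache()` is called. Second, `max`, `ttl` and `maxsize` receive raw strings whenever their variables are set. Below is a minimal sketch of stricter parsing. It assumes two hypothetical helpers, `parseBoolEnv` and `parseIntEnv`, which are not part of promptfoo.

```typescript
// Hypothetical helpers (not promptfoo APIs) showing stricter parsing for
// variables such as PROMPTFOO_CACHE_ENABLED and PROMPTFOO_CACHE_TTL.
function parseBoolEnv(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  const v = value.trim().toLowerCase();
  if (["1", "true", "yes", "on"].includes(v)) return true;
  if (["0", "false", "no", "off"].includes(v)) return false;
  // Unrecognized or empty values fall back rather than guessing.
  return fallback;
}

function parseIntEnv(value: string | undefined, fallback: number): number {
  if (value === undefined) return fallback;
  // parseInt returns NaN for non-numeric input such as "" or "abc".
  const n = Number.parseInt(value, 10);
  return Number.isNaN(n) ? fallback : n;
}
```

In the style of `cache.js`, a call would look like `parseBoolEnv(process.env.PROMPTFOO_CACHE_ENABLED, true)` or `parseIntEnv(process.env.PROMPTFOO_CACHE_TTL, 60 * 60 * 24 * 14)`.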

package/dist/cache.js.map ADDED

@@ -0,0 +1 @@

+ {"version":3,"file":"cache.js","sourceRoot":"","sources":["../src/cache.ts"],"names":[],"mappings":";;;;;;AAAA,0DAA6B;AAE7B,kEAAyC;AACzC,kFAA4C;AAE5C,4DAAiC;AACjC,uCAAqE;AAKrE,IAAI,aAAgC,CAAC;AAErC,IAAI,OAAO,GACT,OAAO,OAAO,CAAC,GAAG,CAAC,uBAAuB,KAAK,WAAW;IACxD,CAAC,CAAC,IAAI;IACN,CAAC,CAAC,OAAO,CAAC,OAAO,CAAC,GAAG,CAAC,uBAAuB,CAAC,CAAC;AAEnD,MAAM,SAAS,GACb,OAAO,CAAC,GAAG,CAAC,oBAAoB,IAAI,CAAC,OAAO,CAAC,GAAG,CAAC,QAAQ,KAAK,MAAM,CAAC,CAAC,CAAC,QAAQ,CAAC,CAAC,CAAC,MAAM,CAAC,CAAC;AAE5F,SAAS,QAAQ;IACf,IAAI,CAAC,aAAa,EAAE;QAClB,aAAa,GAAG,uBAAY,CAAC,OAAO,CAAC;YACnC,KAAK,EAAE,SAAS,KAAK,MAAM,CAAC,CAAC,CAAC,+BAAO,CAAC,CAAC,CAAC,QAAQ;YAChD,OAAO,EAAE;gBACP,GAAG,EAAE,OAAO,CAAC,GAAG,CAAC,8BAA8B,IAAI,KAAM;gBACzD,IAAI,EAAE,OAAO,CAAC,GAAG,CAAC,oBAAoB,IAAI,mBAAI,CAAC,IAAI,CAAC,IAAA,gCAAsB,GAAE,EAAE,OAAO,CAAC;gBACtF,GAAG,EAAE,OAAO,CAAC,GAAG,CAAC,mBAAmB,IAAI,EAAE,GAAG,EAAE,GAAG,EAAE,GAAG,EAAE;gBACzD,OAAO,EAAE,OAAO,CAAC,GAAG,CAAC,wBAAwB,IAAI,GAAG,EAAE,iBAAiB;gBACvE,+CAA+C;aAChD;SACF,CAAC,CAAC;KACJ;IACD,OAAO,aAAa,CAAC;AACvB,CAAC;AAEM,KAAK,UAAU,kBAAkB,CACtC,GAAgB,EAChB,UAAuB,EAAE,EACzB,OAAe;IAEf,IAAI,CAAC,OAAO,EAAE;QACZ,MAAM,IAAI,GAAG,MAAM,IAAA,0BAAgB,EAAC,GAAG,EAAE,OAAO,EAAE,OAAO,CAAC,CAAC;QAC3D,OAAO;YACL,MAAM,EAAE,KAAK;YACb,IAAI,EAAE,MAAM,IAAI,CAAC,IAAI,EAAE;SACxB,CAAC;KACH;IAED,MAAM,KAAK,GAAG,MAAM,QAAQ,EAAE,CAAC;IAE/B,MAAM,IAAI,GAAG,MAAM,CAAC,MAAM,CAAC,EAAE,EAAE,OAAO,CAAC,CAAC;IACxC,OAAO,IAAI,CAAC,OAAO,CAAC;IACpB,MAAM,QAAQ,GAAG,SAAS,GAAG,IAAI,IAAI,CAAC,SAAS,CAAC,IAAI,CAAC,EAAE,CAAC;IAExD,iCAAiC;IACjC,MAAM,cAAc,GAAG,MAAM,KAAK,CAAC,GAAG,CAAC,QAAQ,CAAC,CAAC;IAEjD,IAAI,cAAc,EAAE;QAClB,mBAAM,CAAC,KAAK,CAAC,iCAAiC,GAAG,KAAK,cAAc,EAAE,CAAC,CAAC;QACxE,OAAO;YACL,MAAM,EAAE,IAAI;YACZ,IAAI,EAAE,IAAI,CAAC,KAAK,CAAC,cAAwB,CAAC;SAC3C,CAAC;KACH;IAED,kDAAkD;IAClD,MAAM,QAAQ,GAAG,MAAM,IAAA,0BAAgB,EAAC,GAAG,EAAE,OAAO,EAAE,OAAO,CAAC,CAAC;IAC/D,IAAI;QACF,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;QACnC,mBAAM,CAAC,KAAK,CAAC,WAAW,GAAG,uBAAuB,IAAI,EAAE,CAAC,CAAC;QAC1D,MAAM,KAAK,CAAC,GAAG,CAAC,QAAQ,EAAE,IAA
I,CAAC,SAAS,CAAC,IAAI,CAAC,CAAC,CAAC;QAChD,OAAO;YACL,MAAM,EAAE,KAAK;YACb,IAAI;SACL,CAAC;KACH;IAAC,OAAO,GAAG,EAAE;QACZ,MAAM,IAAI,KAAK,CAAC,+BAA+B,GAAG,KAAK,GAAG,EAAE,CAAC,CAAC;KAC/D;AACH,CAAC;AA3CD,gDA2CC;AAED,SAAgB,WAAW;IACzB,OAAO,GAAG,IAAI,CAAC;AACjB,CAAC;AAFD,kCAEC;AAED,SAAgB,YAAY;IAC1B,mBAAM,CAAC,IAAI,CAAC,oBAAoB,CAAC,CAAC;IAClC,OAAO,GAAG,KAAK,CAAC;AAClB,CAAC;AAHD,oCAGC"}

package/dist/evaluator.d.ts CHANGED

@@ -1,3 +1,3 @@
-import type { EvaluateOptions, EvaluateSummary } from './types.js';
-export declare function evaluate(options: EvaluateOptions): Promise<EvaluateSummary>;
+import type { EvaluateOptions, EvaluateSummary, TestSuite } from './types.js';
+export declare function evaluate(testSuite: TestSuite, options: EvaluateOptions): Promise<EvaluateSummary>;
 //# sourceMappingURL=evaluator.d.ts.map