promptfoo 0.6.0 → 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. package/README.md +137 -74
  2. package/dist/assertions.d.ts +4 -10
  3. package/dist/assertions.d.ts.map +1 -1
  4. package/dist/assertions.js +126 -20
  5. package/dist/assertions.js.map +1 -1
  6. package/dist/cache.d.ts +8 -0
  7. package/dist/cache.d.ts.map +1 -0
  8. package/dist/cache.js +78 -0
  9. package/dist/cache.js.map +1 -0
  10. package/dist/evaluator.d.ts +2 -2
  11. package/dist/evaluator.d.ts.map +1 -1
  12. package/dist/evaluator.js +73 -40
  13. package/dist/evaluator.js.map +1 -1
  14. package/dist/index.d.ts +6 -4
  15. package/dist/index.d.ts.map +1 -1
  16. package/dist/index.js +8 -21
  17. package/dist/index.js.map +1 -1
  18. package/dist/main.js +92 -80
  19. package/dist/main.js.map +1 -1
  20. package/dist/onboarding.d.ts +4 -0
  21. package/dist/onboarding.d.ts.map +1 -0
  22. package/dist/onboarding.js +63 -0
  23. package/dist/onboarding.js.map +1 -0
  24. package/dist/providers/localai.d.ts.map +1 -1
  25. package/dist/providers/localai.js +7 -9
  26. package/dist/providers/localai.js.map +1 -1
  27. package/dist/providers/openai.d.ts.map +1 -1
  28. package/dist/providers/openai.js +31 -38
  29. package/dist/providers/openai.js.map +1 -1
  30. package/dist/providers.d.ts +1 -0
  31. package/dist/providers.d.ts.map +1 -1
  32. package/dist/providers.js +11 -1
  33. package/dist/providers.js.map +1 -1
  34. package/dist/types.d.ts +46 -13
  35. package/dist/types.d.ts.map +1 -1
  36. package/dist/util.d.ts +6 -3
  37. package/dist/util.d.ts.map +1 -1
  38. package/dist/util.js +73 -2
  39. package/dist/util.js.map +1 -1
  40. package/dist/web/server.d.ts.map +1 -1
  41. package/dist/web/server.js +0 -11
  42. package/dist/web/server.js.map +1 -1
  43. package/package.json +6 -2
  44. package/src/assertions.ts +141 -28
  45. package/src/cache.ts +90 -0
  46. package/src/evaluator.ts +89 -43
  47. package/src/index.ts +14 -26
  48. package/src/main.ts +117 -99
  49. package/src/onboarding.ts +61 -0
  50. package/src/providers/localai.ts +9 -11
  51. package/src/providers/openai.ts +34 -42
  52. package/src/providers.ts +9 -0
  53. package/src/types.ts +95 -16
  54. package/src/util.ts +90 -4
  55. package/src/web/server.ts +0 -18
package/README.md CHANGED
@@ -9,31 +9,44 @@ With promptfoo, you can:
 
  - **Test multiple prompts** against predefined test cases
  - **Evaluate quality and catch regressions** by comparing LLM outputs side-by-side
- - **Speed up evaluations** by running tests concurrently
+ - **Speed up evaluations** with caching and concurrent tests
  - **Flag bad outputs automatically** by setting "expectations"
  - Use as a command line tool, or integrate into your workflow as a library
  - Use OpenAI models, open-source models like Llama and Vicuna, or integrate custom API providers for any LLM API
 
+ The goal: **test-driven prompt engineering**, rather than trial-and-error.
+
  # [» View full documentation «](https://promptfoo.dev/docs/intro)
 
- promptfoo produces matrix views that allow you to quickly review prompt outputs across many inputs. The goal: tune prompts systematically across all relevant test cases, instead of testing prompts by trial and error.
+ promptfoo produces matrix views that let you quickly evaluate outputs across many prompts.
 
  Here's an example of a side-by-side comparison of multiple prompts and inputs:
 
  ![Prompt evaluation matrix - web viewer](https://github.com/typpo/promptfoo/assets/310310/ddcd77df-2783-425e-ade9-1a20dd0b6cd2)
 
  It works on the command line too:
+
  ![Prompt evaluation](https://user-images.githubusercontent.com/310310/235529431-f4d5c395-d569-448e-9697-cd637e0372a5.gif)
 
- ## Usage (command line & web viewer)
+ ## Workflow
+
+ Start by establishing a handful of test cases - core use cases and failure cases that you want to ensure your prompt can handle.
+
+ As you explore modifications to the prompt, use `promptfoo eval` to rate all outputs. This ensures the prompt is actually improving overall.
+
+ As you collect more examples and establish a user feedback loop, continue to build the pool of test cases.
 
- To get started, run the following command:
+ <img width="772" alt="LLM ops" src="https://github.com/typpo/promptfoo/assets/310310/cf0461a7-2832-4362-9fbb-4ebd911d06ff">
+
+ ## Usage
+
+ To get started, run this command:
 
  ```
  npx promptfoo init
  ```
 
- This will create some templates in your current directory: `prompts.txt`, `vars.csv`, and `promptfooconfig.js`.
+ This will create some placeholders in your current directory: `prompts.txt` and `promptfooconfig.yaml`.
 
  After editing the prompts and variables to your liking, run the eval command to kick off an evaluation:
 
@@ -41,20 +54,75 @@ After editing the prompts and variables to your liking, run the eval command to
  npx promptfoo eval
  ```
 
- If you're looking to customize your usage, you have a wide set of parameters at your disposal. See the [Configuration docs](https://www.promptfoo.dev/docs/configuration/parameters) for more detail:
-
- | Option | Description |
- | ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
- | `-p, --prompts <paths...>` | Paths to prompt files, directory, or glob |
- | `-r, --providers <name or path...>` | One of: openai:chat, openai:completion, openai:model-name, localai:chat:model-name, localai:completion:model-name. See [API providers](https://www.promptfoo.dev/docs/configuration/providers) |
- | `-o, --output <path>` | Path to output file (csv, json, yaml, html) |
- | `-v, --vars <path>` | Path to file with prompt variables (csv, json, yaml) |
- | `-c, --config <path>` | Path to configuration file. `promptfooconfig.js[on]` is automatically loaded if present |
- | `-j, --max-concurrency <number>` | Maximum number of concurrent API calls |
- | `--table-cell-max-length <number>` | Truncate console table cells to this length |
- | `--prompt-prefix <path>` | This prefix is prepended to every prompt |
- | `--prompt-suffix <path>` | This suffix is append to every prompt |
- | `--grader` | Provider that will grade outputs, if you are using [LLM grading](https://www.promptfoo.dev/docs/configuration/expected-outputs) |
+ ### Configuration
+
+ The YAML configuration format runs each prompt through a series of example inputs (aka "test case") and checks if they meet requirements (aka "assert").
+
+ See the [Configuration docs](https://www.promptfoo.dev/docs/configuration/guide) for a detailed guide.
+
+ ```yaml
+ prompts: [prompts.txt]
+ providers: [openai:gpt-3.5-turbo]
+ tests:
+   - description: First test case - automatic review
+     vars:
+       var1: first variable's value
+       var2: another value
+       var3: some other value
+     assert:
+       - type: equality
+         value: expected LLM output goes here
+       - type: function
+         value: output.includes('some text')
+
+   - description: Second test case - manual review
+     # Test cases don't need assertions if you prefer to review the output yourself
+     vars:
+       var1: new value
+       var2: another value
+       var3: third value
+
+   - description: Third test case - other types of automatic review
+     vars:
+       var1: yet another value
+       var2: and another
+       var3: dear llm, please output your response in json format
+     assert:
+       - type: contains-json
+       - type: similarity
+         value: ensures that output is semantically similar to this text
+       - type: llm-rubric
+         value: ensure that output contains a reference to X
+ ```
+
+ ### Tests on spreadsheet
+
+ Some people prefer to configure their LLM tests in a CSV. In that case, the config is pretty simple:
+
+ ```yaml
+ prompts: [prompts.txt]
+ providers: [openai:gpt-3.5-turbo]
+ tests: tests.csv
+ ```
+
+ See [example CSV](https://github.com/typpo/promptfoo/blob/main/examples/simple-test/tests.csv).
+
+ ### Command-line
+
+ If you're looking to customize your usage, you have a wide set of parameters at your disposal.
+
+ | Option | Description |
+ | ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | `-p, --prompts <paths...>` | Paths to [prompt files](https://promptfoo.dev/docs/configuration/parameters#prompt-files), directory, or glob |
+ | `-r, --providers <name or path...>` | One of: openai:chat, openai:completion, openai:model-name, localai:chat:model-name, localai:completion:model-name. See [API providers](https://promptfoo.dev/docs/configuration/providers) |
+ | `-o, --output <path>` | Path to [output file](https://promptfoo.dev/docs/configuration/parameters#output-file) (csv, json, yaml, html) |
+ | `--tests <path>` | Path to [external test file](https://promptfoo.dev/docs/configurationexpected-outputsassertions#load-an-external-tests-file) |
+ | `-c, --config <path>` | Path to [configuration file](https://promptfoo.dev/docs/configuration/guide). `promptfooconfig.js/json/yaml` is automatically loaded if present |
+ | `-j, --max-concurrency <number>` | Maximum number of concurrent API calls |
+ | `--table-cell-max-length <number>` | Truncate console table cells to this length |
+ | `--prompt-prefix <path>` | This prefix is prepended to every prompt |
+ | `--prompt-suffix <path>` | This suffix is append to every prompt |
+ | `--grader` | [Provider](https://promptfoo.dev/docs/configuration/providers) that will conduct the evaluation, if you are [using LLM to grade your output](https://promptfoo.dev/docs/configuration/expected-outputs#llm-evaluation) |
 
  After running an eval, you may optionally use the `view` command to open the web viewer:
 
@@ -66,10 +134,10 @@ npx promptfoo view
 
  #### Prompt quality
 
- In this example, we evaluate whether adding adjectives to the personality of an assistant bot affects the responses:
+ In [this example](https://github.com/typpo/promptfoo/tree/main/examples/assistant-cli), we evaluate whether adding adjectives to the personality of an assistant bot affects the responses:
 
  ```bash
- npx promptfoo eval -p prompts.txt -v vars.csv -r openai:gpt-3.5-turbo
+ npx promptfoo eval -p prompts.txt -r openai:gpt-3.5-turbo -t tests.csv
  ```
 
  <!--
@@ -80,15 +148,13 @@ npx promptfoo eval -p prompts.txt -v vars.csv -r openai:gpt-3.5-turbo
 
  This command will evaluate the prompts in `prompts.txt`, substituing the variable values from `vars.csv`, and output results in your terminal.
 
- Have a look at the setup and full output [here](https://github.com/typpo/promptfoo/tree/main/examples/assistant-cli).
-
  You can also output a nice [spreadsheet](https://docs.google.com/spreadsheets/d/1nanoj3_TniWrDl1Sj-qYqIMD6jwm5FBy15xPFdUTsmI/edit?usp=sharing), [JSON](https://github.com/typpo/promptfoo/blob/main/examples/simple-cli/output.json), YAML, or an HTML file:
 
  ![Table output](https://user-images.githubusercontent.com/310310/235483444-4ddb832d-e103-4b9c-a862-b0d6cc11cdc0.png)
 
  #### Model quality
 
- In this example, we evaluate the difference between GPT 3 and GPT 4 outputs for a given prompt:
+ In the [next example](https://github.com/typpo/promptfoo/tree/main/examples/gpt-3.5-vs-4), we evaluate the difference between GPT 3 and GPT 4 outputs for a given prompt:
 
  ```bash
  npx promptfoo eval -p prompts.txt -r openai:gpt-3.5-turbo openai:gpt-4 -o output.html
@@ -98,19 +164,46 @@ Produces this HTML table:
 
  ![Side-by-side evaluation of LLM model quality, gpt3 vs gpt4, html output](https://user-images.githubusercontent.com/310310/235490527-e0c31f40-00a0-493a-8afc-8ed6322bb5ca.png)
 
- Full setup and output [here](https://github.com/typpo/promptfoo/tree/main/examples/gpt-3.5-vs-4).
-
  ## Usage (node package)
 
  You can also use `promptfoo` as a library in your project by importing the `evaluate` function. The function takes the following parameters:
 
- - `providers`: a list of provider strings or `ApiProvider` objects, or just a single string or `ApiProvider`.
- - `options`: the prompts and variables you want to test:
+ - `testSuite`: the Javascript equivalent of the promptfooconfig.yaml
 
  ```typescript
- {
-   prompts: string[];
+ interface TestSuiteConfig {
+   providers: string[]; // Valid provider name (e.g. openai:gpt-3.5-turbo)
+   prompts: string[]; // List of prompts
+   tests: string | TestCase[]; // Path to a CSV file, or list of test cases
+
+   defaultTest?: Omit<TestCase, 'description'>; // Optional: add default vars and assertions on test case
+   outputPath?: string; // Optional: write results to file
+ }
+
+ interface TestCase {
+   description?: string;
    vars?: Record<string, string>;
+   assert?: Assertion[];
+
+   prompt?: PromptConfig;
+   grading?: GradingConfig;
+ }
+
+ interface Assertion {
+   type: 'equality' | 'is-json' | 'contains-json' | 'function' | 'similarity' | 'llm-rubric';
+   value?: string;
+   threshold?: number; // For similarity assertions
+   provider?: ApiProvider; // For assertions that require an LLM provider
+ }
+ ```
+
+ - `options`: misc options related to how the tests are run
+
+ ```typescript
+ interface EvaluateOptions {
+   maxConcurrency?: number;
+   showProgressBar?: boolean;
+   generateSuggestions?: boolean;
  }
  ```
 
@@ -121,61 +214,31 @@ You can also use `promptfoo` as a library in your project by importing the `eval
  ```js
  import promptfoo from 'promptfoo';
 
- const options = {
+ const results = await promptfoo.evaluate({
    prompts: ['Rephrase this in French: {{body}}', 'Rephrase this like a pirate: {{body}}'],
-   vars: [{ body: 'Hello world' }, { body: "I'm hungry" }],
- };
-
- (async () => {
-   const summary = await promptfoo.evaluate('openai:gpt-3.5-turbo', options);
-   console.log(summary);
- })();
- ```
-
- This code imports the `promptfoo` library, defines the evaluation options, and then calls the `evaluate` function with these options. The results are logged to the console:
-
- ```js
- {
-   "results": [
+   providers: ['openai:gpt-3.5-turbo'],
+   tests: [
      {
-       "prompt": {
-         "raw": "Rephrase this in French: Hello world",
-         "display": "Rephrase this in French: {{body}}"
+       vars: {
+         body: 'Hello world',
        },
-       "vars": {
-         "body": "Hello world"
+     },
+     {
+       vars: {
+         body: "I'm hungry",
        },
-       "response": {
-         "output": "Bonjour le monde",
-         "tokenUsage": {
-           "total": 19,
-           "prompt": 16,
-           "completion": 3
-         }
-       }
      },
-     // ...
    ],
-   "stats": {
-     "successes": 4,
-     "failures": 0,
-     "tokenUsage": {
-       "total": 120,
-       "prompt": 72,
-       "completion": 48
-     }
-   },
-   "table": [
-     // ...
-   ]
- }
+ });
  ```
 
- [See full example here](https://github.com/typpo/promptfoo/tree/main/examples/simple-import)
+ This code imports the `promptfoo` library, defines the evaluation options, and then calls the `evaluate` function with these options.
+
+ See the full example [here](https://github.com/typpo/promptfoo/tree/main/examples/simple-import), which includes an example results object.
 
  ## Configuration
 
- - **[Setting up an eval](https://promptfoo.dev/docs/configuration/parameters)**: Learn more about how to set up prompt files, vars file, output, etc.
+ - **[Main guide](https://promptfoo.dev/docs/configuration/guide)**: Learn about how to configure your YAML file, setup prompt files, etc.
  - **[Configuring test cases](https://promptfoo.dev/docs/configuration/expected-outputs)**: Learn more about how to configure expected outputs and test assertions.
 
  ## Installation
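The README's configuration sections describe one idea in several forms: each prompt is run against each test case, and the results fill the matrix view. A minimal TypeScript sketch of that expansion follows, under stated assumptions: the `TestCase` and `Assertion` shapes are trimmed from the README, `{{name}}` substitution is a simplified stand-in for a template engine such as nunjucks (which the compiled `assertions.js` in this diff uses for grading prompts), and the `defaultTest` merge rule (test vars override default vars, default assertions come first) is assumed rather than documented here. `expandMatrix`, `renderTemplate`, and `applyDefaults` are hypothetical helpers, not promptfoo exports.

```typescript
// Trimmed, hypothetical versions of the README's TestCase / Assertion shapes.
interface Assertion {
  type: string;
  value?: string;
  threshold?: number;
}

interface TestCase {
  description?: string;
  vars?: Record<string, string>;
  assert?: Assertion[];
}

interface MatrixCell {
  prompt: string; // prompt template as written
  rendered: string; // prompt after variable substitution
  test: TestCase; // test case after defaults were merged in
}

// Simplified stand-in for nunjucks: replaces {{ name }} with vars[name] and
// leaves placeholders without a matching own property untouched.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (placeholder: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : undefined;
    return value ?? placeholder;
  });
}

// Assumed merge rule: test vars win over default vars; default assertions run first.
function applyDefaults(test: TestCase, defaults: Omit<TestCase, 'description'> = {}): TestCase {
  return {
    ...test,
    vars: { ...defaults.vars, ...test.vars },
    assert: [...(defaults.assert ?? []), ...(test.assert ?? [])],
  };
}

// One cell per (test case, prompt) pair: rows are test cases, columns are prompts.
function expandMatrix(
  prompts: string[],
  tests: TestCase[],
  defaults?: Omit<TestCase, 'description'>,
): MatrixCell[] {
  const cells: MatrixCell[] = [];
  for (const raw of tests) {
    const test = applyDefaults(raw, defaults);
    for (const prompt of prompts) {
      cells.push({ prompt, rendered: renderTemplate(prompt, test.vars ?? {}), test });
    }
  }
  return cells;
}

const cells = expandMatrix(
  ['Rephrase this in French: {{body}}', 'Rephrase this like a pirate: {{body}}'],
  [{ vars: { body: 'Hello world' } }, { vars: { body: "I'm hungry" } }],
  { assert: [{ type: 'llm-rubric', value: 'is a faithful rephrasing' }] },
);
console.log(cells.length); // 4
console.log(cells[3]?.rendered); // Rephrase this like a pirate: I'm hungry
```

With the two prompts and two `body` values from the README's library example, this yields four cells, one per prompt and test pair.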
package/dist/assertions.d.ts CHANGED
@@ -1,15 +1,9 @@
- import type { EvaluateOptions, GradingConfig, TokenUsage } from './types.js';
- interface GradingResult {
-     pass: boolean;
-     reason: string;
-     tokensUsed: TokenUsage;
- }
- export declare function matchesExpectedValue(expected: string, output: string, options: EvaluateOptions): Promise<{
-     pass: boolean;
-     reason?: string;
- }>;
+ import type { Assertion, GradingConfig, TestCase, GradingResult } from './types.js';
+ export declare function runAssertions(test: TestCase, output: string): Promise<GradingResult>;
+ export declare function runAssertion(assertion: Assertion, test: TestCase, output: string): Promise<GradingResult>;
  export declare function matchesSimilarity(expected: string, output: string, threshold: number): Promise<GradingResult>;
  export declare function matchesLlmRubric(expected: string, output: string, options?: GradingConfig): Promise<GradingResult>;
+ export declare function assertionFromString(expected: string): Assertion;
  declare const _default: {
      matchesSimilarity: typeof matchesSimilarity;
      matchesLlmRubric: typeof matchesLlmRubric;
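The declarations above replace `matchesExpectedValue` with `runAssertion` (one check) and `runAssertions` (every check on a test case). In the compiled `runAssertions` later in this diff, assertions run one after another, the first failure is returned as-is, and `tokensUsed` is summed only over passing assertions. The sketch below reproduces that control flow with an injected checker so it runs without an LLM provider; `runAll` and `Checker` are hypothetical names, and `TokenUsage`/`GradingResult` are reconstructed from the fields the compiled code reads, since `types.d.ts` is not shown here.

```typescript
interface TokenUsage {
  total: number;
  prompt: number;
  completion: number;
}

interface GradingResult {
  pass: boolean;
  reason: string;
  tokensUsed?: TokenUsage;
}

// Hypothetical single-assertion checker; the real runAssertion also receives the test case.
type Checker<A> = (assertion: A, output: string) => Promise<GradingResult>;

async function runAll<A>(
  assertions: A[] | undefined,
  output: string,
  check: Checker<A>,
): Promise<GradingResult> {
  const tokensUsed: TokenUsage = { total: 0, prompt: 0, completion: 0 };
  if (!assertions) {
    return { pass: true, reason: 'No assertions', tokensUsed };
  }
  for (const assertion of assertions) {
    // Sequential: a failure returns early, so later (possibly paid) LLM checks never run.
    const result = await check(assertion, output);
    if (!result.pass) {
      // Like the compiled code, this returns only the failing check's own token usage.
      return result;
    }
    if (result.tokensUsed) {
      tokensUsed.total += result.tokensUsed.total;
      tokensUsed.prompt += result.tokensUsed.prompt;
      tokensUsed.completion += result.tokensUsed.completion;
    }
  }
  return { pass: true, reason: 'All assertions passed', tokensUsed };
}

// Usage with a fake checker that "spends" 10 tokens per assertion.
const fakeCheck: Checker<string> = async (assertion, output) => ({
  pass: output.includes(assertion),
  reason: `looked for "${assertion}"`,
  tokensUsed: { total: 10, prompt: 7, completion: 3 },
});

runAll(['Bonjour', 'monde'], 'Bonjour le monde', fakeCheck).then((r) => console.log(r.tokensUsed?.total)); // 20
```

Running sequentially rather than with `Promise.all` lets a cheap check that fails early skip later LLM-graded ones; the trade-off is that a failing result carries only that assertion's token usage, not the running total.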
package/dist/assertions.d.ts.map CHANGED
@@ -1 +1 @@
- {"version":3,"file":"assertions.d.ts","sourceRoot":"","sources":["../src/assertions.ts"],"names":[],"mappings":"AAOA,OAAO,KAAK,EAAE,eAAe,EAAE,aAAa,EAAE,UAAU,EAAE,MAAM,YAAY,CAAC;AAE7E,UAAU,aAAa;IACrB,IAAI,EAAE,OAAO,CAAC;IACd,MAAM,EAAE,MAAM,CAAC;IACf,UAAU,EAAE,UAAU,CAAC;CACxB;AAMD,wBAAsB,oBAAoB,CACxC,QAAQ,EAAE,MAAM,EAChB,MAAM,EAAE,MAAM,EACd,OAAO,EAAE,eAAe,GACvB,OAAO,CAAC;IAAE,IAAI,EAAE,OAAO,CAAC;IAAC,MAAM,CAAC,EAAE,MAAM,CAAA;CAAE,CAAC,CAuB7C;AAED,wBAAsB,iBAAiB,CACrC,QAAQ,EAAE,MAAM,EAChB,MAAM,EAAE,MAAM,EACd,SAAS,EAAE,MAAM,GAChB,OAAO,CAAC,aAAa,CAAC,CA0CxB;AAED,wBAAsB,gBAAgB,CACpC,QAAQ,EAAE,MAAM,EAChB,MAAM,EAAE,MAAM,EACd,OAAO,CAAC,EAAE,aAAa,GACtB,OAAO,CAAC,aAAa,CAAC,CAgDxB;;;;;AAED,wBAGE"}
+ {"version":3,"file":"assertions.d.ts","sourceRoot":"","sources":["../src/assertions.ts"],"names":[],"mappings":"AAQA,OAAO,KAAK,EAAE,SAAS,EAAE,aAAa,EAAE,QAAQ,EAAE,aAAa,EAAE,MAAM,YAAY,CAAC;AAMpF,wBAAsB,aAAa,CAAC,IAAI,EAAE,QAAQ,EAAE,MAAM,EAAE,MAAM,GAAG,OAAO,CAAC,aAAa,CAAC,CAyB1F;AAED,wBAAsB,YAAY,CAChC,SAAS,EAAE,SAAS,EACpB,IAAI,EAAE,QAAQ,EACd,MAAM,EAAE,MAAM,GACb,OAAO,CAAC,aAAa,CAAC,CA2DxB;AAoBD,wBAAsB,iBAAiB,CACrC,QAAQ,EAAE,MAAM,EAChB,MAAM,EAAE,MAAM,EACd,SAAS,EAAE,MAAM,GAChB,OAAO,CAAC,aAAa,CAAC,CA0CxB;AAED,wBAAsB,gBAAgB,CACpC,QAAQ,EAAE,MAAM,EAChB,MAAM,EAAE,MAAM,EACd,OAAO,CAAC,EAAE,aAAa,GACtB,OAAO,CAAC,aAAa,CAAC,CAgDxB;AAED,wBAAgB,mBAAmB,CAAC,QAAQ,EAAE,MAAM,GAAG,SAAS,CAmC/D;;;;;AAED,wBAGE"}
package/dist/assertions.js CHANGED
@@ -3,7 +3,8 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
      return (mod && mod.__esModule) ? mod : { "default": mod };
  };
  Object.defineProperty(exports, "__esModule", { value: true });
- exports.matchesLlmRubric = exports.matchesSimilarity = exports.matchesExpectedValue = void 0;
+ exports.assertionFromString = exports.matchesLlmRubric = exports.matchesSimilarity = exports.runAssertion = exports.runAssertions = void 0;
+ const tiny_invariant_1 = __importDefault(require("tiny-invariant"));
  const nunjucks_1 = __importDefault(require("nunjucks"));
  const openai_js_1 = require("./providers/openai.js");
  const util_js_1 = require("./util.js");
@@ -11,32 +12,100 @@ const providers_js_1 = require("./providers.js");
  const prompts_js_1 = require("./prompts.js");
  const SIMILAR_REGEX = /similar(?::|\((\d+(\.\d+)?)\):)/;
  const DEFAULT_SEMANTIC_SIMILARITY_THRESHOLD = 0.8;
- async function matchesExpectedValue(expected, output, options) {
-     const match = expected.match(SIMILAR_REGEX);
-     if (match) {
-         const threshold = parseFloat(match[1]) || DEFAULT_SEMANTIC_SIMILARITY_THRESHOLD;
-         const rest = expected.replace(SIMILAR_REGEX, '').trim();
-         return matchesSimilarity(rest, output, threshold);
+ async function runAssertions(test, output) {
+     const tokensUsed = {
+         total: 0,
+         prompt: 0,
+         completion: 0,
+     };
+     if (!test.assert) {
+         return { pass: true, reason: 'No assertions', tokensUsed };
      }
-     else if (expected.startsWith('fn:') || expected.startsWith('eval:')) {
-         // TODO(1.0): delete eval: legacy option
-         const sliceLength = expected.startsWith('fn:') ? 'fn:'.length : 'eval:'.length;
-         const functionBody = expected.slice(sliceLength);
-         const customFunction = new Function('output', `return ${functionBody}`);
-         return { pass: customFunction(output) };
+     for (const assertion of test.assert) {
+         const result = await runAssertion(assertion, test, output);
+         if (!result.pass) {
+             return result;
+         }
+         if (result.tokensUsed) {
+             tokensUsed.total += result.tokensUsed.total;
+             tokensUsed.prompt += result.tokensUsed.prompt;
+             tokensUsed.completion += result.tokensUsed.completion;
+         }
+     }
+     return { pass: true, reason: 'All assertions passed', tokensUsed };
+ }
+ exports.runAssertions = runAssertions;
+ async function runAssertion(assertion, test, output) {
+     let pass = false;
+     if (assertion.type === 'equals') {
+         pass = assertion.value === output;
+         return {
+             pass,
+             reason: pass ? 'Assertion passed' : `Expected output "${assertion.value}"`,
+         };
+     }
+     if (assertion.type === 'is-json') {
+         try {
+             JSON.parse(output);
+             return { pass: true, reason: 'Assertion passed' };
+         }
+         catch (err) {
+             return {
+                 pass: false,
+                 reason: `Expected output to be valid JSON, but it isn't.\nError: ${err}`,
+             };
+         }
      }
-     else if (expected.startsWith('grade:')) {
-         return matchesLlmRubric(expected.slice(6), output, options.grading);
+     if (assertion.type === 'contains-json') {
+         const pass = containsJSON(output);
+         return {
+             pass,
+             reason: pass ? 'Assertion passed' : 'Expected output to contain valid JSON',
+         };
      }
-     else {
-         const pass = expected === output;
+     if (assertion.type === 'javascript') {
+         try {
+             const customFunction = new Function('output', `return ${assertion.value}`);
+             pass = customFunction(output);
+         }
+         catch (err) {
+             return {
+                 pass: false,
+                 reason: `Custom function threw error: ${err.message}`,
+             };
+         }
          return {
              pass,
-             reason: pass ? undefined : `Expected: ${expected}, Output: ${output}`,
+             reason: pass ? 'Assertion passed' : `Custom function returned false`,
          };
      }
+     if (assertion.type === 'similar') {
+         (0, tiny_invariant_1.default)(assertion.value, 'Similarity assertion must have a string value');
+         (0, tiny_invariant_1.default)(assertion.threshold, 'Similarity assertion must have a threshold');
+         return matchesSimilarity(assertion.value, output, assertion.threshold);
+     }
+     if (assertion.type === 'llm-rubric') {
+         (0, tiny_invariant_1.default)(assertion.value, 'Similarity assertion must have a string value');
+         return matchesLlmRubric(assertion.value, output, test.options);
+     }
+     throw new Error('Unknown assertion type: ' + assertion.type);
+ }
+ exports.runAssertion = runAssertion;
+ function containsJSON(str) {
+     // Regular expression to check for JSON-like pattern
+     const jsonPattern = /({[\s\S]*}|\[[\s\S]*])/;
+     const match = str.match(jsonPattern);
+     if (!match) {
+         return false;
+     }
+     try {
+         JSON.parse(match[0]);
+         return true;
+     }
+     catch (error) {
+         return false;
+     }
  }
- exports.matchesExpectedValue = matchesExpectedValue;
  async function matchesSimilarity(expected, output, threshold) {
      const expectedEmbedding = await openai_js_1.DefaultEmbeddingProvider.callEmbeddingApi(expected);
      const outputEmbedding = await openai_js_1.DefaultEmbeddingProvider.callEmbeddingApi(output);
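The compiled `runAssertion` dispatches on `assertion.type`. Its synchronous branches are sketched below in TypeScript, using the type strings the compiled code actually compares against (`equals`, `is-json`, `contains-json`, `javascript`); the `similar` and `llm-rubric` branches are omitted because they call embedding or LLM APIs. Note that the README's `Assertion` interface in this diff lists `equality`, `function`, and `similarity` instead. `checkSync` and `errorMessage` are hypothetical names. `containsJSON` keeps the compiled greedy pattern, so an output holding two separate JSON objects is captured as one span from the first `{` to the last `}`, which then fails to parse.

```typescript
interface Assertion {
  type: string;
  value?: string;
}

interface GradingResult {
  pass: boolean;
  reason: string;
}

// Same greedy pattern as the compiled containsJSON: first "{" to last "}" (or "[" to "]").
function containsJSON(str: string): boolean {
  const match = str.match(/({[\s\S]*}|\[[\s\S]*])/);
  if (!match) {
    return false;
  }
  try {
    JSON.parse(match[0]);
    return true;
  } catch {
    return false;
  }
}

function errorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Hypothetical synchronous subset of runAssertion (no embedding or LLM calls).
function checkSync(assertion: Assertion, output: string): GradingResult {
  switch (assertion.type) {
    case 'equals': {
      const pass = assertion.value === output;
      return { pass, reason: pass ? 'Assertion passed' : `Expected output "${assertion.value}"` };
    }
    case 'is-json': {
      try {
        JSON.parse(output);
        return { pass: true, reason: 'Assertion passed' };
      } catch (err) {
        return { pass: false, reason: `Expected output to be valid JSON. Error: ${errorMessage(err)}` };
      }
    }
    case 'contains-json': {
      const pass = containsJSON(output);
      return { pass, reason: pass ? 'Assertion passed' : 'Expected output to contain valid JSON' };
    }
    case 'javascript': {
      try {
        // Like the compiled code, the assertion value is evaluated as an expression over `output`.
        const fn = new Function('output', `return ${assertion.value}`);
        const pass = Boolean(fn(output));
        return { pass, reason: pass ? 'Assertion passed' : 'Custom function returned false' };
      } catch (err) {
        return { pass: false, reason: `Custom function threw error: ${errorMessage(err)}` };
      }
    }
    default:
      throw new Error(`Unknown assertion type: ${assertion.type}`);
  }
}

console.log(checkSync({ type: 'contains-json' }, 'Result: {"a": 1}').pass); // true
console.log(checkSync({ type: 'contains-json' }, 'a {"x": 1} b {"y": 2}').pass); // false
console.log(checkSync({ type: 'javascript', value: "output.includes('monde')" }, 'Bonjour le monde').pass); // true
```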
@@ -79,7 +148,7 @@ async function matchesLlmRubric(expected, output, options) {
      if (!options) {
          throw new Error('Cannot grade output without grading config. Specify --grader option or grading config.');
      }
-     const prompt = nunjucks_1.default.renderString(options.prompt || prompts_js_1.DEFAULT_GRADING_PROMPT, {
+     const prompt = nunjucks_1.default.renderString(options.rubricPrompt || prompts_js_1.DEFAULT_GRADING_PROMPT, {
          content: output,
          rubric: expected,
      });
@@ -121,6 +190,43 @@ async function matchesLlmRubric(expected, output, options) {
      }
  }
  exports.matchesLlmRubric = matchesLlmRubric;
+ function assertionFromString(expected) {
+     const match = expected.match(SIMILAR_REGEX);
+     if (match) {
+         const threshold = parseFloat(match[1]) || DEFAULT_SEMANTIC_SIMILARITY_THRESHOLD;
+         const rest = expected.replace(SIMILAR_REGEX, '').trim();
+         return {
+             type: 'similar',
+             value: rest,
+             threshold,
+         };
+     }
+     if (expected.startsWith('fn:') || expected.startsWith('eval:')) {
+         // TODO(1.0): delete eval: legacy option
+         const sliceLength = expected.startsWith('fn:') ? 'fn:'.length : 'eval:'.length;
+         const functionBody = expected.slice(sliceLength);
+         return {
+             type: 'javascript',
+             value: functionBody,
+         };
+     }
+     if (expected.startsWith('grade:')) {
+         return {
+             type: 'llm-rubric',
+             value: expected.slice(6),
+         };
+     }
+     if (expected === 'is-json' || expected === 'contains-json') {
+         return {
+             type: expected,
+         };
+     }
+     return {
+         type: 'equals',
+         value: expected,
+     };
+ }
+ exports.assertionFromString = assertionFromString;
  exports.default = {
      matchesSimilarity,
      matchesLlmRubric,
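`assertionFromString` keeps 0.6-style string expectations working by translating each prefix into a structured `Assertion`: `similar:` or `similar(0.9):` becomes `similar` with a threshold (default 0.8), `fn:` and the legacy `eval:` become `javascript`, `grade:` becomes `llm-rubric`, the literals `is-json` and `contains-json` map to themselves, and anything else becomes an exact `equals` check. The typed port below mirrors the compiled function above; the `AssertionType` union is an assumption assembled from the strings that function and `runAssertion` use, not the package's own type.

```typescript
// Assumed union, assembled from the type strings used by the compiled code in this diff.
type AssertionType = 'equals' | 'is-json' | 'contains-json' | 'javascript' | 'similar' | 'llm-rubric';

interface Assertion {
  type: AssertionType;
  value?: string;
  threshold?: number;
}

const SIMILAR_REGEX = /similar(?::|\((\d+(\.\d+)?)\):)/;
const DEFAULT_SEMANTIC_SIMILARITY_THRESHOLD = 0.8;

function assertionFromString(expected: string): Assertion {
  const match = expected.match(SIMILAR_REGEX);
  if (match) {
    // match[1] is undefined for plain "similar:"; parseFloat then yields NaN and the default applies.
    const threshold = parseFloat(match[1] ?? '') || DEFAULT_SEMANTIC_SIMILARITY_THRESHOLD;
    return { type: 'similar', value: expected.replace(SIMILAR_REGEX, '').trim(), threshold };
  }
  if (expected.startsWith('fn:') || expected.startsWith('eval:')) {
    // "eval:" is the legacy spelling of "fn:".
    const prefixLength = expected.startsWith('fn:') ? 'fn:'.length : 'eval:'.length;
    return { type: 'javascript', value: expected.slice(prefixLength) };
  }
  if (expected.startsWith('grade:')) {
    return { type: 'llm-rubric', value: expected.slice('grade:'.length) };
  }
  if (expected === 'is-json' || expected === 'contains-json') {
    return { type: expected };
  }
  return { type: 'equals', value: expected };
}

for (const s of ['similar(0.9): Hello', 'similar: Hello', 'fn:output.length < 100', 'grade: is polite', 'is-json', 'Bonjour le monde']) {
  console.log(JSON.stringify(assertionFromString(s)));
}
```

Two details carry over from the compiled logic: `SIMILAR_REGEX` is not anchored, so `similar:` is recognized anywhere in the string, and `similar(0):` falls back to 0.8 because `parseFloat` returns 0, which `||` treats as missing. `grade:` values also keep their leading space, since only the six prefix characters are sliced off.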
package/dist/assertions.js.map CHANGED
@@ -1 +1 @@
- {"version":3,"file":"assertions.js","sourceRoot":"","sources":["../src/assertions.ts"],"names":[],"mappings":";;;;;;AAAA,wDAAgC;AAEhC,qDAAyF;AACzF,uCAA6C;AAC7C,iDAAiD;AACjD,6CAAsD;AAUtD,MAAM,aAAa,GAAG,iCAAiC,CAAC;AAExD,MAAM,qCAAqC,GAAG,GAAG,CAAC;AAE3C,KAAK,UAAU,oBAAoB,CACxC,QAAgB,EAChB,MAAc,EACd,OAAwB;IAExB,MAAM,KAAK,GAAG,QAAQ,CAAC,KAAK,CAAC,aAAa,CAAC,CAAC;IAE5C,IAAI,KAAK,EAAE;QACT,MAAM,SAAS,GAAG,UAAU,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,IAAI,qCAAqC,CAAC;QAChF,MAAM,IAAI,GAAG,QAAQ,CAAC,OAAO,CAAC,aAAa,EAAE,EAAE,CAAC,CAAC,IAAI,EAAE,CAAC;QACxD,OAAO,iBAAiB,CAAC,IAAI,EAAE,MAAM,EAAE,SAAS,CAAC,CAAC;KACnD;SAAM,IAAI,QAAQ,CAAC,UAAU,CAAC,KAAK,CAAC,IAAI,QAAQ,CAAC,UAAU,CAAC,OAAO,CAAC,EAAE;QACrE,wCAAwC;QACxC,MAAM,WAAW,GAAG,QAAQ,CAAC,UAAU,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,KAAK,CAAC,MAAM,CAAC,CAAC,CAAC,OAAO,CAAC,MAAM,CAAC;QAC/E,MAAM,YAAY,GAAG,QAAQ,CAAC,KAAK,CAAC,WAAW,CAAC,CAAC;QAEjD,MAAM,cAAc,GAAG,IAAI,QAAQ,CAAC,QAAQ,EAAE,UAAU,YAAY,EAAE,CAAC,CAAC;QACxE,OAAO,EAAE,IAAI,EAAE,cAAc,CAAC,MAAM,CAAC,EAAE,CAAC;KACzC;SAAM,IAAI,QAAQ,CAAC,UAAU,CAAC,QAAQ,CAAC,EAAE;QACxC,OAAO,gBAAgB,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAC,CAAC,EAAE,MAAM,EAAE,OAAO,CAAC,OAAO,CAAC,CAAC;KACrE;SAAM;QACL,MAAM,IAAI,GAAG,QAAQ,KAAK,MAAM,CAAC;QACjC,OAAO;YACL,IAAI;YACJ,MAAM,EAAE,IAAI,CAAC,CAAC,CAAC,SAAS,CAAC,CAAC,CAAC,aAAa,QAAQ,aAAa,MAAM,EAAE;SACtE,CAAC;KACH;AACH,CAAC;AA3BD,oDA2BC;AAEM,KAAK,UAAU,iBAAiB,CACrC,QAAgB,EAChB,MAAc,EACd,SAAiB;IAEjB,MAAM,iBAAiB,GAAG,MAAM,oCAAwB,CAAC,gBAAgB,CAAC,QAAQ,CAAC,CAAC;IACpF,MAAM,eAAe,GAAG,MAAM,oCAAwB,CAAC,gBAAgB,CAAC,MAAM,CAAC,CAAC;IAEhF,MAAM,UAAU,GAAG;QACjB,KAAK,EAAE,CAAC,iBAAiB,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC,CAAC,GAAG,CAAC,eAAe,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC,CAAC;QAC5F,MAAM,EAAE,CAAC,iBAAiB,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC,CAAC,GAAG,CAAC,eAAe,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC,CAAC;QAC/F,UAAU,EACR,CAAC,iBAAiB,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC,CAAC;YAC/C,CAAC,eAAe,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC,CAAC;KAChD,CAAC;IAEF,IAAI,iBAAiB,CAAC,KAAK,IAAI,eAAe,CAAC,KAAK,EAAE;QACpD,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EACJ,iBAAiB,CAAC,KAAK
,IAAI,eAAe,CAAC,KAAK,IAAI,mCAAmC;YACzF,UAAU;SACX,CAAC;KACH;IAED,IAAI,CAAC,iBAAiB,CAAC,SAAS,IAAI,CAAC,eAAe,CAAC,SAAS,EAAE;QAC9D,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,qBAAqB;YAC7B,UAAU;SACX,CAAC;KACH;IAED,MAAM,UAAU,GAAG,IAAA,0BAAgB,EAAC,iBAAiB,CAAC,SAAS,EAAE,eAAe,CAAC,SAAS,CAAC,CAAC;IAC5F,IAAI,UAAU,GAAG,SAAS,EAAE;QAC1B,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,cAAc,UAAU,2BAA2B,SAAS,EAAE;YACtE,UAAU;SACX,CAAC;KACH;IACD,OAAO;QACL,IAAI,EAAE,IAAI;QACV,MAAM,EAAE,cAAc,UAAU,8BAA8B,SAAS,EAAE;QACzE,UAAU;KACX,CAAC;AACJ,CAAC;AA9CD,8CA8CC;AAEM,KAAK,UAAU,gBAAgB,CACpC,QAAgB,EAChB,MAAc,EACd,OAAuB;IAEvB,IAAI,CAAC,OAAO,EAAE;QACZ,MAAM,IAAI,KAAK,CACb,wFAAwF,CACzF,CAAC;KACH;IAED,MAAM,MAAM,GAAG,kBAAQ,CAAC,YAAY,CAAC,OAAO,CAAC,MAAM,IAAI,mCAAsB,EAAE;QAC7E,OAAO,EAAE,MAAM;QACf,MAAM,EAAE,QAAQ;KACjB,CAAC,CAAC;IAEH,IAAI,QAAQ,GAAG,OAAO,CAAC,QAAQ,IAAI,kCAAsB,CAAC;IAC1D,IAAI,OAAO,QAAQ,KAAK,QAAQ,EAAE;QAChC,QAAQ,GAAG,MAAM,IAAA,8BAAe,EAAC,QAAQ,CAAC,CAAC;KAC5C;IACD,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC;IAC5C,IAAI,IAAI,CAAC,KAAK,IAAI,CAAC,IAAI,CAAC,MAAM,EAAE;QAC9B,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,IAAI,CAAC,KAAK,IAAI,WAAW;YACjC,UAAU,EAAE;gBACV,KAAK,EAAE,IAAI,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC;gBAClC,MAAM,EAAE,IAAI,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC;gBACpC,UAAU,EAAE,IAAI,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC;aAC7C;SACF,CAAC;KACH;IAED,IAAI;QACF,MAAM,MAAM,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,MAAM,CAAkB,CAAC;QACxD,MAAM,CAAC,UAAU,GAAG;YAClB,KAAK,EAAE,IAAI,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC;YAClC,MAAM,EAAE,IAAI,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC;YACpC,UAAU,EAAE,IAAI,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC;SAC7C,CAAC;QACF,OAAO,MAAM,CAAC;KACf;IAAC,OAAO,GAAG,EAAE;QACZ,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,6BAA6B,IAAI,CAAC,MAAM,EAAE;YAClD,UAAU,EAAE;gBACV,KAAK,EAAE,IAAI,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC;gBAClC,MAAM,EAAE,IAAI,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC;gBACpC,UAAU,EAAE,IAAI,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC;aAC7C;SACF,CAAC;KACH;AACH,CAAC;AApDD,4CAoDC;AAED,kBAAe;IACb,iBAAiB;IACjB,gBAAgB;CACjB,CAAC"}
+ {"version":3,"file":"assertions.js","sourceRoot":"","sources":["../src/assertions.ts"],"names":[],"mappings":";;;;;;AAAA,oEAAuC;AACvC,wDAAgC;AAEhC,qDAAyF;AACzF,uCAA6C;AAC7C,iDAAiD;AACjD,6CAAsD;AAItD,MAAM,aAAa,GAAG,iCAAiC,CAAC;AAExD,MAAM,qCAAqC,GAAG,GAAG,CAAC;AAE3C,KAAK,UAAU,aAAa,CAAC,IAAc,EAAE,MAAc;IAChE,MAAM,UAAU,GAAG;QACjB,KAAK,EAAE,CAAC;QACR,MAAM,EAAE,CAAC;QACT,UAAU,EAAE,CAAC;KACd,CAAC;IAEF,IAAI,CAAC,IAAI,CAAC,MAAM,EAAE;QAChB,OAAO,EAAE,IAAI,EAAE,IAAI,EAAE,MAAM,EAAE,eAAe,EAAE,UAAU,EAAE,CAAC;KAC5D;IAED,KAAK,MAAM,SAAS,IAAI,IAAI,CAAC,MAAM,EAAE;QACnC,MAAM,MAAM,GAAG,MAAM,YAAY,CAAC,SAAS,EAAE,IAAI,EAAE,MAAM,CAAC,CAAC;QAC3D,IAAI,CAAC,MAAM,CAAC,IAAI,EAAE;YAChB,OAAO,MAAM,CAAC;SACf;QAED,IAAI,MAAM,CAAC,UAAU,EAAE;YACrB,UAAU,CAAC,KAAK,IAAI,MAAM,CAAC,UAAU,CAAC,KAAK,CAAC;YAC5C,UAAU,CAAC,MAAM,IAAI,MAAM,CAAC,UAAU,CAAC,MAAM,CAAC;YAC9C,UAAU,CAAC,UAAU,IAAI,MAAM,CAAC,UAAU,CAAC,UAAU,CAAC;SACvD;KACF;IAED,OAAO,EAAE,IAAI,EAAE,IAAI,EAAE,MAAM,EAAE,uBAAuB,EAAE,UAAU,EAAE,CAAC;AACrE,CAAC;AAzBD,sCAyBC;AAEM,KAAK,UAAU,YAAY,CAChC,SAAoB,EACpB,IAAc,EACd,MAAc;IAEd,IAAI,IAAI,GAAY,KAAK,CAAC;IAE1B,IAAI,SAAS,CAAC,IAAI,KAAK,QAAQ,EAAE;QAC/B,IAAI,GAAG,SAAS,CAAC,KAAK,KAAK,MAAM,CAAC;QAClC,OAAO;YACL,IAAI;YACJ,MAAM,EAAE,IAAI,CAAC,CAAC,CAAC,kBAAkB,CAAC,CAAC,CAAC,oBAAoB,SAAS,CAAC,KAAK,GAAG;SAC3E,CAAC;KACH;IAED,IAAI,SAAS,CAAC,IAAI,KAAK,SAAS,EAAE;QAChC,IAAI;YACF,IAAI,CAAC,KAAK,CAAC,MAAM,CAAC,CAAC;YACnB,OAAO,EAAE,IAAI,EAAE,IAAI,EAAE,MAAM,EAAE,kBAAkB,EAAE,CAAC;SACnD;QAAC,OAAO,GAAG,EAAE;YACZ,OAAO;gBACL,IAAI,EAAE,KAAK;gBACX,MAAM,EAAE,2DAA2D,GAAG,EAAE;aACzE,CAAC;SACH;KACF;IAED,IAAI,SAAS,CAAC,IAAI,KAAK,eAAe,EAAE;QACtC,MAAM,IAAI,GAAG,YAAY,CAAC,MAAM,CAAC,CAAC;QAClC,OAAO;YACL,IAAI;YACJ,MAAM,EAAE,IAAI,CAAC,CAAC,CAAC,kBAAkB,CAAC,CAAC,CAAC,uCAAuC;SAC5E,CAAC;KACH;IAED,IAAI,SAAS,CAAC,IAAI,KAAK,YAAY,EAAE;QACnC,IAAI;YACF,MAAM,cAAc,GAAG,IAAI,QAAQ,CAAC,QAAQ,EAAE,UAAU,SAAS,CAAC,KAAK,EAAE,CAAC,CAAC;YAC3E,IAAI,GAAG,cAAc,CAAC,MAAM,CAAC,CAAC;SAC/B;QAAC,OAAO,GAAG,EAAE;YACZ,OAAO;gBACL,IAAI,EAAE,KAAK;gBACX,MAAM,EAAE,gCAAiC,GAAa,CAAC,OAAO,E
AAE;aACjE,CAAC;SACH;QACD,OAAO;YACL,IAAI;YACJ,MAAM,EAAE,IAAI,CAAC,CAAC,CAAC,kBAAkB,CAAC,CAAC,CAAC,gCAAgC;SACrE,CAAC;KACH;IAED,IAAI,SAAS,CAAC,IAAI,KAAK,SAAS,EAAE;QAChC,IAAA,wBAAS,EAAC,SAAS,CAAC,KAAK,EAAE,+CAA+C,CAAC,CAAC;QAC5E,IAAA,wBAAS,EAAC,SAAS,CAAC,SAAS,EAAE,4CAA4C,CAAC,CAAC;QAC7E,OAAO,iBAAiB,CAAC,SAAS,CAAC,KAAK,EAAE,MAAM,EAAE,SAAS,CAAC,SAAS,CAAC,CAAC;KACxE;IAED,IAAI,SAAS,CAAC,IAAI,KAAK,YAAY,EAAE;QACnC,IAAA,wBAAS,EAAC,SAAS,CAAC,KAAK,EAAE,+CAA+C,CAAC,CAAC;QAC5E,OAAO,gBAAgB,CAAC,SAAS,CAAC,KAAK,EAAE,MAAM,EAAE,IAAI,CAAC,OAAO,CAAC,CAAC;KAChE;IAED,MAAM,IAAI,KAAK,CAAC,0BAA0B,GAAG,SAAS,CAAC,IAAI,CAAC,CAAC;AAC/D,CAAC;AA/DD,oCA+DC;AAED,SAAS,YAAY,CAAC,GAAW;IAC/B,oDAAoD;IACpD,MAAM,WAAW,GAAG,wBAAwB,CAAC;IAE7C,MAAM,KAAK,GAAG,GAAG,CAAC,KAAK,CAAC,WAAW,CAAC,CAAC;IAErC,IAAI,CAAC,KAAK,EAAE;QACV,OAAO,KAAK,CAAC;KACd;IAED,IAAI;QACF,IAAI,CAAC,KAAK,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC;QACrB,OAAO,IAAI,CAAC;KACb;IAAC,OAAO,KAAK,EAAE;QACd,OAAO,KAAK,CAAC;KACd;AACH,CAAC;AAEM,KAAK,UAAU,iBAAiB,CACrC,QAAgB,EAChB,MAAc,EACd,SAAiB;IAEjB,MAAM,iBAAiB,GAAG,MAAM,oCAAwB,CAAC,gBAAgB,CAAC,QAAQ,CAAC,CAAC;IACpF,MAAM,eAAe,GAAG,MAAM,oCAAwB,CAAC,gBAAgB,CAAC,MAAM,CAAC,CAAC;IAEhF,MAAM,UAAU,GAAG;QACjB,KAAK,EAAE,CAAC,iBAAiB,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC,CAAC,GAAG,CAAC,eAAe,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC,CAAC;QAC5F,MAAM,EAAE,CAAC,iBAAiB,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC,CAAC,GAAG,CAAC,eAAe,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC,CAAC;QAC/F,UAAU,EACR,CAAC,iBAAiB,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC,CAAC;YAC/C,CAAC,eAAe,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC,CAAC;KAChD,CAAC;IAEF,IAAI,iBAAiB,CAAC,KAAK,IAAI,eAAe,CAAC,KAAK,EAAE;QACpD,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EACJ,iBAAiB,CAAC,KAAK,IAAI,eAAe,CAAC,KAAK,IAAI,mCAAmC;YACzF,UAAU;SACX,CAAC;KACH;IAED,IAAI,CAAC,iBAAiB,CAAC,SAAS,IAAI,CAAC,eAAe,CAAC,SAAS,EAAE;QAC9D,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,qBAAqB;YAC7B,UAAU;SACX,CAAC;KACH;IAED,MAAM,UAAU,GAAG,IAAA,0BAAgB,EAAC,iBAAiB,CAAC,SAAS,EAAE,eAAe,CAAC,SAAS,CAAC,CAAC;IAC5F,IAAI,UAAU,GAAG,SAAS,EAAE;QAC1B,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,cAAc,UAAU,
2BAA2B,SAAS,EAAE;YACtE,UAAU;SACX,CAAC;KACH;IACD,OAAO;QACL,IAAI,EAAE,IAAI;QACV,MAAM,EAAE,cAAc,UAAU,8BAA8B,SAAS,EAAE;QACzE,UAAU;KACX,CAAC;AACJ,CAAC;AA9CD,8CA8CC;AAEM,KAAK,UAAU,gBAAgB,CACpC,QAAgB,EAChB,MAAc,EACd,OAAuB;IAEvB,IAAI,CAAC,OAAO,EAAE;QACZ,MAAM,IAAI,KAAK,CACb,wFAAwF,CACzF,CAAC;KACH;IAED,MAAM,MAAM,GAAG,kBAAQ,CAAC,YAAY,CAAC,OAAO,CAAC,YAAY,IAAI,mCAAsB,EAAE;QACnF,OAAO,EAAE,MAAM;QACf,MAAM,EAAE,QAAQ;KACjB,CAAC,CAAC;IAEH,IAAI,QAAQ,GAAG,OAAO,CAAC,QAAQ,IAAI,kCAAsB,CAAC;IAC1D,IAAI,OAAO,QAAQ,KAAK,QAAQ,EAAE;QAChC,QAAQ,GAAG,MAAM,IAAA,8BAAe,EAAC,QAAQ,CAAC,CAAC;KAC5C;IACD,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC;IAC5C,IAAI,IAAI,CAAC,KAAK,IAAI,CAAC,IAAI,CAAC,MAAM,EAAE;QAC9B,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,IAAI,CAAC,KAAK,IAAI,WAAW;YACjC,UAAU,EAAE;gBACV,KAAK,EAAE,IAAI,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC;gBAClC,MAAM,EAAE,IAAI,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC;gBACpC,UAAU,EAAE,IAAI,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC;aAC7C;SACF,CAAC;KACH;IAED,IAAI;QACF,MAAM,MAAM,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,MAAM,CAAkB,CAAC;QACxD,MAAM,CAAC,UAAU,GAAG;YAClB,KAAK,EAAE,IAAI,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC;YAClC,MAAM,EAAE,IAAI,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC;YACpC,UAAU,EAAE,IAAI,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC;SAC7C,CAAC;QACF,OAAO,MAAM,CAAC;KACf;IAAC,OAAO,GAAG,EAAE;QACZ,OAAO;YACL,IAAI,EAAE,KAAK;YACX,MAAM,EAAE,6BAA6B,IAAI,CAAC,MAAM,EAAE;YAClD,UAAU,EAAE;gBACV,KAAK,EAAE,IAAI,CAAC,UAAU,EAAE,KAAK,IAAI,CAAC;gBAClC,MAAM,EAAE,IAAI,CAAC,UAAU,EAAE,MAAM,IAAI,CAAC;gBACpC,UAAU,EAAE,IAAI,CAAC,UAAU,EAAE,UAAU,IAAI,CAAC;aAC7C;SACF,CAAC;KACH;AACH,CAAC;AApDD,4CAoDC;AAED,SAAgB,mBAAmB,CAAC,QAAgB;IAClD,MAAM,KAAK,GAAG,QAAQ,CAAC,KAAK,CAAC,aAAa,CAAC,CAAC;IAC5C,IAAI,KAAK,EAAE;QACT,MAAM,SAAS,GAAG,UAAU,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,IAAI,qCAAqC,CAAC;QAChF,MAAM,IAAI,GAAG,QAAQ,CAAC,OAAO,CAAC,aAAa,EAAE,EAAE,CAAC,CAAC,IAAI,EAAE,CAAC;QACxD,OAAO;YACL,IAAI,EAAE,SAAS;YACf,KAAK,EAAE,IAAI;YACX,SAAS;SACV,CAAC;KACH;IACD,IAAI,QAAQ,CAAC,UAAU,CAAC,KAAK,CAAC,IAAI,QAAQ,CAAC,UAAU,CAAC,OAAO,CAAC,EAAE;QAC9D,wCAAwC;QACxC,MAAM,WAAW,GAAG,QAAQ,CAAC
,UAAU,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,KAAK,CAAC,MAAM,CAAC,CAAC,CAAC,OAAO,CAAC,MAAM,CAAC;QAC/E,MAAM,YAAY,GAAG,QAAQ,CAAC,KAAK,CAAC,WAAW,CAAC,CAAC;QACjD,OAAO;YACL,IAAI,EAAE,YAAY;YAClB,KAAK,EAAE,YAAY;SACpB,CAAC;KACH;IACD,IAAI,QAAQ,CAAC,UAAU,CAAC,QAAQ,CAAC,EAAE;QACjC,OAAO;YACL,IAAI,EAAE,YAAY;YAClB,KAAK,EAAE,QAAQ,CAAC,KAAK,CAAC,CAAC,CAAC;SACzB,CAAC;KACH;IACD,IAAI,QAAQ,KAAK,SAAS,IAAI,QAAQ,KAAK,eAAe,EAAE;QAC1D,OAAO;YACL,IAAI,EAAE,QAAQ;SACf,CAAC;KACH;IACD,OAAO;QACL,IAAI,EAAE,QAAQ;QACd,KAAK,EAAE,QAAQ;KAChB,CAAC;AACJ,CAAC;AAnCD,kDAmCC;AAED,kBAAe;IACb,iBAAiB;IACjB,gBAAgB;CACjB,CAAC"}
package/dist/cache.d.ts ADDED
@@ -0,0 +1,8 @@
+ import type { RequestInfo, RequestInit } from 'node-fetch';
+ export declare function fetchJsonWithCache(url: RequestInfo, options: RequestInit | undefined, timeout: number): Promise<{
+     data: any;
+     cached: boolean;
+ }>;
+ export declare function enableCache(): void;
+ export declare function disableCache(): void;
+ //# sourceMappingURL=cache.d.ts.map
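The declaration above returns `{ data, cached }` instead of a bare JSON value, so callers can tell whether a response was served from the cache. Below is a minimal in-memory sketch of that contract. It assumes a pluggable `fetchJson` function in place of the real network call, and `memoizedFetchJson`, `FetchJson` and `RequestOptions` are hypothetical names, not promptfoo exports. Like `cache.js` in this diff, it drops `headers` before building the cache key.

```typescript
// Illustrative subset of node-fetch's RequestInit; not the real type.
interface RequestOptions {
  method?: string;
  headers?: Record<string, string>;
  body?: string;
}

// Hypothetical stand-in for an HTTP call that resolves to parsed JSON.
type FetchJson = (url: string, options: RequestOptions) => Promise<unknown>;

// Hypothetical helper: wraps `fetchJson` with an in-memory cache and reports
// whether each result came from the cache, mirroring fetchJsonWithCache.
function memoizedFetchJson(fetchJson: FetchJson) {
  const store = new Map<string, string>();
  return async (
    url: string,
    options: RequestOptions = {},
  ): Promise<{ data: unknown; cached: boolean }> => {
    // Same key shape as cache.js: headers are excluded, the rest is serialized.
    const copy: RequestOptions = { ...options };
    delete copy.headers;
    const key = `fetch:${url}:${JSON.stringify(copy)}`;

    const hit = store.get(key);
    if (hit !== undefined) {
      return { data: JSON.parse(hit), cached: true };
    }

    const data = await fetchJson(url, options);
    // Store serialized JSON, as cache.js does with cache.set.
    store.set(key, JSON.stringify(data));
    return { data, cached: false };
  };
}
```

Because headers are left out of the key, two requests that differ only in a header such as `Authorization` share one cache entry. The key also depends on the property order of the remaining options, since `JSON.stringify` serializes keys in insertion order.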
package/dist/cache.d.ts.map ADDED
@@ -0,0 +1 @@
+ {"version":3,"file":"cache.d.ts","sourceRoot":"","sources":["../src/cache.ts"],"names":[],"mappings":"AASA,OAAO,KAAK,EAAE,WAAW,EAAE,WAAW,EAAE,MAAM,YAAY,CAAC;AA4B3D,wBAAsB,kBAAkB,CACtC,GAAG,EAAE,WAAW,EAChB,OAAO,yBAAkB,EACzB,OAAO,EAAE,MAAM,GACd,OAAO,CAAC;IAAE,IAAI,EAAE,GAAG,CAAC;IAAC,MAAM,EAAE,OAAO,CAAA;CAAE,CAAC,CAuCzC;AAED,wBAAgB,WAAW,SAE1B;AAED,wBAAgB,YAAY,SAG3B"}
package/dist/cache.js ADDED
@@ -0,0 +1,78 @@
+ "use strict";
+ var __importDefault = (this && this.__importDefault) || function (mod) {
+     return (mod && mod.__esModule) ? mod : { "default": mod };
+ };
+ Object.defineProperty(exports, "__esModule", { value: true });
+ exports.disableCache = exports.enableCache = exports.fetchJsonWithCache = void 0;
+ const node_path_1 = __importDefault(require("node:path"));
+ const cache_manager_1 = __importDefault(require("cache-manager"));
+ const cache_manager_fs_hash_1 = __importDefault(require("cache-manager-fs-hash"));
+ const logger_js_1 = __importDefault(require("./logger.js"));
+ const util_js_1 = require("./util.js");
+ let cacheInstance;
+ let enabled = typeof process.env.PROMPTFOO_CACHE_ENABLED === 'undefined'
+     ? true
+     : Boolean(process.env.PROMPTFOO_CACHE_ENABLED);
+ const cacheType = process.env.PROMPTFOO_CACHE_TYPE || (process.env.NODE_ENV === 'test' ? 'memory' : 'disk');
+ function getCache() {
+     if (!cacheInstance) {
+         cacheInstance = cache_manager_1.default.caching({
+             store: cacheType === 'disk' ? cache_manager_fs_hash_1.default : 'memory',
+             options: {
+                 max: process.env.PROMPTFOO_CACHE_MAX_FILE_COUNT || 10000,
+                 path: process.env.PROMPTFOO_CACHE_PATH || node_path_1.default.join((0, util_js_1.getConfigDirectoryPath)(), 'cache'),
+                 ttl: process.env.PROMPTFOO_CACHE_TTL || 60 * 60 * 24 * 14,
+                 maxsize: process.env.PROMPTFOO_CACHE_MAX_SIZE || 1e7, // in bytes, 10mb
+                 //zip: true, // whether to use gzip compression
+             },
+         });
+     }
+     return cacheInstance;
+ }
+ async function fetchJsonWithCache(url, options = {}, timeout) {
+     if (!enabled) {
+         const resp = await (0, util_js_1.fetchWithTimeout)(url, options, timeout);
+         return {
+             cached: false,
+             data: await resp.json(),
+         };
+     }
+     const cache = await getCache();
+     const copy = Object.assign({}, options);
+     delete copy.headers;
+     const cacheKey = `fetch:${url}:${JSON.stringify(copy)}`;
+     // Try to get the cached response
+     const cachedResponse = await cache.get(cacheKey);
+     if (cachedResponse) {
+         logger_js_1.default.debug(`Returning cached response for ${url}: ${cachedResponse}`);
+         return {
+             cached: true,
+             data: JSON.parse(cachedResponse),
+         };
+     }
+     // Fetch the actual data and store it in the cache
+     const response = await (0, util_js_1.fetchWithTimeout)(url, options, timeout);
+     try {
+         const data = await response.json();
+         logger_js_1.default.debug(`Storing ${url} response in cache: ${data}`);
+         await cache.set(cacheKey, JSON.stringify(data));
+         return {
+             cached: false,
+             data,
+         };
+     }
+     catch (err) {
+         throw new Error(`Error parsing response from ${url}: ${err}`);
+     }
+ }
+ exports.fetchJsonWithCache = fetchJsonWithCache;
+ function enableCache() {
+     enabled = true;
+ }
+ exports.enableCache = enableCache;
+ function disableCache() {
+     logger_js_1.default.info('Cache is disabled.');
+     enabled = false;
+ }
+ exports.disableCache = disableCache;
+ //# sourceMappingURL=cache.js.map
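In `cache.js`, `PROMPTFOO_CACHE_ENABLED` is read with `Boolean(...)`, which is `true` for every non-empty string. As a result, `PROMPTFOO_CACHE_ENABLED=false` or `=0` still leaves the cache enabled, and only an empty value turns it off. The numeric settings (`PROMPTFOO_CACHE_TTL`, `PROMPTFOO_CACHE_MAX_SIZE`, `PROMPTFOO_CACHE_MAX_FILE_COUNT`) likewise arrive as strings whenever they are set. The sketch below contrasts that behaviour with a stricter parse. `parseBoolEnv` and `parseNumberEnv` are hypothetical helpers, not promptfoo APIs, and the environment is passed in as a plain object so the snippet runs without Node typings.

```typescript
type Env = Record<string, string | undefined>;

// Reproduces the logic in cache.js: unset means enabled, otherwise any
// non-empty string (including "false") counts as enabled.
function naiveEnabled(env: Env): boolean {
  return typeof env.PROMPTFOO_CACHE_ENABLED === "undefined"
    ? true
    : Boolean(env.PROMPTFOO_CACHE_ENABLED);
}

// Hypothetical stricter parser that recognizes common spellings of true/false.
function parseBoolEnv(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  const normalized = value.trim().toLowerCase();
  if (["0", "false", "no", "off", ""].indexOf(normalized) !== -1) return false;
  if (["1", "true", "yes", "on"].indexOf(normalized) !== -1) return true;
  return fallback;
}

// Hypothetical numeric parser: falls back when the value is unset, blank,
// or not a finite number.
function parseNumberEnv(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  return Number.isFinite(parsed) ? parsed : fallback;
}
```

With `{ PROMPTFOO_CACHE_ENABLED: "false" }`, `naiveEnabled` returns `true` while `parseBoolEnv(value, true)` returns `false`. An empty string is treated as `false` by both, so the stricter parser agrees with `cache.js` on that case.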
package/dist/cache.js.map ADDED
@@ -0,0 +1 @@
+ {"version":3,"file":"cache.js","sourceRoot":"","sources":["../src/cache.ts"],"names":[],"mappings":";;;;;;AAAA,0DAA6B;AAE7B,kEAAyC;AACzC,kFAA4C;AAE5C,4DAAiC;AACjC,uCAAqE;AAKrE,IAAI,aAAgC,CAAC;AAErC,IAAI,OAAO,GACT,OAAO,OAAO,CAAC,GAAG,CAAC,uBAAuB,KAAK,WAAW;IACxD,CAAC,CAAC,IAAI;IACN,CAAC,CAAC,OAAO,CAAC,OAAO,CAAC,GAAG,CAAC,uBAAuB,CAAC,CAAC;AAEnD,MAAM,SAAS,GACb,OAAO,CAAC,GAAG,CAAC,oBAAoB,IAAI,CAAC,OAAO,CAAC,GAAG,CAAC,QAAQ,KAAK,MAAM,CAAC,CAAC,CAAC,QAAQ,CAAC,CAAC,CAAC,MAAM,CAAC,CAAC;AAE5F,SAAS,QAAQ;IACf,IAAI,CAAC,aAAa,EAAE;QAClB,aAAa,GAAG,uBAAY,CAAC,OAAO,CAAC;YACnC,KAAK,EAAE,SAAS,KAAK,MAAM,CAAC,CAAC,CAAC,+BAAO,CAAC,CAAC,CAAC,QAAQ;YAChD,OAAO,EAAE;gBACP,GAAG,EAAE,OAAO,CAAC,GAAG,CAAC,8BAA8B,IAAI,KAAM;gBACzD,IAAI,EAAE,OAAO,CAAC,GAAG,CAAC,oBAAoB,IAAI,mBAAI,CAAC,IAAI,CAAC,IAAA,gCAAsB,GAAE,EAAE,OAAO,CAAC;gBACtF,GAAG,EAAE,OAAO,CAAC,GAAG,CAAC,mBAAmB,IAAI,EAAE,GAAG,EAAE,GAAG,EAAE,GAAG,EAAE;gBACzD,OAAO,EAAE,OAAO,CAAC,GAAG,CAAC,wBAAwB,IAAI,GAAG,EAAE,iBAAiB;gBACvE,+CAA+C;aAChD;SACF,CAAC,CAAC;KACJ;IACD,OAAO,aAAa,CAAC;AACvB,CAAC;AAEM,KAAK,UAAU,kBAAkB,CACtC,GAAgB,EAChB,UAAuB,EAAE,EACzB,OAAe;IAEf,IAAI,CAAC,OAAO,EAAE;QACZ,MAAM,IAAI,GAAG,MAAM,IAAA,0BAAgB,EAAC,GAAG,EAAE,OAAO,EAAE,OAAO,CAAC,CAAC;QAC3D,OAAO;YACL,MAAM,EAAE,KAAK;YACb,IAAI,EAAE,MAAM,IAAI,CAAC,IAAI,EAAE;SACxB,CAAC;KACH;IAED,MAAM,KAAK,GAAG,MAAM,QAAQ,EAAE,CAAC;IAE/B,MAAM,IAAI,GAAG,MAAM,CAAC,MAAM,CAAC,EAAE,EAAE,OAAO,CAAC,CAAC;IACxC,OAAO,IAAI,CAAC,OAAO,CAAC;IACpB,MAAM,QAAQ,GAAG,SAAS,GAAG,IAAI,IAAI,CAAC,SAAS,CAAC,IAAI,CAAC,EAAE,CAAC;IAExD,iCAAiC;IACjC,MAAM,cAAc,GAAG,MAAM,KAAK,CAAC,GAAG,CAAC,QAAQ,CAAC,CAAC;IAEjD,IAAI,cAAc,EAAE;QAClB,mBAAM,CAAC,KAAK,CAAC,iCAAiC,GAAG,KAAK,cAAc,EAAE,CAAC,CAAC;QACxE,OAAO;YACL,MAAM,EAAE,IAAI;YACZ,IAAI,EAAE,IAAI,CAAC,KAAK,CAAC,cAAwB,CAAC;SAC3C,CAAC;KACH;IAED,kDAAkD;IAClD,MAAM,QAAQ,GAAG,MAAM,IAAA,0BAAgB,EAAC,GAAG,EAAE,OAAO,EAAE,OAAO,CAAC,CAAC;IAC/D,IAAI;QACF,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;QACnC,mBAAM,CAAC,KAAK,CAAC,WAAW,GAAG,uBAAuB,IAAI,EAAE,CAAC,CAAC;QAC1D,MAAM,KAAK,CAAC,GAAG,CAAC,QAAQ,EAAE,IAA
I,CAAC,SAAS,CAAC,IAAI,CAAC,CAAC,CAAC;QAChD,OAAO;YACL,MAAM,EAAE,KAAK;YACb,IAAI;SACL,CAAC;KACH;IAAC,OAAO,GAAG,EAAE;QACZ,MAAM,IAAI,KAAK,CAAC,+BAA+B,GAAG,KAAK,GAAG,EAAE,CAAC,CAAC;KAC/D;AACH,CAAC;AA3CD,gDA2CC;AAED,SAAgB,WAAW;IACzB,OAAO,GAAG,IAAI,CAAC;AACjB,CAAC;AAFD,kCAEC;AAED,SAAgB,YAAY;IAC1B,mBAAM,CAAC,IAAI,CAAC,oBAAoB,CAAC,CAAC;IAClC,OAAO,GAAG,KAAK,CAAC;AAClB,CAAC;AAHD,oCAGC"}
package/dist/evaluator.d.ts
@@ -1,3 +1,3 @@
- import type { EvaluateOptions, EvaluateSummary } from './types.js';
- export declare function evaluate(options: EvaluateOptions): Promise<EvaluateSummary>;
+ import type { EvaluateOptions, EvaluateSummary, TestSuite } from './types.js';
+ export declare function evaluate(testSuite: TestSuite, options: EvaluateOptions): Promise<EvaluateSummary>;
  //# sourceMappingURL=evaluator.d.ts.map