langium-ai-tools 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. package/README.md +83 -0
  2. package/dist/evaluator/chart.d.ts +32 -0
  3. package/dist/evaluator/chart.d.ts.map +1 -0
  4. package/dist/evaluator/chart.js +218 -0
  5. package/dist/evaluator/chart.js.map +1 -0
  6. package/dist/evaluator/edit-distance-evaluator.d.ts +8 -0
  7. package/dist/evaluator/edit-distance-evaluator.d.ts.map +1 -0
  8. package/dist/evaluator/edit-distance-evaluator.js +13 -0
  9. package/dist/evaluator/edit-distance-evaluator.js.map +1 -0
  10. package/dist/evaluator/eval-matrix.d.ts +95 -0
  11. package/dist/evaluator/eval-matrix.d.ts.map +1 -0
  12. package/dist/evaluator/eval-matrix.js +87 -0
  13. package/dist/evaluator/eval-matrix.js.map +1 -0
  14. package/dist/evaluator/evaluator.d.ts +64 -0
  15. package/dist/evaluator/evaluator.d.ts.map +1 -0
  16. package/dist/evaluator/evaluator.js +162 -0
  17. package/dist/evaluator/evaluator.js.map +1 -0
  18. package/dist/evaluator/index.d.ts +6 -0
  19. package/dist/evaluator/index.d.ts.map +1 -0
  20. package/dist/evaluator/index.js +6 -0
  21. package/dist/evaluator/index.js.map +1 -0
  22. package/dist/evaluator/langium-evaluator.d.ts +55 -0
  23. package/dist/evaluator/langium-evaluator.d.ts.map +1 -0
  24. package/dist/evaluator/langium-evaluator.js +78 -0
  25. package/dist/evaluator/langium-evaluator.js.map +1 -0
  26. package/dist/index.d.ts +3 -0
  27. package/dist/index.d.ts.map +1 -0
  28. package/dist/index.js +3 -0
  29. package/dist/index.js.map +1 -0
  30. package/dist/splitter/index.d.ts +2 -0
  31. package/dist/splitter/index.d.ts.map +1 -0
  32. package/dist/splitter/index.js +2 -0
  33. package/dist/splitter/index.js.map +1 -0
  34. package/dist/splitter/splitter.d.ts +21 -0
  35. package/dist/splitter/splitter.d.ts.map +1 -0
  36. package/dist/splitter/splitter.js +59 -0
  37. package/dist/splitter/splitter.js.map +1 -0
  38. package/package.json +61 -0
package/README.md ADDED
@@ -0,0 +1,83 @@
1
+ # Langium AI Tools
2
+
3
+ ## Overview
4
+
5
+ This project provides core tools that make it easier to build AI applications for Langium DSLs. These core tools help to solve the following problems around building AI applications by making it easier to:
6
+
7
+ - Determine which models work well for your DSL
8
+ - Evaluate which changes to your tooling actually improve your generation results
9
+ - Process DSL documents in a way that makes sense for your DSL & target application
10
+
11
+ To solve these problems this package provides:
12
+
13
+ - Splitting Support: Using your DSL's parser to make it easier to pre-process documents before ingest (such as into a vector DB)
14
+ - Training & Evaluation Support: Assess the output of your model + RAG + whatever else you have in your stack with regards to a structured input/output evaluation phase.
15
+ - Constraint Support: Synthesize BNF-style grammars from your Langium grammar, which can be used to control the token output from an LLM to conform to your DSL's expected structure (this feature has been added directly into the **langium-cli** itself, as it has wider general applications).
16
+
17
+ What's also important is what is not provided:
18
+ - *We don't choose your model for you.* We believe this is your choice, and we don't want to presume we know best or lock you in. All we assume is that you have a model (or stack) that we can use. For tooling that leverages models directly, we'll be providing a separate package under Langium AI, distinct from the core tools here.
19
+ - *We don't choose your stack for you.* There are many excellent choices for hosting providers, databases, caches, and other supporting services (local & remote). There are so many, and they change so often, that we decided it was best not to assume what works here, and instead to support preparing information for whatever stack you choose.
20
+
21
+ LLMs (and transformers in general) are evolving quite rapidly. With this approach, these tools help you build your own specific approach, whilst letting you keep up with the latest and greatest in model developments.
22
+
23
+ ## Installation
24
+
25
+ You can install Langium AI Tools by running:
26
+
27
+ ```sh
28
+ npm i --save langium-ai-tools
29
+ ```
30
+
31
+ ## Usage
32
+
33
+ ### Splitting
34
+
35
+ Langium AI Tools presents various supporting behaviors for splitting.
36
+
37
+ The simplest approach is to, of course, not split at all. For smaller DSL programs this may be perfectly viable, but in all likelihood you're reading this to handle medium to large programs -- or a large quantity of smaller programs with overlapping constructs.
38
+
39
+ In most cases you can split by specific AST nodes. This will map directly to those types that are generated by your Langium grammar rules, and makes it easy to mark how you want to delineate.
40
+
41
+ ### Evaluation
42
+
43
+ Regardless of how you've sourced your model, you'll need a metric for determining the quality of your output.
44
+
45
+ For Langium DSLs, we provide a series of *evaluator* utilities to help in assessing the correctness of DSL output.
46
+
47
+ It's important to point out that evaluations are *not* tests; they are more similar to [OpenAI's evals framework](https://github.com/openai/evals). The idea is that we're grading or scoring outputs against an expected output from a known input. This is a simple but effective way to determine, in a structured way, whether your model is generally doing what you expect, and *not* doing something else as well.
48
+
49
+ Take the following evaluator for example. Let's assume you have [Ollama](https://ollama.com/) running locally, and the [ollama-js](https://github.com/ollama/ollama-js) package installed. From a given base model you can define evaluations like so.
50
+
51
+ ```ts
52
+ import { LangiumEvaluator, EvaluatorScore } from 'langium-ai-tools/evaluator';
53
+ import ollama from 'ollama';
54
+
55
+ // get your language's services
56
+ const services = createMyDSLServices(EmptyFileSystem).MyDSL;
57
+
58
+ // define an evaluator using your language's services
59
+ // this effectively uses your existing parser & validations to 'grade' the response
60
+ const evaluator = new LangiumEvaluator(services);
61
+
62
+ // make some prompt
63
+ const response = await ollama.chat({
64
+ model: 'llama3.2',
65
+ messages: [{
66
+ role: 'user',
67
+ content: 'Write me a hello world program written in MyDSL.'
68
+ }]
69
+ });
70
+
71
+ const es: EvaluatorScore = await evaluator.evaluate(response.message.content);
72
+
73
+ // print out your score!
74
+ console.log(es);
75
+ ```
76
+
77
+ You can also define custom evaluators that are more tuned to the needs of your DSL. This could be handling diagnostics in a very specific fashion, extracting code out of the response itself to check, using an evaluation model to grade the response, or using a combination of techniques to get a more accurate score for your model's output.
78
+
79
+ In general we stick to focusing on what Langium can do to help with evaluation, but leave the opportunity open for you to extend, supplement, or modify evaluation logic as you see fit.
80
+
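The custom evaluators mentioned above can be sketched roughly as follows. This is a minimal illustration, not the package's own code: the `Evaluator`, `EvaluatorResult`, and `EvaluatorResultData` declarations are local stand-ins modelled on the shapes visible in the published `.d.ts` files, and `extractCode` and `ExactMatchEvaluator` are hypothetical names introduced only for this example.

```ts
// Local stand-ins (assumptions) so the sketch runs without installing the package.
interface EvaluatorResultData {
    [key: string]: unknown;
}

interface EvaluatorResult {
    name: string;
    metadata: Record<string, unknown>;
    data: EvaluatorResultData;
}

abstract class Evaluator {
    abstract evaluate(response: string, expected_response: string): Promise<Partial<EvaluatorResult>>;
}

// Three backticks, built at runtime so this listing does not contain a literal fence.
const FENCE = '`'.repeat(3);

// Hypothetical helper: return the body of the first fenced code block in a
// chat-style response, or the whole response when no fence is present.
function extractCode(response: string): string {
    const pattern = new RegExp(FENCE + '[^\\n]*\\n([\\s\\S]*?)' + FENCE);
    const match = pattern.exec(response);
    return (match?.[1] ?? response).trim();
}

// Hypothetical custom evaluator: 1 when the extracted code equals the expected
// response (ignoring surrounding whitespace), otherwise 0.
class ExactMatchEvaluator extends Evaluator {
    async evaluate(response: string, expected_response: string): Promise<Partial<EvaluatorResult>> {
        const code = extractCode(response);
        return {
            data: {
                exact_match: code === expected_response.trim() ? 1 : 0,
                extracted_length: code.length
            }
        };
    }
}

new ExactMatchEvaluator()
    .evaluate('plain answer', 'plain answer')
    .then(result => console.log(result.data));
```

Extracting the code before grading keeps conversational filler ("Sure, here you go...") from being counted against the model. Whether that is desirable depends on how the output will be consumed downstream.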
81
+ ## Contributing
82
+
83
+ If you want to help, feel free to open an issue or a PR. As a general note, we're open to accepting changes that focus on improving how we can support AI application development for Langium DSLs. However, we don't want to provide explicit bindings to actual services or providers at this time, such as LlamaIndex, Ollama, LangChain, or others. Similarly, this package doesn't provide direct bindings for AI providers such as OpenAI and Anthropic. Instead, these changes will go into a separate package under Langium AI that is intended for this purpose.
@@ -0,0 +1,32 @@
1
+ /**
2
+ * Generates & exports an HTML radar chart report using plotly JS
3
+ */
4
+ import { EvaluatorResult, EvaluatorResultData } from "./evaluator.js";
5
+ /**
6
+ * Generates an HTML radar chart from the provided data
7
+ * @param evalResults Evaluator results to chart
8
+ * @param dest Output file to write the chart to
9
+ * @param rFunc polar r function, used to extract the r values from the data
10
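+ * @param preprocess optional function applied to the results before charting; when omitted, results are averaged across runners. The keys returned by rFunc serve as the theta labels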
11
+ */
12
+ export declare function generateRadarChart<T extends EvaluatorResultData>(chartName: string, evalResults: EvaluatorResult[], dest: string, rFunc: (d: T, metadata: Record<string, unknown>) => Record<string, unknown>, preprocess?: (arr: EvaluatorResult[]) => EvaluatorResult[]): void;
13
+ export declare function generateHistogram<T extends EvaluatorResultData>(chartName: string, evalResults: EvaluatorResult[], dest: string, dataFunc: (d: T, metadata: Record<string, unknown>) => Record<string, unknown>, preprocess?: (arr: EvaluatorResult[]) => EvaluatorResult[]): void;
14
+ /**
15
+ * Normalizes all numeric data entries in results (while also retaining non-numeric entries)
16
+ */
17
+ export declare function normalizeData(data: EvaluatorResult[]): EvaluatorResult[];
18
+ /**
19
+ * Generates a historical chart from the provided data, showing runners along the X axis, and their performance over time along the Y axis
20
+ * @param chartName
21
+ * @param folder
22
+ * @param dest
23
+ * @param dataFunc
24
+ * @param options
25
+ */
26
+ export declare function generateHistoricalChart<T extends EvaluatorResultData>(chartName: string, folder: string, dest: string, dataFunc: (d: T, metadata: Record<string, unknown>) => number, options?: {
27
+ preprocess?: (arr: EvaluatorResult[]) => EvaluatorResult[];
28
+ filter?: (r: EvaluatorResult) => boolean;
29
+ take?: number;
30
+ chartType?: string;
31
+ }): void;
32
+ //# sourceMappingURL=chart.d.ts.map
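To make the `rFunc` contract of `generateRadarChart` more concrete, here is a small sketch of how a callback's return value maps onto a radar trace. It assumes a made-up result shape (`MyScores`) and a hypothetical helper (`toRadarTrace`) that imitates the per-result mapping in the compiled `chart.js`; it does not call the package itself.

```ts
// Hypothetical result-data shape for illustration; not a type from the package.
interface MyScores {
    syntax: number;
    validation: number;
    similarity: number;
}

// An rFunc-style callback: the returned keys become the radar axes (theta),
// and the returned values become the distance from the centre (r).
function rFunc(d: MyScores, _metadata: Record<string, unknown>): Record<string, unknown> {
    return {
        'Parses': d.syntax,
        'Validates': d.validation,
        'Similarity': d.similarity
    };
}

// Imitates how one callback result is turned into a Plotly 'scatterpolar' trace.
function toRadarTrace(name: string, d: MyScores, metadata: Record<string, unknown>) {
    const picked = rFunc(d, metadata);
    return {
        type: 'scatterpolar',
        r: Object.values(picked),
        theta: Object.keys(picked),
        fill: 'toself',
        name
    };
}

const trace = toRadarTrace('llama3.2', { syntax: 1, validation: 0.5, similarity: 0.8 }, {});
console.log(JSON.stringify(trace));
```

Because the radar layout in this version fixes the radial axis range to `[0, 1]`, a callback like this is probably best written to return values already scaled into that range.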
@@ -0,0 +1 @@
1
+ {"version":3,"file":"chart.d.ts","sourceRoot":"","sources":["../../src/evaluator/chart.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,OAAO,EAAE,eAAe,EAAE,mBAAmB,EAAoC,MAAM,gBAAgB,CAAC;AAIxG;;;;;;GAMG;AACH,wBAAgB,kBAAkB,CAAC,CAAC,SAAS,mBAAmB,EAC5D,SAAS,EAAE,MAAM,EACjB,WAAW,EAAE,eAAe,EAAE,EAC9B,IAAI,EAAE,MAAM,EACZ,KAAK,EAAE,CAAC,CAAC,EAAE,CAAC,EAAE,QAAQ,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,KAAK,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,EAC3E,UAAU,CAAC,EAAE,CAAC,GAAG,EAAE,eAAe,EAAE,KAAK,eAAe,EAAE,GAC3D,IAAI,CAsDN;AAED,wBAAgB,iBAAiB,CAAC,CAAC,SAAS,mBAAmB,EAC3D,SAAS,EAAE,MAAM,EACjB,WAAW,EAAE,eAAe,EAAE,EAC9B,IAAI,EAAE,MAAM,EACZ,QAAQ,EAAE,CAAC,CAAC,EAAE,CAAC,EAAE,QAAQ,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,KAAK,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,EAC9E,UAAU,CAAC,EAAE,CAAC,GAAG,EAAE,eAAe,EAAE,KAAK,eAAe,EAAE,QAgD7D;AAID;;GAEG;AACH,wBAAgB,aAAa,CAAC,IAAI,EAAE,eAAe,EAAE,GAAG,eAAe,EAAE,CA2BxE;AAED;;;;;;;GAOG;AACH,wBAAgB,uBAAuB,CAAC,CAAC,SAAS,mBAAmB,EACjE,SAAS,EAAE,MAAM,EACjB,MAAM,EAAE,MAAM,EACd,IAAI,EAAE,MAAM,EACZ,QAAQ,EAAE,CAAC,CAAC,EAAE,CAAC,EAAE,QAAQ,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,KAAK,MAAM,EAC7D,OAAO,CAAC,EAAE;IACN,UAAU,CAAC,EAAE,CAAC,GAAG,EAAE,eAAe,EAAE,KAAK,eAAe,EAAE,CAAC;IAC3D,MAAM,CAAC,EAAE,CAAC,CAAC,EAAE,eAAe,KAAK,OAAO,CAAC;IACzC,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,SAAS,CAAC,EAAE,MAAM,CAAA;CACrB,QA4FJ"}
@@ -0,0 +1,218 @@
1
+ /**
2
+ * Generates & exports an HTML radar chart report using plotly JS
3
+ */
4
+ import { averageAcrossRunners, loadReport } from "./evaluator.js";
5
+ import { writeFileSync, readdirSync } from 'fs';
6
+ import * as path from 'path';
7
+ /**
8
+ * Generates an HTML radar chart from the provided data
9
+ * @param evalResults Evaluator results to chart
10
+ * @param dest Output file to write the chart to
11
+ * @param rFunc polar r function, used to extract the r values from the data
12
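+ * @param preprocess optional function applied to the results before charting; when omitted, results are averaged across runners. The keys returned by rFunc serve as the theta labels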
13
+ */
14
+ export function generateRadarChart(chartName, evalResults, dest, rFunc, preprocess) {
15
+ // process results first to average out data (either using the user supplied function, or defaulting to average across runners)
16
+ const processedResults = preprocess ? preprocess(evalResults) : averageAcrossRunners(evalResults);
17
+ const data = processedResults.map((result) => {
18
+ const resultData = result.data;
19
+ const rfuncResult = rFunc(resultData, result.metadata);
20
+ const theta = Object.keys(rfuncResult);
21
+ const r = Object.values(rfuncResult);
22
+ return {
23
+ type: 'scatterpolar',
24
+ r,
25
+ theta,
26
+ fill: 'toself',
27
+ name: result.name
28
+ };
29
+ });
30
+ const layout = {
31
+ title: chartName,
32
+ name: chartName,
33
+ polar: {
34
+ radialaxis: {
35
+ visible: true,
36
+ range: [0, 1]
37
+ }
38
+ },
39
+ showlegend: true,
40
+ width: 1000,
41
+ height: 800
42
+ };
43
+ const html = `
44
+ <!DOCTYPE html>
45
+ <html>
46
+ <head>
47
+ <title>${chartName}</title>
48
+ <script src="https://cdn.plot.ly/plotly-2.35.2.min.js" charset="utf-8"></script>
49
+ </head>
50
+ <body>
51
+ <div id="langium-ai-chart" style="width:1000px;height:1000px;margin:8px auto;"></div>
52
+ <script>
53
+ data = ${JSON.stringify(data)};
54
+ layout = ${JSON.stringify(layout)};
55
+ Plotly.newPlot("langium-ai-chart", data, layout);
56
+ </script>
57
+ </body>
58
+ </html>
59
+ `;
60
+ writeFileSync(dest, html);
61
+ console.log(`Radar chart report written to: ${dest}`);
62
+ }
63
+ export function generateHistogram(chartName, evalResults, dest, dataFunc, preprocess) {
64
+ // process results first to average out data (either using the user supplied function, or defaulting to average across runners)
65
+ const processedResults = preprocess ? preprocess(evalResults) : averageAcrossRunners(evalResults);
66
+ const data = processedResults.map((result) => {
67
+ const data = result.data;
68
+ const dd = dataFunc(data, result.metadata);
69
+ const yLabels = Object.keys(dd);
70
+ const xData = Object.values(dd);
71
+ return {
72
+ type: 'bar',
73
+ x: xData,
74
+ y: yLabels,
75
+ orientation: 'h',
76
+ name: result.name
77
+ };
78
+ });
79
+ const layout = {
80
+ title: chartName,
81
+ barmode: 'group',
82
+ showlegend: true,
83
+ width: 1000,
84
+ height: 800
85
+ };
86
+ const html = `
87
+ <!DOCTYPE html>
88
+ <html>
89
+ <head>
90
+ <title>${chartName}</title>
91
+ <script src="https://cdn.plot.ly/plotly-2.35.2.min.js" charset="utf-8"></script>
92
+ </head>
93
+ <body>
94
+ <div id="langium-ai-chart" style="width:1000px;height:1000px;margin:8px auto;"></div>
95
+ <script>
96
+ data = ${JSON.stringify(data)};
97
+ layout = ${JSON.stringify(layout)};
98
+ Plotly.newPlot("langium-ai-chart", data, layout);
99
+ </script>
100
+ </body>
101
+ </html>
102
+ `;
103
+ writeFileSync(dest, html);
104
+ console.log(`Histogram report written to: ${dest}`);
105
+ }
106
+ /**
107
+ * Normalizes all numeric data entries in results (while also retaining non-numeric entries)
108
+ */
109
+ export function normalizeData(data) {
110
+ const maxValues = new Map();
111
+ for (const result of data) {
112
+ const d = result.data;
113
+ for (const [key, value] of Object.entries(d)) {
114
+ if (typeof value !== 'number') {
115
+ continue;
116
+ }
117
+ const existingMax = maxValues.get(key) ?? 0;
118
+ if (value > existingMax) {
119
+ maxValues.set(key, value);
120
+ }
121
+ }
122
+ }
123
+ for (const result of data) {
124
+ const d = result.data;
125
+ for (const [key, value] of Object.entries(d)) {
126
+ if (typeof value === 'number') {
127
+ const max = maxValues.get(key) ?? 1;
128
+ d[key] = value / max;
129
+ }
130
+ }
131
+ }
132
+ return data;
133
+ }
134
+ /**
135
+ * Generates a historical chart from the provided data, showing runners along the X axis, and their performance over time along the Y axis
136
+ * @param chartName
137
+ * @param folder
138
+ * @param dest
139
+ * @param dataFunc
140
+ * @param options
141
+ */
142
+ export function generateHistoricalChart(chartName, folder, dest, dataFunc, options) {
143
+ // generate a historical chart by calculating the average for runners in all previous reports, and organizing them in ascending date order
144
+ let files = readdirSync(folder).filter(f => f.endsWith('.json'));
145
+ // array of results, where each array of results is presumed to be a stream of results from a collection of historical runs
146
+ const runnerResultsMap = new Map();
147
+ // take the most recent files if take is set
148
+ if (options?.take) {
149
+ files = files.sort().slice(-options.take);
150
+ }
151
+ for (const file of files) {
152
+ // retrieve results from this file
153
+ const report = loadReport(path.join(folder, file));
154
+ const results = report.results;
155
+ const date = report.date;
156
+ console.log(`Processing historical results from: ${date}`);
157
+ // process results first
158
+ let processedResults = options?.preprocess ? options.preprocess(results) : averageAcrossRunners(results);
159
+ // normalize
160
+ processedResults = normalizeData(processedResults);
161
+ // add to the map based by runner name
162
+ for (const result of processedResults) {
163
+ if (options?.filter && !options.filter(result)) {
164
+ // skip
165
+ continue;
166
+ }
167
+ const name = result.metadata.runner;
168
+ const existingResults = runnerResultsMap.get(name) ?? [];
169
+ const rc = {
170
+ ...result
171
+ };
172
+ rc.metadata.date = new Date(date).toISOString();
173
+ existingResults.push(rc);
174
+ runnerResultsMap.set(name, existingResults);
175
+ }
176
+ }
177
+ const allData = [];
178
+ // organize by date in ascending order
179
+ for (let [name, results] of runnerResultsMap) {
180
+ results.sort((a, b) => {
181
+ return new Date(a.metadata.date).getTime() - new Date(b.metadata.date).getTime();
182
+ });
183
+ const runners = results.map(r => r.metadata.runner);
184
+ const data = results.map(r => dataFunc(r.data, r.metadata));
185
+ allData.push({
186
+ type: options?.chartType ? options.chartType : 'scatter',
187
+ x: runners,
188
+ y: data,
189
+ name
190
+ });
191
+ }
192
+ const layout = {
193
+ title: chartName,
194
+ showlegend: true,
195
+ width: 1000,
196
+ height: 1000
197
+ };
198
+ const html = `
199
+ <!DOCTYPE html>
200
+ <html>
201
+ <head>
202
+ <title>${chartName}</title>
203
+ <script src="https://cdn.plot.ly/plotly-2.35.2.min.js" charset="utf-8"></script>
204
+ </head>
205
+ <body>
206
+ <div id="langium-ai-chart" style="width:1000px;height:1000px;margin:8px auto;"></div>
207
+ <script>
208
+ data = ${JSON.stringify(allData)};
209
+ layout = ${JSON.stringify(layout)};
210
+ Plotly.newPlot("langium-ai-chart", data, layout);
211
+ </script>
212
+ </body>
213
+ </html>
214
+ `;
215
+ writeFileSync(dest, html);
216
+ console.log(`Historical report written to: ${dest}`);
217
+ }
218
+ //# sourceMappingURL=chart.js.map
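The max-based scaling performed by `normalizeData` can be illustrated on plain records. The sketch below is an assumption-laden variant, not the package's function: `normalizeRows` is a hypothetical name, it works on bare objects rather than `EvaluatorResult` entries, and it returns new objects instead of mutating its input as the compiled code does.

```ts
type Row = Record<string, unknown>;

// Divide every numeric field by the largest value seen for that field across
// all rows; non-numeric fields are copied through unchanged.
function normalizeRows(rows: Row[]): Row[] {
    const maxValues = new Map<string, number>();
    for (const row of rows) {
        for (const [key, value] of Object.entries(row)) {
            if (typeof value === 'number' && value > (maxValues.get(key) ?? 0)) {
                maxValues.set(key, value);
            }
        }
    }
    return rows.map(row => {
        const out: Row = {};
        for (const [key, value] of Object.entries(row)) {
            // A field whose maximum never exceeded 0 has no entry, so fall back to 1.
            out[key] = typeof value === 'number' ? value / (maxValues.get(key) ?? 1) : value;
        }
        return out;
    });
}

const normalized = normalizeRows([
    { tokens: 50, errors: 0, label: 'a' },
    { tokens: 200, errors: 0, label: 'b' }
]);
console.log(normalized);
```

Note that this scheme scales relative to the largest value in the batch being normalized, so the same raw score can map to different normalized values in different reports.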
@@ -0,0 +1 @@
1
+ {"version":3,"file":"chart.js","sourceRoot":"","sources":["../../src/evaluator/chart.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,OAAO,EAAwC,oBAAoB,EAAE,UAAU,EAAE,MAAM,gBAAgB,CAAC;AACxG,OAAO,EAAE,aAAa,EAAE,WAAW,EAAgB,MAAM,IAAI,CAAC;AAC9D,OAAO,KAAK,IAAI,MAAM,MAAM,CAAC;AAE7B;;;;;;GAMG;AACH,MAAM,UAAU,kBAAkB,CAC9B,SAAiB,EACjB,WAA8B,EAC9B,IAAY,EACZ,KAA2E,EAC3E,UAA0D;IAG1D,+HAA+H;IAC/H,MAAM,gBAAgB,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,WAAW,CAAC,CAAC,CAAC,CAAC,oBAAoB,CAAC,WAAW,CAAC,CAAC;IAElG,MAAM,IAAI,GAAG,gBAAgB,CAAC,GAAG,CAAC,CAAC,MAAM,EAAE,EAAE;QACzC,MAAM,UAAU,GAAG,MAAM,CAAC,IAAS,CAAC;QACpC,MAAM,WAAW,GAAG,KAAK,CAAC,UAAU,EAAE,MAAM,CAAC,QAAQ,CAAC,CAAC;QACvD,MAAM,KAAK,GAAG,MAAM,CAAC,IAAI,CAAC,WAAW,CAAC,CAAC;QACvC,MAAM,CAAC,GAAG,MAAM,CAAC,MAAM,CAAC,WAAW,CAAC,CAAC;QAErC,OAAO;YACH,IAAI,EAAE,cAAc;YACpB,CAAC;YACD,KAAK;YACL,IAAI,EAAE,QAAQ;YACd,IAAI,EAAE,MAAM,CAAC,IAAI;SACpB,CAAC;IACN,CAAC,CAAC,CAAC;IAEH,MAAM,MAAM,GAAG;QACX,KAAK,EAAE,SAAS;QAChB,IAAI,EAAE,SAAS;QACf,KAAK,EAAE;YACH,UAAU,EAAE;gBACR,OAAO,EAAE,IAAI;gBACb,KAAK,EAAE,CAAC,CAAC,EAAE,CAAC,CAAC;aAChB;SACJ;QACD,UAAU,EAAE,IAAI;QAChB,KAAK,EAAE,IAAI;QACX,MAAM,EAAE,GAAG;KACd,CAAC;IAEF,MAAM,IAAI,GAAG;;;;aAIJ,SAAS;;;;;;iBAML,IAAI,CAAC,SAAS,CAAC,IAAI,CAAC;mBAClB,IAAI,CAAC,SAAS,CAAC,MAAM,CAAC;;;;;SAKhC,CAAC;IAEN,aAAa,CAAC,IAAI,EAAE,IAAI,CAAC,CAAC;IAC1B,OAAO,CAAC,GAAG,CAAC,kCAAkC,IAAI,EAAE,CAAC,CAAC;AAC1D,CAAC;AAED,MAAM,UAAU,iBAAiB,CAC7B,SAAiB,EACjB,WAA8B,EAC9B,IAAY,EACZ,QAA8E,EAC9E,UAA0D;IAG1D,+HAA+H;IAC/H,MAAM,gBAAgB,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,WAAW,CAAC,CAAC,CAAC,CAAC,oBAAoB,CAAC,WAAW,CAAC,CAAC;IAElG,MAAM,IAAI,GAAG,gBAAgB,CAAC,GAAG,CAAC,CAAC,MAAM,EAAE,EAAE;QACzC,MAAM,IAAI,GAAG,MAAM,CAAC,IAAS,CAAC;QAC9B,MAAM,EAAE,GAAG,QAAQ,CAAC,IAAI,EAAE,MAAM,CAAC,QAAQ,CAAC,CAAC;QAC3C,MAAM,OAAO,GAAG,MAAM,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;QAChC,MAAM,KAAK,GAAG,MAAM,CAAC,MAAM,CAAC,EAAE,CAAC,CAAC;QAChC,OAAO;YACH,IAAI,EAAE,KAAK;YACX,CAAC,EAAE,KAAK;YACR,CAAC,EAAE,OAAO;YACV,WAAW,EAAE,GAAG;YAChB,IAAI,EAAE,MAAM,CAAC,IAAI;SACpB,CAAC;IACN,CAAC,CAAC,CAAC;IAEH
,MAAM,MAAM,GAAG;QACX,KAAK,EAAE,SAAS;QAChB,OAAO,EAAE,OAAO;QAChB,UAAU,EAAE,IAAI;QAChB,KAAK,EAAE,IAAI;QACX,MAAM,EAAE,GAAG;KACd,CAAC;IAEF,MAAM,IAAI,GAAG;;;;aAIJ,SAAS;;;;;;iBAML,IAAI,CAAC,SAAS,CAAC,IAAI,CAAC;mBAClB,IAAI,CAAC,SAAS,CAAC,MAAM,CAAC;;;;;SAKhC,CAAC;IAEN,aAAa,CAAC,IAAI,EAAE,IAAI,CAAC,CAAC;IAC1B,OAAO,CAAC,GAAG,CAAC,gCAAgC,IAAI,EAAE,CAAC,CAAC;AACxD,CAAC;AAID;;GAEG;AACH,MAAM,UAAU,aAAa,CAAC,IAAuB;IACjD,MAAM,SAAS,GAAG,IAAI,GAAG,EAAkB,CAAC;IAE5C,KAAK,MAAM,MAAM,IAAI,IAAI,EAAE,CAAC;QACxB,MAAM,CAAC,GAAG,MAAM,CAAC,IAA2B,CAAC;QAC7C,KAAK,MAAM,CAAC,GAAG,EAAE,KAAK,CAAC,IAAI,MAAM,CAAC,OAAO,CAAC,CAAC,CAAC,EAAE,CAAC;YAC3C,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;gBAC5B,SAAS;YACb,CAAC;YACD,MAAM,WAAW,GAAG,SAAS,CAAC,GAAG,CAAC,GAAG,CAAC,IAAI,CAAC,CAAC;YAC5C,IAAI,KAAK,GAAG,WAAW,EAAE,CAAC;gBACtB,SAAS,CAAC,GAAG,CAAC,GAAG,EAAE,KAAK,CAAC,CAAC;YAC9B,CAAC;QACL,CAAC;IACL,CAAC;IAED,KAAK,MAAM,MAAM,IAAI,IAAI,EAAE,CAAC;QACxB,MAAM,CAAC,GAAG,MAAM,CAAC,IAA2B,CAAC;QAC7C,KAAK,MAAM,CAAC,GAAG,EAAE,KAAK,CAAC,IAAI,MAAM,CAAC,OAAO,CAAC,CAAC,CAAC,EAAE,CAAC;YAC3C,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;gBAC5B,MAAM,GAAG,GAAG,SAAS,CAAC,GAAG,CAAC,GAAG,CAAC,IAAI,CAAC,CAAC;gBACpC,CAAC,CAAC,GAAG,CAAC,GAAG,KAAK,GAAG,GAAG,CAAC;YACzB,CAAC;QACL,CAAC;IACL,CAAC;IAED,OAAO,IAAI,CAAC;AAChB,CAAC;AAED;;;;;;;GAOG;AACH,MAAM,UAAU,uBAAuB,CACnC,SAAiB,EACjB,MAAc,EACd,IAAY,EACZ,QAA6D,EAC7D,OAKC;IAED,0IAA0I;IAC1I,IAAI,KAAK,GAAG,WAAW,CAAC,MAAM,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,QAAQ,CAAC,OAAO,CAAC,CAAC,CAAC;IAEjE,2HAA2H;IAC3H,MAAM,gBAAgB,GAAuC,IAAI,GAAG,EAAE,CAAC;IAEvE,4CAA4C;IAC5C,IAAI,OAAO,EAAE,IAAI,EAAE,CAAC;QAChB,KAAK,GAAG,KAAK,CAAC,IAAI,EAAE,CAAC,KAAK,CAAC,CAAC,EAAE,OAAO,CAAC,IAAI,CAAC,CAAC;IAChD,CAAC;IAED,KAAK,MAAM,IAAI,IAAI,KAAK,EAAE,CAAC;QACvB,kCAAkC;QAClC,MAAM,MAAM,GAAG,UAAU,CAAC,IAAI,CAAC,IAAI,CAAC,MAAM,EAAE,IAAI,CAAC,CAAC,CAAC;QACnD,MAAM,OAAO,GAAG,MAAM,CAAC,OAAO,CAAC;QAC/B,MAAM,IAAI,GAAW,MAAM,CAAC,IAAI,CAAC;QACjC,OAAO,CAAC,GAAG,CAAC,uCAAuC,IAAI,EAAE,CAAC,CAAC;QAE3D,wBAAwB;QACxB,IAAI,gBAAgB,GAAG,OAAO,EAAE,UAAU,CAAC,CA
AC,CAAC,OAAO,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,CAAC,oBAAoB,CAAC,OAAO,CAAC,CAAC;QACzG,YAAY;QACZ,gBAAgB,GAAG,aAAa,CAAC,gBAAgB,CAAC,CAAC;QAEnD,sCAAsC;QACtC,KAAK,MAAM,MAAM,IAAI,gBAAgB,EAAE,CAAC;YACpC,IAAI,OAAO,EAAE,MAAM,IAAI,CAAC,OAAO,CAAC,MAAM,CAAC,MAAM,CAAC,EAAE,CAAC;gBAC7C,OAAO;gBACP,SAAS;YACb,CAAC;YAED,MAAM,IAAI,GAAG,MAAM,CAAC,QAAQ,CAAC,MAAM,CAAC;YACpC,MAAM,eAAe,GAAG,gBAAgB,CAAC,GAAG,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC;YAEzD,MAAM,EAAE,GAAG;gBACP,GAAG,MAAM;aACZ,CAAC;YACF,EAAE,CAAC,QAAQ,CAAC,IAAI,GAAG,IAAI,IAAI,CAAC,IAAI,CAAC,CAAC,WAAW,EAAE,CAAC;YAEhD,eAAe,CAAC,IAAI,CAAC,MAAM,CAAC,CAAC;YAC7B,gBAAgB,CAAC,GAAG,CAAC,IAAI,EAAE,eAAe,CAAC,CAAC;QAChD,CAAC;IACL,CAAC;IAED,MAAM,OAAO,GAAc,EAAE,CAAC;IAE9B,sCAAsC;IACtC,KAAK,IAAI,CAAC,IAAI,EAAE,OAAO,CAAC,IAAI,gBAAgB,EAAE,CAAC;QAC3C,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,CAAC,EAAE,EAAE;YAClB,OAAO,IAAI,IAAI,CAAC,CAAC,CAAC,QAAQ,CAAC,IAAI,CAAC,CAAC,OAAO,EAAE,GAAG,IAAI,IAAI,CAAC,CAAC,CAAC,QAAQ,CAAC,IAAI,CAAC,CAAC,OAAO,EAAE,CAAC;QACrF,CAAC,CAAC,CAAC;QAEH,MAAM,OAAO,GAAG,OAAO,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,QAAQ,CAAC,MAAM,CAAC,CAAC;QACpD,MAAM,IAAI,GAAG,OAAO,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC,QAAQ,CAAC,CAAC,CAAC,IAAS,EAAE,CAAC,CAAC,QAAQ,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC;QAExE,OAAO,CAAC,IAAI,CAAC;YACT,IAAI,EAAE,OAAO,EAAE,SAAS,CAAC,CAAC,CAAC,OAAO,CAAC,SAAS,CAAC,CAAC,CAAC,SAAS;YACxD,CAAC,EAAE,OAAO;YACV,CAAC,EAAE,IAAI;YACP,IAAI;SACP,CAAC,CAAC;IACP,CAAC;IAED,MAAM,MAAM,GAAG;QACX,KAAK,EAAE,SAAS;QAChB,UAAU,EAAE,IAAI;QAChB,KAAK,EAAE,IAAI;QACX,MAAM,EAAE,IAAI;KACf,CAAC;IAEF,MAAM,IAAI,GAAG;;;;aAIJ,SAAS;;;;;;iBAML,IAAI,CAAC,SAAS,CAAC,OAAO,CAAC;mBACrB,IAAI,CAAC,SAAS,CAAC,MAAM,CAAC;;;;;SAKhC,CAAC;IAEN,aAAa,CAAC,IAAI,EAAE,IAAI,CAAC,CAAC;IAC1B,OAAO,CAAC,GAAG,CAAC,iCAAiC,IAAI,EAAE,CAAC,CAAC;AAEzD,CAAC"}
@@ -0,0 +1,8 @@
1
+ import { Evaluator, EvaluatorResult, EvaluatorResultData } from './evaluator.js';
2
+ export interface EditDistanceEvaluatorResultData extends EvaluatorResultData {
3
+ edit_distance: number;
4
+ }
5
+ export declare class EditDistanceEvaluator extends Evaluator {
6
+ evaluate(response: string, expected_response: string): Promise<Partial<EvaluatorResult>>;
7
+ }
8
+ //# sourceMappingURL=edit-distance-evaluator.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"edit-distance-evaluator.d.ts","sourceRoot":"","sources":["../../src/evaluator/edit-distance-evaluator.ts"],"names":[],"mappings":"AACA,OAAO,EAAE,SAAS,EAAE,eAAe,EAAE,mBAAmB,EAAE,MAAM,gBAAgB,CAAC;AAEjF,MAAM,WAAW,+BAAgC,SAAQ,mBAAmB;IACxE,aAAa,EAAE,MAAM,CAAC;CACzB;AAED,qBAAa,qBAAsB,SAAQ,SAAS;IAC1C,QAAQ,CAAC,QAAQ,EAAE,MAAM,EAAE,iBAAiB,EAAE,MAAM,GAAG,OAAO,CAAC,OAAO,CAAC,eAAe,CAAC,CAAC;CAQjG"}
@@ -0,0 +1,13 @@
1
+ import { levenshteinEditDistance } from 'levenshtein-edit-distance';
2
+ import { Evaluator } from './evaluator.js';
3
+ export class EditDistanceEvaluator extends Evaluator {
4
+ async evaluate(response, expected_response) {
5
+ const distance = levenshteinEditDistance(response, expected_response);
6
+ return {
7
+ data: {
8
+ edit_distance: distance
9
+ }
10
+ };
11
+ }
12
+ }
13
+ //# sourceMappingURL=edit-distance-evaluator.js.map
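For readers unfamiliar with the metric that `EditDistanceEvaluator` reports, the following is a minimal sketch of a Levenshtein distance computation. It is a textbook dynamic-programming version written for illustration (`editDistance` is a hypothetical name); the package itself delegates to the `levenshtein-edit-distance` dependency, whose internals may differ.

```ts
// Classic dynamic-programming Levenshtein distance, keeping only one row of the table.
function editDistance(a: string, b: string): number {
    // Before iteration i, prev[j] is the distance between the first i-1
    // characters of a and the first j characters of b.
    let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        const curr: number[] = [i];
        for (let j = 1; j <= b.length; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            curr.push(Math.min(
                prev[j]! + 1,        // delete a character from a
                curr[j - 1]! + 1,    // insert a character into a
                prev[j - 1]! + cost  // substitute (or keep, when equal)
            ));
        }
        prev = curr;
    }
    return prev[b.length]!;
}

console.log(editDistance('kitten', 'sitting'));
```

A lower value means the response is closer to the expected text. The raw distance grows with output length, so comparing scores across cases of very different sizes may call for some form of length normalization.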
@@ -0,0 +1 @@
1
+ {"version":3,"file":"edit-distance-evaluator.js","sourceRoot":"","sources":["../../src/evaluator/edit-distance-evaluator.ts"],"names":[],"mappings":"AAAA,OAAO,EAAC,uBAAuB,EAAC,MAAM,2BAA2B,CAAC;AAClE,OAAO,EAAE,SAAS,EAAwC,MAAM,gBAAgB,CAAC;AAMjF,MAAM,OAAO,qBAAsB,SAAQ,SAAS;IAChD,KAAK,CAAC,QAAQ,CAAC,QAAgB,EAAE,iBAAyB;QACtD,MAAM,QAAQ,GAAG,uBAAuB,CAAC,QAAQ,EAAE,iBAAiB,CAAC,CAAC;QACtE,OAAO;YACH,IAAI,EAAE;gBACF,aAAa,EAAE,QAAQ;aAC1B;SACJ,CAAC;IACN,CAAC;CACJ"}
@@ -0,0 +1,95 @@
1
+ import { Evaluator, EvaluatorResult } from "./evaluator.js";
2
+ /**
3
+ * Configuration for the evaluation matrix
4
+ */
5
+ export interface EvalMatrixConfig {
6
+ config: {
7
+ /**
8
+ * Name of the evaluation matrix
9
+ */
10
+ name: string;
11
+ /**
12
+ * Helpful description of the evaluation matrix
13
+ */
14
+ description: string;
15
+ /**
16
+ * Where to store run history
17
+ */
18
+ history_folder: string;
19
+ /**
20
+ * The number of runs to perform for each case
21
+ * Note this will trigger evaluation for all registered evaluators for each run
22
+ */
23
+ num_runs: number;
24
+ };
25
+ /**
26
+ * Runners to evaluate
27
+ */
28
+ runners: Runner[];
29
+ /**
30
+ * Evaluators to evaluate with
31
+ */
32
+ evaluators: NamedEvaluator[];
33
+ /**
34
+ * Cases to evaluate
35
+ */
36
+ cases: Case[];
37
+ }
38
+ /**
39
+ * Evaluation matrix for running multiple runners on multiple cases with multiple evaluators
40
+ */
41
+ export declare class EvalMatrix {
42
+ private config;
43
+ constructor(config: EvalMatrixConfig);
44
+ /**
45
+ * Run the evaluation matrix, getting all results back
46
+ */
47
+ run(): Promise<EvaluatorResult[]>;
48
+ }
49
+ /**
50
+ * General format for histories when prompting
51
+ */
52
+ export interface Message {
53
+ role: 'user' | 'system' | 'assistant';
54
+ content: string;
55
+ }
56
+ /**
57
+ * Runner interface for running a prompt against a model, a service, or something else that provides a response
58
+ */
59
+ export interface Runner {
60
+ name: string;
61
+ runner: (prompt: string, messages: Message[]) => Promise<string>;
62
+ }
63
+ /**
64
+ * Generic evaluator interface w/ a name to identify it
65
+ */
66
+ export interface NamedEvaluator {
67
+ name: string;
68
+ eval: Evaluator;
69
+ }
70
+ /**
71
+ * Case interface for defining an evaluation case
72
+ */
73
+ export interface Case {
74
+ /**
75
+ * Name of the case
76
+ */
77
+ name: string;
78
+ /**
79
+ * Optional message history, used for system, user & assistant messages
80
+ */
81
+ history?: Message[];
82
+ /**
83
+ * Core prompt to run with
84
+ */
85
+ prompt: string;
86
+ /**
87
+ * Context for the prompt, used for RAG applications
88
+ */
89
+ context: string[];
90
+ /**
91
+ * Expected response
92
+ */
93
+ expected_response: string;
94
+ }
95
+ //# sourceMappingURL=eval-matrix.d.ts.map
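As a rough illustration of how the `Runner` and `Case` declarations fit together, the sketch below wires a stub runner to a single case. The interfaces are local copies of the declared shapes so the example runs on its own; `echoRunner`, `cases`, and `runAll` are hypothetical names, and the stub returns a canned string where a real runner would call a model or service.

```ts
// Local mirrors of the declared interfaces, so the sketch runs without the package.
interface Message { role: 'user' | 'system' | 'assistant'; content: string; }
interface Runner { name: string; runner: (prompt: string, messages: Message[]) => Promise<string>; }
interface Case { name: string; history?: Message[]; prompt: string; context: string[]; expected_response: string; }

// Hypothetical runner that echoes its input; a real one would query a model here.
const echoRunner: Runner = {
    name: 'echo',
    runner: async (prompt, messages) => `(${messages.length} prior messages) ${prompt}`
};

const cases: Case[] = [{
    name: 'hello-world',
    history: [{ role: 'system', content: 'You write MyDSL programs.' }],
    prompt: 'Write a hello world program.',
    context: [],
    expected_response: 'print "hello world"'
}];

// Calls each runner with the prompt first, then the (optional) history.
async function runAll(runners: Runner[], testCases: Case[]): Promise<string[]> {
    const responses: string[] = [];
    for (const r of runners) {
        for (const c of testCases) {
            responses.push(await r.runner(c.prompt, c.history ?? []));
        }
    }
    return responses;
}

runAll([echoRunner], cases).then(responses => console.log(responses));
```

In the compiled `run()` of this version, only `prompt` and `history` are passed to the runner; a runner that needs the `context` entries would have to obtain them some other way.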
@@ -0,0 +1 @@
1
+ {"version":3,"file":"eval-matrix.d.ts","sourceRoot":"","sources":["../../src/evaluator/eval-matrix.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,SAAS,EAAE,eAAe,EAAE,MAAM,gBAAgB,CAAC;AAI5D;;GAEG;AACH,MAAM,WAAW,gBAAgB;IAC7B,MAAM,EAAE;QACJ;;WAEG;QACH,IAAI,EAAE,MAAM,CAAC;QAEb;;WAEG;QACH,WAAW,EAAE,MAAM,CAAC;QAEpB;;WAEG;QACH,cAAc,EAAE,MAAM,CAAC;QAEvB;;;WAGG;QACH,QAAQ,EAAE,MAAM,CAAC;KACpB,CAAC;IAEF;;OAEG;IACH,OAAO,EAAE,MAAM,EAAE,CAAC;IAElB;;OAEG;IACH,UAAU,EAAE,cAAc,EAAE,CAAC;IAE7B;;OAEG;IACH,KAAK,EAAE,IAAI,EAAE,CAAC;CACjB;AAED;;GAEG;AACH,qBAAa,UAAU;IACnB,OAAO,CAAC,MAAM,CAAmB;gBAErB,MAAM,EAAE,gBAAgB;IAIpC;;OAEG;IACG,GAAG,IAAI,OAAO,CAAC,eAAe,EAAE,CAAC;CA2F1C;AAED;;GAEG;AACH,MAAM,WAAW,OAAO;IACpB,IAAI,EAAE,MAAM,GAAG,QAAQ,GAAG,WAAW,CAAC;IACtC,OAAO,EAAE,MAAM,CAAC;CACnB;AAED;;GAEG;AACH,MAAM,WAAW,MAAM;IACnB,IAAI,EAAE,MAAM,CAAC;IACb,MAAM,EAAE,CAAC,MAAM,EAAE,MAAM,EAAE,QAAQ,EAAE,OAAO,EAAE,KAAK,OAAO,CAAC,MAAM,CAAC,CAAC;CACpE;AAED;;GAEG;AACH,MAAM,WAAW,cAAc;IAC3B,IAAI,EAAE,MAAM,CAAC;IACb,IAAI,EAAE,SAAS,CAAC;CACnB;AAED;;GAEG;AACH,MAAM,WAAW,IAAI;IACjB;;OAEG;IACH,IAAI,EAAE,MAAM,CAAC;IAEb;;OAEG;IACH,OAAO,CAAC,EAAE,OAAO,EAAE,CAAC;IAEpB;;OAEG;IACH,MAAM,EAAE,MAAM,CAAC;IAEf;;OAEG;IACH,OAAO,EAAE,MAAM,EAAE,CAAC;IAElB;;OAEG;IACH,iBAAiB,EAAE,MAAM,CAAC;CAC7B"}
@@ -0,0 +1,87 @@
1
+ import fs from 'fs';
2
+ import * as path from 'path';
3
+ /**
4
+ * Evaluation matrix for running multiple runners on multiple cases with multiple evaluators
5
+ */
6
+ export class EvalMatrix {
7
+ constructor(config) {
8
+ this.config = config;
9
+ }
10
+ /**
11
+ * Run the evaluation matrix, getting all results back
12
+ */
13
+ async run() {
14
+ // get the current timestamp
15
+ const start = new Date();
16
+ const results = [];
17
+ // verify that all runners have unique names first
18
+ const runnerNames = this.config.runners.map(r => r.name);
19
+ const uniqueRunnerNames = new Set();
20
+ for (const name of runnerNames) {
21
+ if (uniqueRunnerNames.has(name)) {
22
+ throw new Error(`Runner names must be unique, found duplicate: ${name}`);
23
+ }
24
+ uniqueRunnerNames.add(name);
25
+ }
26
+ console.log(`Running evaluation matrix: ${this.config.config.name}`);
27
+ console.log(`Found ${this.config.runners.length * this.config.cases.length * this.config.evaluators.length} runner-evaluator-case combinations to handle`);
28
+ // run all runners
29
+ for (const runner of this.config.runners) {
30
+ console.log(`* Runner: ${runner.name}`);
31
+ // run all cases for this runner
32
+ for (const testCase of this.config.cases) {
33
+ console.log(` * Case: ${testCase.name}`);
34
+ const runCount = this.config.config.num_runs ?? 1;
35
+ for (let iteration = 0; iteration < runCount; iteration++) {
36
+ const runnerStartTime = new Date();
37
+ const response = await runner.runner(testCase.prompt, testCase.history ?? []);
38
+ const runnerEndTime = new Date();
39
+ // run all evaluators on this response
40
+ for (const evaluator of this.config.evaluators) {
41
+ console.log(` * Evaluator: ${evaluator.name} (run ${iteration + 1})`);
42
+ const result = await evaluator.eval.evaluate(response, testCase.expected_response);
43
+ if (!result.name) {
44
+ result.name = `${runner.name} - ${testCase.name} - ${evaluator.name}`;
45
+ }
46
+ // add runtime there too, so we have access to it
47
+ result.data._runtime = (runnerEndTime.getTime() - runnerStartTime.getTime()) / 1000.0; // in seconds
48
+ result.metadata = {
49
+ runner: runner.name,
50
+ evaluator: evaluator.name,
51
+ testCase: { ...testCase },
52
+ actual_response: response,
53
+ duration: (runnerEndTime.getTime() - runnerStartTime.getTime()) / 1000.0, // in seconds
54
+ run_count: iteration + 1
55
+ };
56
+ results.push(result);
57
+ }
58
+ }
59
+ }
60
+ }
61
+ // check if the folder exists first
62
+ if (!fs.existsSync(this.config.config.history_folder)) {
63
+ fs.mkdirSync(this.config.config.history_folder);
64
+ }
65
+ const dateStr = new Date().toISOString();
66
+ const sanitizedDateStr = dateStr.replace(/:/g, '-').replace(/\./g, '-');
67
+ let fileName = `${sanitizedDateStr}-${this.config.config.name.toLowerCase().replace(/\s+/g, '-')}.json`;
68
+ // escape any slashes too
69
+ fileName = fileName.replace(/\//g, '-');
70
+ console.log(`Writing results to file: ${path.join(this.config.config.history_folder, fileName)}`);
71
+ // run time in seconds
72
+ const runTime = (new Date().getTime() - start.getTime()) / 1000;
73
+ console.log(`Evaluation matrix completed in ${runTime} seconds (${runTime / 60} minutes)`);
74
+ // prepare & write results to file
75
+ const report = {
76
+ config: this.config.config,
77
+ date: dateStr,
78
+ runTime: `${runTime}s`,
79
+ results
80
+ };
81
+ fs.writeFileSync(path.join(this.config.config.history_folder, fileName), JSON.stringify(report, null, 2));
82
+ // write the name of this last report into last.txt
83
+ fs.writeFileSync(path.join(this.config.config.history_folder, 'last.txt'), fileName);
84
+ return results;
85
+ }
86
+ }
87
+ //# sourceMappingURL=eval-matrix.js.map
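The report file name built at the end of `run()` combines an ISO timestamp with the configured name. The sketch below isolates that derivation as a standalone helper. `reportFileName` is a hypothetical name used only for illustration (the package does not export it); the replacements simply mirror the ones in the file above.

```typescript
// Hypothetical helper mirroring the file-name logic in EvalMatrix.run();
// the package itself does not export a function with this name.
function reportFileName(date: Date, configName: string): string {
    // ISO timestamps contain ':' and '.', which are awkward in file names
    const stamp = date.toISOString().replace(/:/g, '-').replace(/\./g, '-');
    // lower-case the name and turn each whitespace run into a single dash
    const slug = configName.toLowerCase().replace(/\s+/g, '-');
    // slashes would otherwise be read as path separators
    return `${stamp}-${slug}.json`.replace(/\//g, '-');
}

console.log(reportFileName(new Date('2024-01-02T03:04:05.678Z'), 'My Eval/Run'));
// prints "2024-01-02T03-04-05-678Z-my-eval-run.json"
```

Because the name starts with an ISO timestamp, a plain lexicographic sort of the file names is also a chronological sort.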
@@ -0,0 +1 @@
1
+ {"version":3,"file":"eval-matrix.js","sourceRoot":"","sources":["../../src/evaluator/eval-matrix.ts"],"names":[],"mappings":"AACA,OAAO,EAAE,MAAM,IAAI,CAAC;AACpB,OAAO,KAAK,IAAI,MAAM,MAAM,CAAC;AA6C7B;;GAEG;AACH,MAAM,OAAO,UAAU;IAGnB,YAAY,MAAwB;QAChC,IAAI,CAAC,MAAM,GAAG,MAAM,CAAC;IACzB,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,GAAG;QAEL,4BAA4B;QAC5B,MAAM,KAAK,GAAG,IAAI,IAAI,EAAE,CAAC;QAEzB,MAAM,OAAO,GAAsB,EAAE,CAAC;QAEtC,kDAAkD;QAClD,MAAM,WAAW,GAAG,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC;QACzD,MAAM,iBAAiB,GAAG,IAAI,GAAG,EAAE,CAAC;QACpC,KAAK,MAAM,IAAI,IAAI,WAAW,EAAE,CAAC;YAC7B,IAAI,iBAAiB,CAAC,GAAG,CAAC,IAAI,CAAC,EAAE,CAAC;gBAC9B,MAAM,IAAI,KAAK,CAAC,iDAAiD,IAAI,EAAE,CAAC,CAAC;YAC7E,CAAC;YACD,iBAAiB,CAAC,GAAG,CAAC,IAAI,CAAC,CAAC;QAChC,CAAC;QAED,OAAO,CAAC,GAAG,CAAC,8BAA8B,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,IAAI,EAAE,CAAC,CAAC;QACrE,OAAO,CAAC,GAAG,CAAC,SAAS,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,MAAM,GAAG,IAAI,CAAC,MAAM,CAAC,KAAK,CAAC,MAAM,GAAG,IAAI,CAAC,MAAM,CAAC,UAAU,CAAC,MAAM,+CAA+C,CAAC,CAAC;QAE3J,kBAAkB;QAClB,KAAK,MAAM,MAAM,IAAI,IAAI,CAAC,MAAM,CAAC,OAAO,EAAE,CAAC;YAEvC,OAAO,CAAC,GAAG,CAAC,aAAa,MAAM,CAAC,IAAI,EAAE,CAAC,CAAC;YAExC,gCAAgC;YAChC,KAAK,MAAM,QAAQ,IAAI,IAAI,CAAC,MAAM,CAAC,KAAK,EAAE,CAAC;gBACvC,OAAO,CAAC,GAAG,CAAC,aAAa,QAAQ,CAAC,IAAI,EAAE,CAAC,CAAC;gBAE1C,MAAM,QAAQ,GAAG,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,QAAQ,IAAI,CAAC,CAAC;gBAClD,KAAK,IAAI,SAAS,GAAG,CAAC,EAAE,SAAS,GAAG,QAAQ,EAAE,SAAS,EAAE,EAAE,CAAC;oBACxD,MAAM,eAAe,GAAG,IAAI,IAAI,EAAE,CAAC;oBACnC,MAAM,QAAQ,GAAG,MAAM,MAAM,CAAC,MAAM,CAAC,QAAQ,CAAC,MAAM,EAAE,QAAQ,CAAC,OAAO,IAAI,EAAE,CAAC,CAAC;oBAC9E,MAAM,aAAa,GAAG,IAAI,IAAI,EAAE,CAAC;oBAEjC,sCAAsC;oBACtC,KAAK,MAAM,SAAS,IAAI,IAAI,CAAC,MAAM,CAAC,UAAU,EAAE,CAAC;wBAC7C,OAAO,CAAC,GAAG,CAAC,oBAAoB,SAAS,CAAC,IAAI,SAAS,SAAS,GAAG,CAAC,GAAG,CAAC,CAAC;wBACzE,MAAM,MAAM,GAAG,MAAM,SAAS,CAAC,IAAI,CAAC,QAAQ,CAAC,QAAQ,EAAE,QAAQ,CAAC,iBAAiB,CAAC,CAAC;wBACnF,IAAI,CAAC,MAAM,CAAC,IAAI,EAAE,CAAC;4BACf,MAAM,CAAC,IAAI,GAAG,GAAG,MAAM,CAAC,IAAI,MAAM,QAAQ,CAAC,IAAI,MAAM,SA
AS,CAAC,IAAI,EAAE,CAAC;wBAC1E,CAAC;wBACD,iDAAiD;wBACjD,MAAM,CAAC,IAAK,CAAC,QAAQ,GAAG,CAAC,aAAa,CAAC,OAAO,EAAE,GAAG,eAAe,CAAC,OAAO,EAAE,CAAC,GAAG,MAAM,CAAC,CAAC,aAAa;wBAErG,MAAM,CAAC,QAAQ,GAAG;4BACd,MAAM,EAAE,MAAM,CAAC,IAAI;4BACnB,SAAS,EAAE,SAAS,CAAC,IAAI;4BACzB,QAAQ,EAAE,EAAE,GAAG,QAAQ,EAAE;4BACzB,eAAe,EAAE,QAAQ;4BACzB,QAAQ,EAAE,CAAC,aAAa,CAAC,OAAO,EAAE,GAAG,eAAe,CAAC,OAAO,EAAE,CAAC,GAAG,MAAM,EAAE,aAAa;4BACvF,SAAS,EAAE,SAAS,GAAG,CAAC;yBAC3B,CAAC;wBAEF,OAAO,CAAC,IAAI,CAAC,MAAyB,CAAC,CAAC;oBAC5C,CAAC;gBACL,CAAC;YACL,CAAC;QACL,CAAC;QAED,mCAAmC;QACnC,IAAI,CAAC,EAAE,CAAC,UAAU,CAAC,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,cAAc,CAAC,EAAE,CAAC;YACpD,EAAE,CAAC,SAAS,CAAC,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,cAAc,CAAC,CAAC;QACpD,CAAC;QAED,MAAM,OAAO,GAAG,IAAI,IAAI,EAAE,CAAC,WAAW,EAAE,CAAC;QACzC,MAAM,gBAAgB,GAAG,OAAO,CAAC,OAAO,CAAC,IAAI,EAAE,GAAG,CAAC,CAAC,OAAO,CAAC,KAAK,EAAE,GAAG,CAAC,CAAC;QACxE,IAAI,QAAQ,GAAG,GAAG,gBAAgB,IAAI,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,IAAI,CAAC,WAAW,EAAE,CAAC,OAAO,CAAC,MAAM,EAAE,GAAG,CAAC,OAAO,CAAC;QACxG,yBAAyB;QACzB,QAAQ,GAAG,QAAQ,CAAC,OAAO,CAAC,KAAK,EAAE,GAAG,CAAC,CAAC;QAExC,OAAO,CAAC,GAAG,CAAC,4BAA4B,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,cAAc,EAAE,QAAQ,CAAC,EAAE,CAAC,CAAC;QAElG,sBAAsB;QACtB,MAAM,OAAO,GAAG,CAAC,IAAI,IAAI,EAAE,CAAC,OAAO,EAAE,GAAG,KAAK,CAAC,OAAO,EAAE,CAAC,GAAG,IAAI,CAAC;QAChE,OAAO,CAAC,GAAG,CAAC,kCAAkC,OAAO,aAAa,OAAO,GAAG,EAAE,WAAW,CAAC,CAAC;QAE3F,kCAAkC;QAClC,MAAM,MAAM,GAAG;YACX,MAAM,EAAE,IAAI,CAAC,MAAM,CAAC,MAAM;YAC1B,IAAI,EAAE,OAAO;YACb,OAAO,EAAE,GAAG,OAAO,GAAG;YACtB,OAAO;SACV,CAAC;QACF,EAAE,CAAC,aAAa,CAAC,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,cAAc,EAAE,QAAQ,CAAC,EAAE,IAAI,CAAC,SAAS,CAAC,MAAM,EAAE,IAAI,EAAE,CAAC,CAAC,CAAC,CAAC;QAE1G,mDAAmD;QACnD,EAAE,CAAC,aAAa,CAAC,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,cAAc,EAAE,UAAU,CAAC,EAAE,QAAQ,CAAC,CAAC;QAErF,OAAO,OAAO,CAAC;IACnB,CAAC;CACJ"}
@@ -0,0 +1,64 @@
1
+ /**
2
+ * Baseline Validator Class
3
+ */
4
+ export type EvaluatorResultData = Record<string, unknown> & {
5
+ _runtime?: number;
6
+ };
7
+ /**
8
+ * Evaluator result type
9
+ */
10
+ export type EvaluatorResult = {
11
+ /**
12
+ * Name of this evaluation
13
+ */
14
+ name: string;
15
+ /**
16
+ * Optional metadata, can be used to store additional information
17
+ */
18
+ metadata: Record<string, any>;
19
+ /**
20
+ * Data for this evaluation
21
+ */
22
+ data: EvaluatorResultData;
23
+ };
24
+ /**
25
+ * Helper to process a set of results, averaging all runs of each runner-evaluator-case combination
26
+ */
27
+ export declare function averageAcrossCases(results: EvaluatorResult[]): EvaluatorResult[];
28
+ /**
29
+ * Averages all results across runners at the highest level, to get a single result for each runner
30
+ */
31
+ export declare function averageAcrossRunners(results: EvaluatorResult[]): EvaluatorResult[];
32
+ /**
33
+ * Report
34
+ */
35
+ export interface Report {
36
+ config: {
37
+ name: string;
38
+ description: string;
39
+ history_folder: string;
40
+ num_runs: number;
41
+ };
42
+ date: string;
43
+ runTime: string;
44
+ results: EvaluatorResult[];
45
+ }
46
+ /**
47
+ * Loads a specific report, containing evaluator results from a file & returns it
48
+ */
49
+ export declare function loadReport(file: string): Report;
50
+ /**
51
+ * Attempts to load the most recent evaluator results from the given directory
52
+ */
53
+ export declare function loadLastResults(dir: string, take?: number): EvaluatorResult[];
54
+ /**
55
+ * Evaluator class for evaluating agent responses
56
+ */
57
+ export declare abstract class Evaluator {
58
+ /**
59
+ * Validate some agent response
60
+ */
61
+ abstract evaluate(response: string, expected_response: string): Promise<Partial<EvaluatorResult>>;
62
+ }
63
+ export declare function mergeEvaluators(...evaluators: Evaluator[]): Evaluator;
64
+ //# sourceMappingURL=evaluator.d.ts.map
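As a rough illustration of how the `Evaluator` contract and `mergeEvaluators` declared above might be used, the sketch below re-declares a minimal `Evaluator` locally so that it runs on its own. `LengthEvaluator`, `ExactMatchEvaluator`, `EvalOutput` and `mergeAll` are hypothetical names, not part of the package. The merge order (later evaluators override earlier ones on key collisions) follows what the implementation in `evaluator.js` documents.

```typescript
// Local re-declarations so the sketch is self-contained; in a real project
// these would presumably be imported from the package instead.
type EvalOutput = {
    name?: string;
    metadata?: Record<string, unknown>;
    data?: Record<string, unknown>;
};

abstract class Evaluator {
    abstract evaluate(response: string, expected_response: string): Promise<EvalOutput>;
}

// Hypothetical evaluator: records the length of the response
class LengthEvaluator extends Evaluator {
    async evaluate(response: string): Promise<EvalOutput> {
        return { data: { response_length: response.length } };
    }
}

// Hypothetical evaluator: records whether the response equals the expected text
class ExactMatchEvaluator extends Evaluator {
    async evaluate(response: string, expected_response: string): Promise<EvalOutput> {
        return { data: { exact_match: response === expected_response } };
    }
}

// Sequential merge: each evaluator runs in turn, later keys win on collisions
function mergeAll(...evaluators: Evaluator[]): Evaluator {
    return evaluators.reduce((a, b) => ({
        async evaluate(response: string, expected_response: string): Promise<EvalOutput> {
            const r1 = await a.evaluate(response, expected_response);
            const r2 = await b.evaluate(response, expected_response);
            return {
                metadata: { ...r1.metadata, ...r2.metadata },
                data: { ...r1.data, ...r2.data }
            };
        }
    }));
}

mergeAll(new LengthEvaluator(), new ExactMatchEvaluator())
    .evaluate('abc', 'abc')
    .then(result => console.log(result.data));
```

Note that `reduce` without an initial value throws on an empty array, so this sketch expects at least one evaluator.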
@@ -0,0 +1 @@
1
+ {"version":3,"file":"evaluator.d.ts","sourceRoot":"","sources":["../../src/evaluator/evaluator.ts"],"names":[],"mappings":"AAAA;;GAEG;AAKH,MAAM,MAAM,mBAAmB,GAAG,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,GAAG;IACxD,QAAQ,CAAC,EAAE,MAAM,CAAC;CACrB,CAAC;AAEF;;GAEG;AACH,MAAM,MAAM,eAAe,GAAG;IAC1B;;OAEG;IACH,IAAI,EAAE,MAAM,CAAC;IAEb;;OAEG;IACH,QAAQ,EAAE,MAAM,CAAC,MAAM,EAAE,GAAG,CAAC,CAAC;IAE9B;;OAEG;IACH,IAAI,EAAE,mBAAmB,CAAC;CAE7B,CAAC;AAEF;;GAEG;AACH,wBAAgB,kBAAkB,CAAC,OAAO,EAAE,eAAe,EAAE,GAAG,eAAe,EAAE,CA4ChF;AAED;;GAEG;AACH,wBAAgB,oBAAoB,CAAC,OAAO,EAAE,eAAe,EAAE,GAAG,eAAe,EAAE,CAiDlF;AAED;;GAEG;AACH,MAAM,WAAW,MAAM;IACnB,MAAM,EAAE;QACJ,IAAI,EAAE,MAAM,CAAC;QACb,WAAW,EAAE,MAAM,CAAC;QACpB,cAAc,EAAE,MAAM,CAAC;QACvB,QAAQ,EAAE,MAAM,CAAC;KACpB,CAAC;IACF,IAAI,EAAE,MAAM,CAAC;IACb,OAAO,EAAE,MAAM,CAAC;IAChB,OAAO,EAAE,eAAe,EAAE,CAAC;CAC9B;AAED;;GAEG;AACH,wBAAgB,UAAU,CAAC,IAAI,EAAE,MAAM,GAAG,MAAM,CAE/C;AAED;;GAEG;AACH,wBAAgB,eAAe,CAAC,GAAG,EAAE,MAAM,EAAE,IAAI,CAAC,EAAE,MAAM,GAAG,eAAe,EAAE,CAmC7E;AAED;;GAEG;AACH,8BAAsB,SAAS;IAC3B;;OAEG;IACH,QAAQ,CAAC,QAAQ,CAAC,QAAQ,EAAE,MAAM,EAAE,iBAAiB,EAAE,MAAM,GAAG,OAAO,CAAC,OAAO,CAAC,eAAe,CAAC,CAAC;CAEpG;AAED,wBAAgB,eAAe,CAAC,GAAG,UAAU,EAAE,SAAS,EAAE,GAAG,SAAS,CAGrE"}
@@ -0,0 +1,162 @@
1
+ /**
2
+ * Baseline Validator Class
3
+ */
4
+ import { readFileSync, existsSync, readdirSync } from 'fs';
5
+ import * as path from 'path';
6
+ /**
7
+ * Helper to process a set of results, averaging all runs of each runner-evaluator-case combination
8
+ */
9
+ export function averageAcrossCases(results) {
10
+ const mappedResults = new Map();
11
+ const averagedResults = [];
12
+ // collect like-results
13
+ for (const result of results) {
14
+ // add this result to the map (grouping by result name, which defaults to runner, case & evaluator)
15
+ const name = result.name;
16
+ const existingResult = mappedResults.get(name) ?? [];
17
+ existingResult.push(result);
18
+ mappedResults.set(name, existingResult);
19
+ }
20
+ // average the results
21
+ for (const [_key, groupedResults] of mappedResults) {
22
+ const avgData = groupedResults[0].data;
23
+ // sum all results except the first
24
+ for (const result of groupedResults.slice(1)) {
25
+ const resultData = result.data;
26
+ for (const [key, value] of Object.entries(resultData)) {
27
+ if (typeof value === 'number') {
28
+ avgData[key] = (avgData[key] ?? 0) + value;
29
+ }
30
+ }
31
+ }
32
+ // lastly, divide each entry by the number of 'groupedResults'
33
+ for (const [key, value] of Object.entries(avgData)) {
34
+ if (typeof value === 'number') {
35
+ avgData[key] = value / groupedResults.length;
36
+ // round to 2 decimal places
37
+ avgData[key] = Math.round(avgData[key] * 100) / 100;
38
+ }
39
+ }
40
+ averagedResults.push({
41
+ name: groupedResults[0].name,
42
+ metadata: groupedResults[0].metadata,
43
+ data: avgData
44
+ });
45
+ }
46
+ return averagedResults;
47
+ }
48
+ /**
49
+ * Averages all results across runners at the highest level, to get a single result for each runner
50
+ */
51
+ export function averageAcrossRunners(results) {
52
+ // first average across runs
53
+ const processedResults = averageAcrossCases(results);
54
+ // now average across runners
55
+ const mappedResults = new Map();
56
+ const averagedResults = [];
57
+ // collect like-results
58
+ for (const result of processedResults) {
59
+ // add this result to the map (grouping by runner)
60
+ const name = result.metadata.runner;
61
+ const existingResult = mappedResults.get(name) ?? [];
62
+ existingResult.push(result);
63
+ mappedResults.set(name, existingResult);
64
+ }
65
+ // average the results
66
+ for (const [_key, groupedResults] of mappedResults) {
67
+ const avgData = groupedResults[0].data;
68
+ // sum all results except the first
69
+ for (const result of groupedResults.slice(1)) {
70
+ const resultData = result.data;
71
+ for (const [key, value] of Object.entries(resultData)) {
72
+ if (typeof value === 'number') {
73
+ avgData[key] = (avgData[key] ?? 0) + value;
74
+ }
75
+ }
76
+ }
77
+ // lastly, divide each entry by the number of 'groupedResults'
78
+ for (const [key, value] of Object.entries(avgData)) {
79
+ if (typeof value === 'number') {
80
+ avgData[key] = value / groupedResults.length;
81
+ // round to 2 decimal places
82
+ avgData[key] = Math.round(avgData[key] * 100) / 100;
83
+ }
84
+ }
85
+ averagedResults.push({
86
+ name: groupedResults[0].metadata.runner,
87
+ metadata: groupedResults[0].metadata,
88
+ data: avgData
89
+ });
90
+ }
91
+ return averagedResults;
92
+ }
93
+ /**
94
+ * Loads a specific report, containing evaluator results from a file & returns it
95
+ */
96
+ export function loadReport(file) {
97
+ return JSON.parse(readFileSync(file, 'utf-8'));
98
+ }
99
+ /**
100
+ * Attempts to load the most recent evaluator results from the given directory
101
+ */
102
+ export function loadLastResults(dir, take) {
103
+ if (!existsSync(dir)) {
104
+ throw new Error(`Directory does not exist: ${dir}`);
105
+ }
106
+ let files = readdirSync(dir).filter(f => f.endsWith('.json'));
107
+ if (!take) {
108
+ const lastFile = path.join(dir, 'last.txt');
109
+ if (!existsSync(lastFile)) {
110
+ throw new Error(`Last file does not exist in directory: ${dir}. Try running an evaluation matrix first.`);
111
+ }
112
+ // read name from last file
113
+ const lastFileName = readFileSync(lastFile).toString();
114
+ files.push(lastFileName);
115
+ }
116
+ else {
117
+ // read the most recent files
118
+ files = files.sort().reverse().slice(0, take);
119
+ }
120
+ const results = [];
121
+ for (const file of files) {
122
+ const report = loadReport(path.join(dir, file));
123
+ results.push(...report.results);
124
+ }
125
+ // find the most recently created file in the path & read it
126
+ // const lastFileName = readFileSync(lastFile).toString();
127
+ // return loadReport(path.join(dir, lastFileName)).results;
128
+ return results;
129
+ }
130
+ /**
131
+ * Evaluator class for evaluating agent responses
132
+ */
133
+ export class Evaluator {
134
+ }
135
+ export function mergeEvaluators(...evaluators) {
136
+ // merge evaluators in sequence
137
+ return evaluators.reduce((acc, val) => mergeEvaluatorsInternal(acc, val));
138
+ }
139
+ /**
140
+ * Merges two evaluators together in sequence, such that results of a are combined with b (b takes precedence in key overrides)
141
+ * @param a First evaluator to merge
142
+ * @param b Second evaluator to merge
143
+ */
144
+ function mergeEvaluatorsInternal(a, b) {
145
+ return {
146
+ async evaluate(response, expected_response) {
147
+ const r1 = await a.evaluate(response, expected_response);
148
+ const r2 = await b.evaluate(response, expected_response);
149
+ return {
150
+ metadata: {
151
+ ...r1.metadata,
152
+ ...r2.metadata
153
+ },
154
+ data: {
155
+ ...r1.data,
156
+ ...r2.data
157
+ }
158
+ };
159
+ }
160
+ };
161
+ }
162
+ //# sourceMappingURL=evaluator.js.map
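The arithmetic shared by `averageAcrossCases` and `averageAcrossRunners` (sum each numeric field, divide by the group size, round to two decimals) can be sketched in isolation. `averageNumericFields` is a hypothetical helper, not a package export. One deliberate difference from the file above: the sketch copies the first record rather than accumulating into it, so the inputs stay unchanged.

```typescript
// Hypothetical helper, not exported by the package: averages the numeric
// fields of several result-data records.
type ResultData = Record<string, unknown>;

function averageNumericFields(group: ResultData[]): ResultData {
    if (group.length === 0) {
        return {};
    }
    // copy the first record so the inputs are left untouched
    const avg: ResultData = { ...group[0] };
    // add the numeric fields of every remaining record
    for (const data of group.slice(1)) {
        for (const [key, value] of Object.entries(data)) {
            if (typeof value === 'number') {
                const current = avg[key];
                avg[key] = (typeof current === 'number' ? current : 0) + value;
            }
        }
    }
    // divide each sum by the group size, then round to 2 decimal places
    for (const [key, value] of Object.entries(avg)) {
        if (typeof value === 'number') {
            avg[key] = Math.round((value / group.length) * 100) / 100;
        }
    }
    return avg;
}

console.log(averageNumericFields([
    { errors: 1, _runtime: 0.5 },
    { errors: 2, _runtime: 1 },
    { errors: 2, _runtime: 1.5 }
]));
// prints { errors: 1.67, _runtime: 1 }
```

Non-numeric fields, such as a diagnostics array, are carried over from the first record unchanged.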
@@ -0,0 +1 @@
1
+ {"version":3,"file":"evaluator.js","sourceRoot":"","sources":["../../src/evaluator/evaluator.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,OAAO,EAAE,YAAY,EAAE,UAAU,EAAE,WAAW,EAAE,MAAM,IAAI,CAAC;AAC3D,OAAO,KAAK,IAAI,MAAM,MAAM,CAAC;AA2B7B;;GAEG;AACH,MAAM,UAAU,kBAAkB,CAAC,OAA0B;IACzD,MAAM,aAAa,GAAmC,IAAI,GAAG,EAAE,CAAC;IAEhE,MAAM,eAAe,GAAsB,EAAE,CAAC;IAE9C,uBAAuB;IACvB,KAAK,MAAM,MAAM,IAAI,OAAO,EAAE,CAAC;QAC3B,yDAAyD;QACzD,MAAM,IAAI,GAAG,MAAM,CAAC,IAAI,CAAC;QACzB,MAAM,cAAc,GAAG,aAAa,CAAC,GAAG,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC;QACrD,cAAc,CAAC,IAAI,CAAC,MAAM,CAAC,CAAC;QAC5B,aAAa,CAAC,GAAG,CAAC,IAAI,EAAE,cAAc,CAAC,CAAC;IAC5C,CAAC;IAED,sBAAsB;IACtB,KAAK,MAAM,CAAC,IAAI,EAAE,cAAc,CAAC,IAAI,aAAa,EAAE,CAAC;QACjD,MAAM,OAAO,GAAG,cAAc,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;QAEvC,mCAAmC;QACnC,KAAK,MAAM,MAAM,IAAI,cAAc,CAAC,KAAK,CAAC,CAAC,CAAC,EAAE,CAAC;YAC3C,MAAM,UAAU,GAAG,MAAM,CAAC,IAAI,CAAC;YAC/B,KAAK,MAAM,CAAC,GAAG,EAAE,KAAK,CAAC,IAAI,MAAM,CAAC,OAAO,CAAC,UAAU,CAAC,EAAE,CAAC;gBACpD,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;oBAC5B,OAAO,CAAC,GAAG,CAAC,GAAG,CAAC,OAAO,CAAC,GAAG,CAAW,IAAI,CAAC,CAAC,GAAG,KAAK,CAAC;gBACzD,CAAC;YACL,CAAC;QACL,CAAC;QAED,8DAA8D;QAC9D,KAAK,MAAM,CAAC,GAAG,EAAE,KAAK,CAAC,IAAI,MAAM,CAAC,OAAO,CAAC,OAAO,CAAC,EAAE,CAAC;YACjD,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;gBAC5B,OAAO,CAAC,GAAG,CAAC,GAAG,KAAK,GAAG,cAAc,CAAC,MAAM,CAAC;gBAC7C,4BAA4B;gBAC5B,OAAO,CAAC,GAAG,CAAC,GAAG,IAAI,CAAC,KAAK,CAAE,OAAO,CAAC,GAAG,CAAY,GAAG,GAAG,CAAC,GAAG,GAAG,CAAC;YACpE,CAAC;QACL,CAAC;QAED,eAAe,CAAC,IAAI,CAAC;YACjB,IAAI,EAAE,cAAc,CAAC,CAAC,CAAC,CAAC,IAAI;YAC5B,QAAQ,EAAE,cAAc,CAAC,CAAC,CAAC,CAAC,QAAQ;YACpC,IAAI,EAAE,OAAO;SAChB,CAAC,CAAC;IACP,CAAC;IACD,OAAO,eAAe,CAAC;AAC3B,CAAC;AAED;;GAEG;AACH,MAAM,UAAU,oBAAoB,CAAC,OAA0B;IAC3D,4BAA4B;IAC5B,MAAM,gBAAgB,GAAG,kBAAkB,CAAC,OAAO,CAAC,CAAC;IAErD,6BAA6B;IAC7B,MAAM,aAAa,GAAmC,IAAI,GAAG,EAAE,CAAC;IAEhE,MAAM,eAAe,GAAsB,EAAE,CAAC;IAE9C,uBAAuB;IACvB,KAAK,MAAM,MAAM,IAAI,gBAAgB,EAAE,CAAC;QACpC,kDAAkD;QAClD,MAAM,IAAI,GAAG,MAAM,CAAC,QAAQ,CAAC,MAAM,CAAC;QACpC,MAAM,cAAc,GAAG,aAAa,CAAC
,GAAG,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC;QACrD,cAAc,CAAC,IAAI,CAAC,MAAM,CAAC,CAAC;QAC5B,aAAa,CAAC,GAAG,CAAC,IAAI,EAAE,cAAc,CAAC,CAAC;IAC5C,CAAC;IAED,sBAAsB;IACtB,KAAK,MAAM,CAAC,IAAI,EAAE,cAAc,CAAC,IAAI,aAAa,EAAE,CAAC;QACjD,MAAM,OAAO,GAAG,cAAc,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;QAEvC,mCAAmC;QACnC,KAAK,MAAM,MAAM,IAAI,cAAc,CAAC,KAAK,CAAC,CAAC,CAAC,EAAE,CAAC;YAC3C,MAAM,UAAU,GAAG,MAAM,CAAC,IAAI,CAAC;YAC/B,KAAK,MAAM,CAAC,GAAG,EAAE,KAAK,CAAC,IAAI,MAAM,CAAC,OAAO,CAAC,UAAU,CAAC,EAAE,CAAC;gBACpD,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;oBAC5B,OAAO,CAAC,GAAG,CAAC,GAAG,CAAC,OAAO,CAAC,GAAG,CAAW,IAAI,CAAC,CAAC,GAAG,KAAK,CAAC;gBACzD,CAAC;YACL,CAAC;QACL,CAAC;QAED,8DAA8D;QAC9D,KAAK,MAAM,CAAC,GAAG,EAAE,KAAK,CAAC,IAAI,MAAM,CAAC,OAAO,CAAC,OAAO,CAAC,EAAE,CAAC;YACjD,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;gBAC5B,OAAO,CAAC,GAAG,CAAC,GAAG,KAAK,GAAG,cAAc,CAAC,MAAM,CAAC;gBAC7C,4BAA4B;gBAC5B,OAAO,CAAC,GAAG,CAAC,GAAG,IAAI,CAAC,KAAK,CAAE,OAAO,CAAC,GAAG,CAAY,GAAG,GAAG,CAAC,GAAG,GAAG,CAAC;YACpE,CAAC;QACL,CAAC;QAED,eAAe,CAAC,IAAI,CAAC;YACjB,IAAI,EAAE,cAAc,CAAC,CAAC,CAAC,CAAC,QAAQ,CAAC,MAAM;YACvC,QAAQ,EAAE,cAAc,CAAC,CAAC,CAAC,CAAC,QAAQ;YACpC,IAAI,EAAE,OAAO;SAChB,CAAC,CAAC;IACP,CAAC;IAED,OAAO,eAAe,CAAC;AAC3B,CAAC;AAiBD;;GAEG;AACH,MAAM,UAAU,UAAU,CAAC,IAAY;IACnC,OAAO,IAAI,CAAC,KAAK,CAAC,YAAY,CAAC,IAAI,EAAE,OAAO,CAAC,CAAW,CAAC;AAC7D,CAAC;AAED;;GAEG;AACH,MAAM,UAAU,eAAe,CAAC,GAAW,EAAE,IAAa;IACtD,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC,EAAE,CAAC;QACnB,MAAM,IAAI,KAAK,CAAC,6BAA6B,GAAG,EAAE,CAAC,CAAC;IACxD,CAAC;IAED,IAAI,KAAK,GAAG,WAAW,CAAC,GAAG,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,QAAQ,CAAC,OAAO,CAAC,CAAC,CAAC;IAE9D,IAAI,CAAC,IAAI,EAAE,CAAC;QACR,MAAM,QAAQ,GAAG,IAAI,CAAC,IAAI,CAAC,GAAG,EAAE,UAAU,CAAC,CAAC;QAE5C,IAAI,CAAC,UAAU,CAAC,QAAQ,CAAC,EAAE,CAAC;YACxB,MAAM,IAAI,KAAK,CAAC,0CAA0C,GAAG,2CAA2C,CAAC,CAAC;QAC9G,CAAC;QACD,2BAA2B;QAC3B,MAAM,YAAY,GAAG,YAAY,CAAC,QAAQ,CAAC,CAAC,QAAQ,EAAE,CAAC;QAEvD,KAAK,CAAC,IAAI,CAAC,YAAY,CAAC,CAAC;IAE7B,CAAC;SAAM,CAAC;QACJ,6BAA6B;QAC7B,KAAK,GAAG,KAAK,CAAC,IAAI,EAAE,CAAC,OAAO,EAAE,CAAC,KAAK,CAAC,C
AAC,EAAE,IAAI,CAAC,CAAC;IAElD,CAAC;IAED,MAAM,OAAO,GAAsB,EAAE,CAAC;IAEtC,KAAK,MAAM,IAAI,IAAI,KAAK,EAAE,CAAC;QACvB,MAAM,MAAM,GAAG,UAAU,CAAC,IAAI,CAAC,IAAI,CAAC,GAAG,EAAE,IAAI,CAAC,CAAC,CAAC;QAChD,OAAO,CAAC,IAAI,CAAC,GAAG,MAAM,CAAC,OAAO,CAAC,CAAC;IACpC,CAAC;IAED,4DAA4D;IAC5D,0DAA0D;IAC1D,2DAA2D;IAC3D,OAAO,OAAO,CAAC;AACnB,CAAC;AAED;;GAEG;AACH,MAAM,OAAgB,SAAS;CAM9B;AAED,MAAM,UAAU,eAAe,CAAC,GAAG,UAAuB;IACtD,+BAA+B;IAC/B,OAAO,UAAU,CAAC,MAAM,CAAC,CAAC,GAAG,EAAE,GAAG,EAAE,EAAE,CAAC,uBAAuB,CAAC,GAAG,EAAE,GAAG,CAAC,CAAC,CAAC;AAC9E,CAAC;AAED;;;;GAIG;AACH,SAAS,uBAAuB,CAAC,CAAY,EAAE,CAAY;IACvD,OAAO;QACH,KAAK,CAAC,QAAQ,CAAC,QAAgB,EAAE,iBAAyB;YACtD,MAAM,EAAE,GAAG,MAAM,CAAC,CAAC,QAAQ,CAAC,QAAQ,EAAE,iBAAiB,CAAC,CAAC;YACzD,MAAM,EAAE,GAAG,MAAM,CAAC,CAAC,QAAQ,CAAC,QAAQ,EAAE,iBAAiB,CAAC,CAAC;YACzD,OAAO;gBACH,QAAQ,EAAE;oBACN,GAAG,EAAE,CAAC,QAAQ;oBACd,GAAG,EAAE,CAAC,QAAQ;iBACjB;gBACD,IAAI,EAAE;oBACF,GAAG,EAAE,CAAC,IAAI;oBACV,GAAG,EAAE,CAAC,IAAI;iBACb;aACJ,CAAC;QACN,CAAC;KACJ,CAAC;AACN,CAAC"}
@@ -0,0 +1,6 @@
1
+ export * from './evaluator.js';
2
+ export * from './langium-evaluator.js';
3
+ export * from './edit-distance-evaluator.js';
4
+ export * from './eval-matrix.js';
5
+ export * from './chart.js';
6
+ //# sourceMappingURL=index.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/evaluator/index.ts"],"names":[],"mappings":"AAAA,cAAc,gBAAgB,CAAC;AAC/B,cAAc,wBAAwB,CAAC;AACvC,cAAc,8BAA8B,CAAC;AAC7C,cAAc,kBAAkB,CAAC;AACjC,cAAc,YAAY,CAAC"}
@@ -0,0 +1,6 @@
1
+ export * from './evaluator.js';
2
+ export * from './langium-evaluator.js';
3
+ export * from './edit-distance-evaluator.js';
4
+ export * from './eval-matrix.js';
5
+ export * from './chart.js';
6
+ //# sourceMappingURL=index.js.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"index.js","sourceRoot":"","sources":["../../src/evaluator/index.ts"],"names":[],"mappings":"AAAA,cAAc,gBAAgB,CAAC;AAC/B,cAAc,wBAAwB,CAAC;AACvC,cAAc,8BAA8B,CAAC;AAC7C,cAAc,kBAAkB,CAAC;AACjC,cAAc,YAAY,CAAC"}
@@ -0,0 +1,55 @@
1
+ /**
2
+ * Base Langium DSL validator (taps into Langium's validator messages to provide better results)
3
+ */
4
+ import { LangiumServices } from "langium/lsp";
5
+ import { Diagnostic } from "vscode-languageserver-types";
6
+ import { Evaluator, EvaluatorResult, EvaluatorResultData } from "./evaluator.js";
7
+ /**
8
+ * Langium-specific evaluator result data
9
+ */
10
+ export interface LangiumEvaluatorResultData extends EvaluatorResultData {
11
+ /**
12
+ * Number of validation failures
13
+ */
14
+ failures: number;
15
+ /**
16
+ * Number of errors
17
+ */
18
+ errors: number;
19
+ /**
20
+ * Number of warnings
21
+ */
22
+ warnings: number;
23
+ /**
24
+ * Number of infos
25
+ */
26
+ infos: number;
27
+ /**
28
+ * Number of hints
29
+ */
30
+ hints: number;
31
+ /**
32
+ * Number of unassigned diagnostics
33
+ */
34
+ unassigned: number;
35
+ /**
36
+ * Length of the response in chars
37
+ */
38
+ response_length: number;
39
+ /**
40
+ * Raw diagnostic data, same which is used to compute the other values above
41
+ */
42
+ diagnostics: Diagnostic[];
43
+ }
44
+ export declare class LangiumEvaluator<T extends LangiumServices> extends Evaluator {
45
+ /**
46
+ * Services to use for evaluation
47
+ */
48
+ protected services: T;
49
+ constructor(services: T);
50
+ /**
51
+ * Validate an agent response as if it's a langium program. If we can parse it, we attempt to validate it.
52
+ */
53
+ evaluate(response: string): Promise<Partial<EvaluatorResult>>;
54
+ }
55
+ //# sourceMappingURL=langium-evaluator.d.ts.map
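The counters in `LangiumEvaluatorResultData` correspond to the numeric severities defined by the Language Server Protocol (1 = error, 2 = warning, 3 = information, 4 = hint). A minimal tally, using a local stand-in for the `Diagnostic` type, might look like the sketch below. `countBySeverity` and `DiagnosticLike` are hypothetical names, and counting a missing severity as `unassigned` is an assumption about the intent of that field rather than something the declarations above state.

```typescript
// Minimal stand-in for the LSP Diagnostic type; only the fields used here
type DiagnosticLike = { severity?: number; message: string };

type SeverityCounts = {
    errors: number;
    warnings: number;
    infos: number;
    hints: number;
    unassigned: number;
};

// Hypothetical helper: tallies diagnostics by their LSP severity value
function countBySeverity(diagnostics: DiagnosticLike[]): SeverityCounts {
    const counts: SeverityCounts = { errors: 0, warnings: 0, infos: 0, hints: 0, unassigned: 0 };
    for (const diagnostic of diagnostics) {
        switch (diagnostic.severity) {
            case 1: counts.errors++; break;      // DiagnosticSeverity.Error
            case 2: counts.warnings++; break;    // DiagnosticSeverity.Warning
            case 3: counts.infos++; break;       // DiagnosticSeverity.Information
            case 4: counts.hints++; break;       // DiagnosticSeverity.Hint
            default: counts.unassigned++; break; // assumption: missing or unknown severity
        }
    }
    return counts;
}

console.log(countBySeverity([
    { severity: 1, message: 'unresolved reference' },
    { severity: 2, message: 'unused rule' },
    { message: 'no severity given' }
]));
```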
@@ -0,0 +1 @@
1
+ {"version":3,"file":"langium-evaluator.d.ts","sourceRoot":"","sources":["../../src/evaluator/langium-evaluator.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,OAAO,EAAE,eAAe,EAAE,MAAM,aAAa,CAAC;AAC9C,OAAO,EAAE,UAAU,EAAE,MAAM,6BAA6B,CAAC;AACzD,OAAO,EAAE,SAAS,EAAE,eAAe,EAAE,mBAAmB,EAAE,MAAM,gBAAgB,CAAC;AAGjF;;GAEG;AACH,MAAM,WAAW,0BAA2B,SAAQ,mBAAmB;IAEnE;;OAEG;IACH,QAAQ,EAAE,MAAM,CAAC;IAEjB;;OAEG;IACH,MAAM,EAAE,MAAM,CAAC;IAEf;;OAEG;IACH,QAAQ,EAAE,MAAM,CAAC;IAEjB;;OAEG;IACH,KAAK,EAAE,MAAM,CAAC;IAEd;;OAEG;IACH,KAAK,EAAE,MAAM,CAAC;IAEd;;OAEG;IACH,UAAU,EAAE,MAAM,CAAC;IAEnB;;OAEG;IACH,eAAe,EAAE,MAAM,CAAC;IAExB;;OAEG;IACH,WAAW,EAAE,UAAU,EAAE,CAAC;CAC7B;AAED,qBAAa,gBAAgB,CAAC,CAAC,SAAS,eAAe,CAAE,SAAQ,SAAS;IAEtE;;OAEG;IACH,SAAS,CAAC,QAAQ,EAAE,CAAC,CAAC;gBAEV,QAAQ,EAAE,CAAC;IAKvB;;OAEG;IACG,QAAQ,CAAC,QAAQ,EAAE,MAAM,GAAG,OAAO,CAAC,OAAO,CAAC,eAAe,CAAC,CAAC;CAqEtE"}
@@ -0,0 +1,78 @@
1
+ /**
2
+ * Base Langium DSL validator (taps into Langium's validator messages to provide better results)
3
+ */
4
+ import { Evaluator } from "./evaluator.js";
5
+ import { URI } from "langium";
6
+ export class LangiumEvaluator extends Evaluator {
7
+ constructor(services) {
8
+ super();
9
+ this.services = services;
10
+ }
11
+ /**
12
+ * Validate an agent response as if it's a langium program. If we can parse it, we attempt to validate it.
13
+ */
14
+ async evaluate(response) {
15
+ if (response.includes('```')) {
16
+ // take the first code block instead, if present (assuming it's a langium grammar)
17
+ const codeBlock = response.split(/```[a-z-]*/)[1];
18
+ response = codeBlock;
19
+ }
20
+ const doc = this.services.shared.workspace.LangiumDocumentFactory.fromString(response, URI.parse('memory://test.langium'));
21
+ try {
22
+ await this.services.shared.workspace.DocumentBuilder.build([doc], { validation: true });
23
+ const validationResults = doc.diagnostics ?? [];
24
+ // count the number of each type of diagnostic
25
+ let evalData = {
26
+ failures: 0,
27
+ errors: 0,
28
+ warnings: 0,
29
+ infos: 0,
30
+ hints: 0,
31
+ unassigned: 0,
32
+ // include length of the response for checking
33
+ response_length: response.length,
34
+ // include the diagnostics for debugging if desired
35
+ diagnostics: validationResults
36
+ };
37
+ for (const diagnostic of validationResults) {
38
+ if (diagnostic.severity) {
39
+ switch (diagnostic.severity) {
40
+ case 1:
41
+ evalData.errors++;
42
+ break;
43
+ case 2:
44
+ evalData.warnings++;
45
+ break;
46
+ case 3:
47
+ evalData.infos++;
48
+ break;
49
+ case 4:
50
+ evalData.hints++;
51
+ break;
52
+ default:
53
+ evalData.unassigned++;
54
+ break;
55
+ }
56
+ }
57
+ }
58
+ return {
59
+ data: evalData
60
+ };
61
+ }
62
+ catch (e) {
63
+ console.error('Error during evaluation: ', e);
64
+ return {
65
+ data: {
66
+ failures: 1,
67
+ errors: 0,
68
+ warnings: 0,
69
+ infos: 0,
70
+ hints: 0,
71
+ unassigned: 0,
72
+ response_length: response.length
73
+ }
74
+ };
75
+ }
76
+ }
77
+ }
78
+ //# sourceMappingURL=langium-evaluator.js.map
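Before parsing, `evaluate` above replaces a fenced response with the contents of its first code block. The sketch below reproduces that step on its own. `firstCodeBlock` is a hypothetical name, and the fence marker is assembled with `repeat` only to keep the example readable; the pattern itself matches the one used in the file.

```typescript
// Three backticks, assembled programmatically for readability
const FENCE = "`".repeat(3);

// Hypothetical helper: returns the text of the first fenced block,
// or the whole response when no fence is present
function firstCodeBlock(response: string): string {
    if (!response.includes(FENCE)) {
        return response;
    }
    // split on fences, consuming an optional lower-case language tag
    const parts = response.split(new RegExp(FENCE + '[a-z-]*'));
    // parts[0] precedes the first fence, parts[1] is the first block
    return parts[1] ?? response;
}

const reply = 'Here you go:\n' + FENCE + 'langium\ngrammar Demo\n' + FENCE + '\nDone.';
console.log(JSON.stringify(firstCodeBlock(reply)));
// prints "\ngrammar Demo\n"
```

Since the tag pattern only covers lower-case letters and dashes, a tag containing digits or upper-case letters would be partly left at the start of the extracted text.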
@@ -0,0 +1 @@
1
+ {"version":3,"file":"langium-evaluator.js","sourceRoot":"","sources":["../../src/evaluator/langium-evaluator.ts"],"names":[],"mappings":"AAAA;;GAEG;AAIH,OAAO,EAAE,SAAS,EAAwC,MAAM,gBAAgB,CAAC;AACjF,OAAO,EAAE,GAAG,EAAE,MAAM,SAAS,CAAC;AAgD9B,MAAM,OAAO,gBAA4C,SAAQ,SAAS;IAOtE,YAAY,QAAW;QACnB,KAAK,EAAE,CAAC;QACR,IAAI,CAAC,QAAQ,GAAG,QAAQ,CAAC;IAC7B,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,QAAQ,CAAC,QAAgB;QAE3B,IAAI,QAAQ,CAAC,QAAQ,CAAC,KAAK,CAAC,EAAE,CAAC;YAC3B,kFAAkF;YAClF,MAAM,SAAS,GAAG,QAAQ,CAAC,KAAK,CAAC,YAAY,CAAC,CAAC,CAAC,CAAC,CAAC;YAClD,QAAQ,GAAG,SAAS,CAAC;QACzB,CAAC;QAED,MAAM,GAAG,GAAG,IAAI,CAAC,QAAQ,CAAC,MAAM,CAAC,SAAS,CAAC,sBAAsB,CAAC,UAAU,CAAC,QAAQ,EAAE,GAAG,CAAC,KAAK,CAAC,uBAAuB,CAAC,CAAC,CAAC;QAE3H,IAAI,CAAC;YACD,MAAM,IAAI,CAAC,QAAQ,CAAC,MAAM,CAAC,SAAS,CAAC,eAAe,CAAC,KAAK,CAAC,CAAC,GAAG,CAAC,EAAE,EAAE,UAAU,EAAE,IAAI,EAAE,CAAC,CAAC;YACxF,MAAM,iBAAiB,GAAG,GAAG,CAAC,WAAW,IAAI,EAAE,CAAC;YAEhD,8CAA8C;YAC9C,IAAI,QAAQ,GAA+B;gBACvC,QAAQ,EAAE,CAAC;gBACX,MAAM,EAAE,CAAC;gBACT,QAAQ,EAAE,CAAC;gBACX,KAAK,EAAE,CAAC;gBACR,KAAK,EAAE,CAAC;gBACR,UAAU,EAAE,CAAC;gBACb,8CAA8C;gBAC9C,eAAe,EAAE,QAAQ,CAAC,MAAM;gBAChC,mDAAmD;gBACnD,WAAW,EAAE,iBAAiB;aACjC,CAAC;YAEF,KAAK,MAAM,UAAU,IAAI,iBAAiB,EAAE,CAAC;gBACzC,IAAI,UAAU,CAAC,QAAQ,EAAE,CAAC;oBACtB,QAAQ,UAAU,CAAC,QAAQ,EAAE,CAAC;wBAC1B,KAAK,CAAC;4BACF,QAAQ,CAAC,MAAM,EAAE,CAAC;4BAClB,MAAM;wBACV,KAAK,CAAC;4BACF,QAAQ,CAAC,QAAQ,EAAE,CAAC;4BACpB,MAAM;wBACV,KAAK,CAAC;4BACF,QAAQ,CAAC,KAAK,EAAE,CAAC;4BACjB,MAAM;wBACV,KAAK,CAAC;4BACF,QAAQ,CAAC,KAAK,EAAE,CAAC;4BACjB,MAAM;wBACV;4BACI,QAAQ,CAAC,UAAU,EAAE,CAAC;4BACtB,MAAM;oBACd,CAAC;gBACL,CAAC;YACL,CAAC;YAED,OAAO;gBACH,IAAI,EAAE,QAAQ;aACjB,CAAC;QAEN,CAAC;QAAC,OAAO,CAAC,EAAE,CAAC;YACT,OAAO,CAAC,KAAK,CAAC,2BAA2B,EAAE,CAAC,CAAC,CAAC;YAC9C,OAAO;gBACH,IAAI,EAAE;oBACF,QAAQ,EAAE,CAAC;oBACX,MAAM,EAAE,CAAC;oBACT,QAAQ,EAAE,CAAC;oBACX,KAAK,EAAE,CAAC;oBACR,KAAK,EAAE,CAAC;oBACR,UAAU,EAAE,CAAC;oBACb,eAAe,EAAE,QAAQ,CAAC,MAAM;iBACL;aAClC,CAAC;QACN,CAAC;IACL,CAAC;CACJ"}
@@ -0,0 +1,3 @@
1
+ export * from './evaluator/index.js';
2
+ export * from './splitter/index.js';
3
+ //# sourceMappingURL=index.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,cAAc,sBAAsB,CAAC;AACrC,cAAc,qBAAqB,CAAC"}
package/dist/index.js ADDED
@@ -0,0 +1,3 @@
1
+ export * from './evaluator/index.js';
2
+ export * from './splitter/index.js';
3
+ //# sourceMappingURL=index.js.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,cAAc,sBAAsB,CAAC;AACrC,cAAc,qBAAqB,CAAC"}
@@ -0,0 +1,2 @@
1
+ export * from './splitter.js';
2
+ //# sourceMappingURL=index.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/splitter/index.ts"],"names":[],"mappings":"AAAA,cAAc,eAAe,CAAC"}
@@ -0,0 +1,2 @@
1
+ export * from './splitter.js';
2
+ //# sourceMappingURL=index.js.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"index.js","sourceRoot":"","sources":["../../src/splitter/index.ts"],"names":[],"mappings":"AAAA,cAAc,eAAe,CAAC"}
@@ -0,0 +1,21 @@
1
+ import { AstNode } from "langium";
2
+ import { LangiumServices } from "langium/lsp";
3
+ interface SplitterOptions {
4
+ /**
5
+ * List of comment rule names to include in the chunk.
6
+ * If not provided comments are ignored.
7
+ * Default: ['ML_COMMENT', 'SL_COMMENT']
8
+ */
9
+ commentRuleNames?: string[];
10
+ }
11
+ /**
12
+ * Splitter function that splits a single text document into 1 or more chunks based on a splitting strategy
13
+ * @param document - The text document to be split.
14
+ * @param nodePredicates - The predicates to determine the nodes for splitting.
15
+ * @param services - The Langium services used for parsing the document.
16
+ * @param options - The splitter configuration. See {@link SplitterOptions}.
17
+ * @returns The chunks of the split document.
18
+ */
19
+ export declare function splitByNode(document: string, nodePredicates: Array<(node: AstNode) => boolean> | ((node: AstNode) => boolean), services: LangiumServices, options?: SplitterOptions): string[];
20
+ export {};
21
+ //# sourceMappingURL=splitter.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"splitter.d.ts","sourceRoot":"","sources":["../../src/splitter/splitter.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,OAAO,EAAiB,MAAM,SAAS,CAAC;AACjD,OAAO,EAAE,eAAe,EAAE,MAAM,aAAa,CAAC;AAG9C,UAAU,eAAe;IACrB;;;;MAIE;IACF,gBAAgB,CAAC,EAAE,MAAM,EAAE,CAAA;CAC9B;AAED;;;;;;;GAOG;AACH,wBAAgB,WAAW,CACvB,QAAQ,EAAE,MAAM,EAChB,cAAc,EAAE,KAAK,CAAC,CAAC,IAAI,EAAE,OAAO,KAAK,OAAO,CAAC,GAAG,CAAC,CAAC,IAAI,EAAE,OAAO,KAAK,OAAO,CAAC,EAChF,QAAQ,EAAE,eAAe,EACzB,OAAO,GAAE,eAAoE,GAAG,MAAM,EAAE,CA2D3F"}
@@ -0,0 +1,59 @@
1
+ import { CstUtils, URI } from "langium";
2
+ import { AstUtils } from 'langium';
3
+ /**
4
+ * Splitter function that splits a single text document into 1 or more chunks based on a splitting strategy
5
+ * @param document - The text document to be split.
6
+ * @param nodePredicates - The predicates to determine the nodes for splitting.
7
+ * @param services - The Langium services used for parsing the document.
8
+ * @param options - The splitter configuration. See {@link SplitterOptions}.
9
+ * @returns The chunks of the split document.
10
+ */
11
+ export function splitByNode(document, nodePredicates, services, options = { commentRuleNames: ['ML_COMMENT', 'SL_COMMENT'] }) {
12
+ // 1. parse the document into an AST
13
+ // 2. verify that we parsed the document correctly
14
+ // 3. split the document into chunks based on the node
15
+ // 4. using the corresponding CST offsets from those nodes, split the document into chunks
16
+ // 5. return the chunks
17
+ if (document.trim() === '') {
18
+ return [];
19
+ }
20
+ const langiumDoc = services.shared.workspace.LangiumDocumentFactory.fromString(document, URI.parse('memory://document.langium'));
21
+ // not checking for lexer or parser errors here...
22
+ const txtDoc = langiumDoc.textDocument;
23
+ const chunks = [];
24
+ const predicates = Array.isArray(nodePredicates) ? nodePredicates : [nodePredicates];
25
+ // selectively stream nodes from the ast in langium
26
+ const stream = AstUtils.streamAst(langiumDoc.parseResult.value);
27
+ for (const node of stream) {
28
+ if (predicates.some(p => p(node))) {
29
+ // get the starting point of this node
30
+ let start = node.$cstNode?.range.start;
31
+ if (options?.commentRuleNames) {
32
+ // include comments in the chunk
33
+ const cstNode = node.$cstNode;
34
+ const commentNode = CstUtils.findCommentNode(cstNode, options.commentRuleNames);
35
+ if (commentNode) {
36
+ // adjust start to include the comment
37
+ start = commentNode.range.start;
38
+ }
39
+ }
40
+ const end = node.$cstNode?.range.end;
41
+ // add a chunk spanning this node, starting at its leading comment if one was found
42
+ const chunk = txtDoc.getText({
43
+ start: {
44
+ line: start?.line || 0,
45
+ character: start?.character || 0
46
+ },
47
+ end: {
48
+ line: end?.line || 0,
49
+ character: end?.character || 0
50
+ }
51
+ });
52
+ if (chunk.trim().length > 0) {
53
+ chunks.push(chunk);
54
+ }
55
+ }
56
+ }
57
+ return chunks;
58
+ }
59
+ //# sourceMappingURL=splitter.js.map
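The range-extraction step of `splitByNode` (steps 4 and 5 in its comments) can be sketched without Langium. The following is a minimal illustration, not the package's implementation: `Position`, `Range`, `offsetAt`, and `extractChunks` are hypothetical stand-ins for the text document API used above, and the ranges are written by hand instead of being read from CST nodes.

```typescript
// Hypothetical stand-ins for the LSP-style position types used by the splitter.
interface Position { line: number; character: number; }
interface Range { start: Position; end: Position; }

// Convert a zero-based line/character position into a string offset.
// Assumes '\n' line endings; the result is clamped to the text length.
function offsetAt(text: string, pos: Position): number {
    const lines = text.split('\n');
    let offset = 0;
    for (let i = 0; i < pos.line && i < lines.length; i++) {
        offset += lines[i].length + 1; // +1 for the newline character
    }
    return Math.min(offset + pos.character, text.length);
}

// Extract one chunk per range and drop chunks that contain only whitespace,
// mirroring the `chunk.trim().length > 0` check in splitByNode.
function extractChunks(text: string, ranges: Range[]): string[] {
    const chunks: string[] = [];
    for (const range of ranges) {
        const chunk = text.slice(offsetAt(text, range.start), offsetAt(text, range.end));
        if (chunk.trim().length > 0) {
            chunks.push(chunk);
        }
    }
    return chunks;
}

// Hand-written sample: the first range starts at the leading comment,
// the way splitByNode moves `start` back when a comment node is found.
const sample = '// greet\nperson Alice\n\nperson Bob\n';
const chunks = extractChunks(sample, [
    { start: { line: 0, character: 0 }, end: { line: 1, character: 12 } },
    { start: { line: 2, character: 0 }, end: { line: 3, character: 0 } }, // blank line only, skipped
    { start: { line: 3, character: 0 }, end: { line: 3, character: 10 } }
]);
console.log(chunks);
```

In the real function the ranges come from `node.$cstNode.range`, so a chunk always covers exactly one matching node plus its leading comment when one exists.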
@@ -0,0 +1 @@
+ {"version":3,"file":"splitter.js","sourceRoot":"","sources":["../../src/splitter/splitter.ts"],"names":[],"mappings":"AAAA,OAAO,EAAW,QAAQ,EAAE,GAAG,EAAE,MAAM,SAAS,CAAC;AAEjD,OAAO,EAAE,QAAQ,EAAE,MAAM,SAAS,CAAC;AAWnC;;;;;;;GAOG;AACH,MAAM,UAAU,WAAW,CACvB,QAAgB,EAChB,cAAgF,EAChF,QAAyB,EACzB,UAA2B,EAAE,gBAAgB,EAAE,CAAC,YAAY,EAAE,YAAY,CAAC,EAAE;IAC7E,oCAAoC;IACpC,kDAAkD;IAClD,sDAAsD;IACtD,0FAA0F;IAC1F,uBAAuB;IAEvB,IAAI,QAAQ,CAAC,IAAI,EAAE,KAAK,EAAE,EAAE,CAAC;QACzB,OAAO,EAAE,CAAC;IACd,CAAC;IAED,MAAM,UAAU,GAAG,QAAQ,CAAC,MAAM,CAAC,SAAS,CAAC,sBAAsB,CAAC,UAAU,CAAC,QAAQ,EAAE,GAAG,CAAC,KAAK,CAAC,2BAA2B,CAAC,CAAC,CAAC;IAEjI,kDAAkD;IAElD,MAAM,MAAM,GAAG,UAAU,CAAC,YAAY,CAAC;IAEvC,MAAM,MAAM,GAAa,EAAE,CAAC;IAE5B,MAAM,UAAU,GAAG,KAAK,CAAC,OAAO,CAAC,cAAc,CAAC,CAAC,CAAC,CAAC,cAAc,CAAC,CAAC,CAAC,CAAC,cAAc,CAAC,CAAC;IAErF,mDAAmD;IACnD,MAAM,MAAM,GAAG,QAAQ,CAAC,SAAS,CAAC,UAAU,CAAC,WAAW,CAAC,KAAK,CAAC,CAAC;IAChE,KAAK,MAAM,IAAI,IAAI,MAAM,EAAE,CAAC;QACxB,IAAI,UAAU,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,EAAE,CAAC;YAChC,sCAAsC;YACtC,IAAI,KAAK,GAAG,IAAI,CAAC,QAAQ,EAAE,KAAK,CAAC,KAAK,CAAC;YAEvC,IAAI,OAAO,EAAE,gBAAgB,EAAE,CAAC;gBAC5B,gCAAgC;gBAChC,MAAM,OAAO,GAAG,IAAI,CAAC,QAAQ,CAAC;gBAC9B,MAAM,WAAW,GAAG,QAAQ,CAAC,eAAe,CAAC,OAAO,EAAE,OAAO,CAAC,gBAAgB,CAAC,CAAC;gBAChF,IAAI,WAAW,EAAE,CAAC;oBACd,sCAAsC;oBACtC,KAAK,GAAG,WAAW,CAAC,KAAK,CAAC,KAAK,CAAC;gBACpC,CAAC;YACL,CAAC;YAED,MAAM,GAAG,GAAG,IAAI,CAAC,QAAQ,EAAE,KAAK,CAAC,GAAG,CAAC;YACrC,6DAA6D;YAC7D,MAAM,KAAK,GAAG,MAAM,CAAC,OAAO,CAAC;gBACzB,KAAK,EAAE;oBACH,IAAI,EAAE,KAAK,EAAE,IAAI,IAAI,CAAC;oBACtB,SAAS,EAAE,KAAK,EAAE,SAAS,IAAI,CAAC;iBACnC;gBACD,GAAG,EAAE;oBACD,IAAI,EAAE,GAAG,EAAE,IAAI,IAAI,CAAC;oBACpB,SAAS,EAAE,GAAG,EAAE,SAAS,IAAI,CAAC;iBACjC;aACJ,CAAC,CAAC;YAEH,IAAI,KAAK,CAAC,IAAI,EAAE,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;gBAC1B,MAAM,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;YACvB,CAAC;QACL,CAAC;IACL,CAAC;IAED,OAAO,MAAM,CAAC;AAElB,CAAC"}
package/package.json ADDED
@@ -0,0 +1,61 @@
+ {
+   "name": "langium-ai-tools",
+   "version": "0.0.1",
+   "description": "Tooling for building AI Applications that leverage Langium DSLs",
+   "repository": {
+     "type": "git",
+     "url": "git+https://github.com/eclipse-langium/langium-ai.git",
+     "directory": "packages/langium-ai-tools"
+   },
+   "bugs": "https://github.com/eclipse-langium/langium-ai/issues",
+   "type": "module",
+   "main": "dist/index.js",
+   "private": false,
+   "files": [
+     "dist"
+   ],
+   "exports": {
+     ".": {
+       "import": "./dist/index.js",
+       "types": "./dist/index.d.ts"
+     },
+     "./splitter": {
+       "import": "./dist/splitter/index.js",
+       "types": "./dist/splitter/index.d.ts"
+     },
+     "./evaluator": {
+       "import": "./dist/evaluator/index.js",
+       "types": "./dist/evaluator/index.d.ts"
+     }
+   },
+   "scripts": {
+     "build": "tsc",
+     "watch": "tsc -w",
+     "test": "vitest run",
+     "clean": "rimraf ./dist"
+   },
+   "author": {
+     "name": "TypeFox",
+     "url": "https://www.typefox.io"
+   },
+   "keywords": [
+     "langium",
+     "ai",
+     "tools",
+     "llm"
+   ],
+   "license": "MIT",
+   "dependencies": {
+     "langium": "~3.4.0",
+     "levenshtein-edit-distance": "^3.0.1"
+   },
+   "volta": {
+     "node": "20.10.0",
+     "npm": "10.2.3"
+   },
+   "devDependencies": {
+     "typescript": "^5.4.5",
+     "vitest": "^3.0.9",
+     "rimraf": "^6.0.1"
+   }
+ }
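The `exports` map above defines three entry points: the package root, `./splitter`, and `./evaluator`. The sketch below illustrates how a specifier maps onto that table. It is a simplified lookup written for illustration only: `resolveExport` is a hypothetical helper that handles exact subpath matches and a single condition, not Node's full resolution algorithm.

```typescript
// The "exports" entries, copied from the package.json above.
type Conditions = Record<string, string>;
const exportsMap: Record<string, Conditions> = {
    '.': { import: './dist/index.js', types: './dist/index.d.ts' },
    './splitter': { import: './dist/splitter/index.js', types: './dist/splitter/index.d.ts' },
    './evaluator': { import: './dist/evaluator/index.js', types: './dist/evaluator/index.d.ts' }
};

// Hypothetical helper: map an import specifier to a file inside the package.
// Returns undefined for specifiers that the exports map does not list.
function resolveExport(
    specifier: string,
    condition: string,
    pkgName = 'langium-ai-tools'
): string | undefined {
    if (specifier !== pkgName && !specifier.startsWith(pkgName + '/')) {
        return undefined; // specifier belongs to a different package
    }
    const subpath = specifier === pkgName ? '.' : '.' + specifier.slice(pkgName.length);
    return exportsMap[subpath]?.[condition];
}

const rootEntry = resolveExport('langium-ai-tools', 'import');
const splitterEntry = resolveExport('langium-ai-tools/splitter', 'import');
const splitterTypes = resolveExport('langium-ai-tools/splitter', 'types');
// A deep import into dist/ is not listed in the map, so it does not resolve.
const deepImport = resolveExport('langium-ai-tools/dist/splitter/splitter.js', 'import');
console.log({ rootEntry, splitterEntry, splitterTypes, deepImport });
```

In a consuming project this would correspond to an import such as `import { splitByNode } from 'langium-ai-tools/splitter'`, assuming `dist/splitter/index.js` re-exports `splitByNode`, which this part of the diff does not show.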