langium-ai-tools 0.0.1
- package/README.md +83 -0
- package/dist/evaluator/chart.d.ts +32 -0
- package/dist/evaluator/chart.d.ts.map +1 -0
- package/dist/evaluator/chart.js +218 -0
- package/dist/evaluator/chart.js.map +1 -0
- package/dist/evaluator/edit-distance-evaluator.d.ts +8 -0
- package/dist/evaluator/edit-distance-evaluator.d.ts.map +1 -0
- package/dist/evaluator/edit-distance-evaluator.js +13 -0
- package/dist/evaluator/edit-distance-evaluator.js.map +1 -0
- package/dist/evaluator/eval-matrix.d.ts +95 -0
- package/dist/evaluator/eval-matrix.d.ts.map +1 -0
- package/dist/evaluator/eval-matrix.js +87 -0
- package/dist/evaluator/eval-matrix.js.map +1 -0
- package/dist/evaluator/evaluator.d.ts +64 -0
- package/dist/evaluator/evaluator.d.ts.map +1 -0
- package/dist/evaluator/evaluator.js +162 -0
- package/dist/evaluator/evaluator.js.map +1 -0
- package/dist/evaluator/index.d.ts +6 -0
- package/dist/evaluator/index.d.ts.map +1 -0
- package/dist/evaluator/index.js +6 -0
- package/dist/evaluator/index.js.map +1 -0
- package/dist/evaluator/langium-evaluator.d.ts +55 -0
- package/dist/evaluator/langium-evaluator.d.ts.map +1 -0
- package/dist/evaluator/langium-evaluator.js +78 -0
- package/dist/evaluator/langium-evaluator.js.map +1 -0
- package/dist/index.d.ts +3 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +3 -0
- package/dist/index.js.map +1 -0
- package/dist/splitter/index.d.ts +2 -0
- package/dist/splitter/index.d.ts.map +1 -0
- package/dist/splitter/index.js +2 -0
- package/dist/splitter/index.js.map +1 -0
- package/dist/splitter/splitter.d.ts +21 -0
- package/dist/splitter/splitter.d.ts.map +1 -0
- package/dist/splitter/splitter.js +59 -0
- package/dist/splitter/splitter.js.map +1 -0
- package/package.json +61 -0
package/README.md
ADDED
|
@@ -0,0 +1,83 @@
|
|
|
1
|
+
# Langium AI Tools
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
This project provides core tools for building AI applications around Langium DSLs. These tools make it easier to:
|
|
6
|
+
|
|
7
|
+
- Determine which models work well for your DSL
|
|
8
|
+
- Evaluate which changes to your tooling actually improve your generation results
|
|
9
|
+
- Process DSL documents in a way that makes sense for your DSL & target application
|
|
10
|
+
|
|
11
|
+
To solve these problems this package provides:
|
|
12
|
+
|
|
13
|
+
- Splitting Support: Using your DSL's parser to make it easier to pre-process documents before ingest (such as into a vector DB)
|
|
14
|
+
- Training & Evaluation Support: Assess the output of your model, RAG, and anything else in your stack through a structured input/output evaluation phase.
|
|
15
|
+
- Constraint Support: Synthesize BNF-style grammars from your Langium grammar, which can be used to control the token output from an LLM to conform to your DSL's expected structure (this feature has been added directly into the **langium-cli** itself, as it has wider general applications).
|
|
16
|
+
|
|
17
|
+
What's also important is what is not provided:
|
|
18
|
+
- *We don't choose your model for you.* We believe this is your choice, and we don't want to presume we know best or lock you in. All we assume is that you have a model (or stack) that we can use. Tooling that leverages models directly will be provided in a separate package under Langium AI, apart from the core here.
|
|
19
|
+
- *We don't choose your stack for you.* There are many excellent choices for hosting providers, databases, caches, and other supporting services (local & remote). There are so many, and they change so often, that we decided it was best not to assume what works here, and instead to support preparing information for whatever stack you choose.
|
|
20
|
+
|
|
21
|
+
LLMs (and transformers in general) are evolving quite rapidly. With this approach, these tools help you build your own specific approach, whilst letting you keep up with the latest and greatest in model developments.
|
|
22
|
+
|
|
23
|
+
## Installation
|
|
24
|
+
|
|
25
|
+
You can install Langium AI Tools by running:
|
|
26
|
+
|
|
27
|
+
```sh
|
|
28
|
+
npm i --save langium-ai-tools
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
## Usage
|
|
32
|
+
|
|
33
|
+
### Splitting
|
|
34
|
+
|
|
35
|
+
Langium AI Tools provides several ways to split documents.
|
|
36
|
+
|
|
37
|
+
The simplest approach is to, of course, not split at all. For smaller DSL programs this may be perfectly viable, but in all likelihood you're reading this to handle medium to large programs -- or a large quantity of smaller programs with overlapping constructs.
|
|
38
|
+
|
|
39
|
+
In most cases you can split by specific AST nodes. These map directly to the types generated from your Langium grammar rules, which makes it easy to mark where a document should be divided.
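
To make the idea of splitting by AST node type concrete, here is a minimal, self-contained sketch. It does not use this package's splitter API; `SketchNode` and `splitByNodeType` are hypothetical names invented for illustration, and a real Langium AST carries its text range differently.

```ts
// Hypothetical, simplified stand-in for a parsed AST node (not Langium's AstNode).
interface SketchNode {
    type: string;            // name of the grammar rule that produced the node
    start: number;           // start offset in the document text (inclusive)
    end: number;             // end offset in the document text (exclusive)
    children: SketchNode[];
}

// Collect the text of every node whose type is in `splitTypes`, in document order.
function splitByNodeType(text: string, root: SketchNode, splitTypes: Set<string>): string[] {
    const chunks: string[] = [];
    const visit = (node: SketchNode): void => {
        if (splitTypes.has(node.type)) {
            chunks.push(text.slice(node.start, node.end));
            return; // a matched node becomes one chunk; nested matches are not split again
        }
        node.children.forEach(visit);
    };
    visit(root);
    return chunks;
}

// Assumed example document with two 'Person' nodes under a 'Model' root.
const sampleText = 'person Alice\nperson Bob\n';
const sampleRoot: SketchNode = {
    type: 'Model', start: 0, end: sampleText.length,
    children: [
        { type: 'Person', start: 0, end: 12, children: [] },
        { type: 'Person', start: 13, end: 23, children: [] }
    ]
};
const sampleChunks = splitByNodeType(sampleText, sampleRoot, new Set(['Person']));
console.log(sampleChunks);
```

Each chunk could then be embedded or indexed on its own, for example before ingesting into a vector DB.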
|
|
40
|
+
|
|
41
|
+
### Evaluation
|
|
42
|
+
|
|
43
|
+
Regardless of how you've sourced your model, you'll need a metric for determining the quality of your output.
|
|
44
|
+
|
|
45
|
+
For Langium DSLs, we provide a series of *evaluator* utilities to help in assessing the correctness of DSL output.
|
|
46
|
+
|
|
47
|
+
It's important to point out that evaluations are *not* tests; they are closer to [OpenAI's evals framework](https://github.com/openai/evals). The idea is to grade or score outputs against an expected output for a known input. This is a simple but effective way to check, in a structured way, that your model is generally doing what you expect and *not* doing something else as well.
|
|
48
|
+
|
|
49
|
+
Take the following evaluator for example. Let's assume you have [Ollama](https://ollama.com/) running locally, and the [ollama-js](https://github.com/ollama/ollama-js) package installed. From a given base model you can define evaluations like so.
|
|
50
|
+
|
|
51
|
+
```ts
|
|
52
|
+
import { Evaluator, EvaluatorScore, LangiumEvaluator } from 'langium-ai-tools/evaluator';
|
|
53
|
+
import ollama from 'ollama';
|
|
54
|
+
|
|
55
|
+
// get your language's services
|
|
56
|
+
const services = createMyDSLServices(EmptyFileSystem).MyDSL;
|
|
57
|
+
|
|
58
|
+
// define an evaluator using your language's services
|
|
59
|
+
// this effectively uses your existing parser & validations to 'grade' the response
|
|
60
|
+
const evaluator = new LangiumEvaluator(services);
|
|
61
|
+
|
|
62
|
+
// make some prompt
|
|
63
|
+
const response = await ollama.chat({
|
|
64
|
+
model: 'llama3.2',
|
|
65
|
+
messages: [{
|
|
66
|
+
role: 'user',
|
|
67
|
+
content: 'Write me a hello world program written in MyDSL.'
|
|
68
|
+
}]
|
|
69
|
+
});
|
|
70
|
+
|
|
71
|
+
const es: EvaluatorScore = evaluator.evaluate(response.message.content);
|
|
72
|
+
|
|
73
|
+
// print out your score!
|
|
74
|
+
console.log(es);
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
You can also define custom evaluators that are more tuned to the needs of your DSL. This could be handling diagnostics in a very specific fashion, extracting code out of the response itself to check, using an evaluation model to grade the response, or using a combination of techniques to get a more accurate score for your model's output.
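
As one illustration of "extracting code out of the response itself", the sketch below pulls the first fenced code block out of a model response so that only the DSL program is graded. It is a standalone helper written for this example, not part of this package's API; `extractFirstCodeBlock` and the sample response are assumptions.

```ts
// Three backticks, built at runtime so this example does not contain a literal fence.
const FENCE = '`'.repeat(3);

// Hypothetical helper: return the body of the first fenced block, or undefined if none exists.
function extractFirstCodeBlock(response: string): string | undefined {
    const open = response.indexOf(FENCE);
    if (open === -1) return undefined;
    // the body starts on the line after the opening fence (which may carry a language tag)
    const bodyStart = response.indexOf('\n', open);
    if (bodyStart === -1) return undefined;
    const close = response.indexOf(FENCE, bodyStart + 1);
    if (close === -1) return undefined;
    return response.slice(bodyStart + 1, close).trim();
}

// Assumed example response in which the model wraps its program in explanatory prose.
const sampleResponse = [
    'Here is your program:',
    FENCE + 'mydsl',
    'hello world',
    FENCE,
    'Enjoy!'
].join('\n');

const extractedProgram = extractFirstCodeBlock(sampleResponse);
console.log(extractedProgram);
```

A custom evaluator could run a step like this first and hand only the extracted program to the parser-based evaluation, falling back to the full response when no fenced block is found.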
|
|
78
|
+
|
|
79
|
+
In general we stick to focusing on what Langium can do to help with evaluation, but leave the opportunity open for you to extend, supplement, or modify evaluation logic as you see fit.
|
|
80
|
+
|
|
81
|
+
## Contributing
|
|
82
|
+
|
|
83
|
+
If you want to help, feel free to open an issue or a PR. As a general note, we're open to accepting changes that focus on improving how we can support AI application development for Langium DSLs. However, we don't want to provide explicit bindings to actual services/providers at this time, such as LlamaIndex, Ollama, LangChain, or others. Similarly, this package doesn't provide direct bindings for AI providers such as OpenAI and Anthropic. Instead, these changes will go into a separate package under Langium AI that is intended for this purpose.
|
|
@@ -0,0 +1,32 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Generates & exports an HTML radar chart report using plotly JS
|
|
3
|
+
*/
|
|
4
|
+
import { EvaluatorResult, EvaluatorResultData } from "./evaluator.js";
|
|
5
|
+
/**
|
|
6
|
+
* Generates an HTML radar chart from the provided data
|
|
7
|
+
* @param evalResults Evaluator results to chart
|
|
8
|
+
* @param dest Output file to write the chart to
|
|
9
|
+
* @param rFunc polar r function, used to extract the r values from the data
|
|
10
|
+
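* @param preprocess Optional function applied to the results before charting (defaults to averaging across runners); theta values are taken from the keys of the record returned by rFunc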
|
|
11
|
+
*/
|
|
12
|
+
export declare function generateRadarChart<T extends EvaluatorResultData>(chartName: string, evalResults: EvaluatorResult[], dest: string, rFunc: (d: T, metadata: Record<string, unknown>) => Record<string, unknown>, preprocess?: (arr: EvaluatorResult[]) => EvaluatorResult[]): void;
|
|
13
|
+
export declare function generateHistogram<T extends EvaluatorResultData>(chartName: string, evalResults: EvaluatorResult[], dest: string, dataFunc: (d: T, metadata: Record<string, unknown>) => Record<string, unknown>, preprocess?: (arr: EvaluatorResult[]) => EvaluatorResult[]): void;
|
|
14
|
+
/**
|
|
15
|
+
* Normalizes all numeric data entries in results (while also retaining non-numeric entries)
|
|
16
|
+
*/
|
|
17
|
+
export declare function normalizeData(data: EvaluatorResult[]): EvaluatorResult[];
|
|
18
|
+
/**
|
|
19
|
+
* Generates a historical chart from the provided data, showing runners along the X axis, and their performance over time along the Y axis
|
|
20
|
+
* @param chartName
|
|
21
|
+
* @param folder
|
|
22
|
+
* @param dest
|
|
23
|
+
* @param dataFunc
|
|
24
|
+
* @param options
|
|
25
|
+
*/
|
|
26
|
+
export declare function generateHistoricalChart<T extends EvaluatorResultData>(chartName: string, folder: string, dest: string, dataFunc: (d: T, metadata: Record<string, unknown>) => number, options?: {
|
|
27
|
+
preprocess?: (arr: EvaluatorResult[]) => EvaluatorResult[];
|
|
28
|
+
filter?: (r: EvaluatorResult) => boolean;
|
|
29
|
+
take?: number;
|
|
30
|
+
chartType?: string;
|
|
31
|
+
}): void;
|
|
32
|
+
//# sourceMappingURL=chart.d.ts.map
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"chart.d.ts","sourceRoot":"","sources":["../../src/evaluator/chart.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,OAAO,EAAE,eAAe,EAAE,mBAAmB,EAAoC,MAAM,gBAAgB,CAAC;AAIxG;;;;;;GAMG;AACH,wBAAgB,kBAAkB,CAAC,CAAC,SAAS,mBAAmB,EAC5D,SAAS,EAAE,MAAM,EACjB,WAAW,EAAE,eAAe,EAAE,EAC9B,IAAI,EAAE,MAAM,EACZ,KAAK,EAAE,CAAC,CAAC,EAAE,CAAC,EAAE,QAAQ,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,KAAK,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,EAC3E,UAAU,CAAC,EAAE,CAAC,GAAG,EAAE,eAAe,EAAE,KAAK,eAAe,EAAE,GAC3D,IAAI,CAsDN;AAED,wBAAgB,iBAAiB,CAAC,CAAC,SAAS,mBAAmB,EAC3D,SAAS,EAAE,MAAM,EACjB,WAAW,EAAE,eAAe,EAAE,EAC9B,IAAI,EAAE,MAAM,EACZ,QAAQ,EAAE,CAAC,CAAC,EAAE,CAAC,EAAE,QAAQ,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,KAAK,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,EAC9E,UAAU,CAAC,EAAE,CAAC,GAAG,EAAE,eAAe,EAAE,KAAK,eAAe,EAAE,QAgD7D;AAID;;GAEG;AACH,wBAAgB,aAAa,CAAC,IAAI,EAAE,eAAe,EAAE,GAAG,eAAe,EAAE,CA2BxE;AAED;;;;;;;GAOG;AACH,wBAAgB,uBAAuB,CAAC,CAAC,SAAS,mBAAmB,EACjE,SAAS,EAAE,MAAM,EACjB,MAAM,EAAE,MAAM,EACd,IAAI,EAAE,MAAM,EACZ,QAAQ,EAAE,CAAC,CAAC,EAAE,CAAC,EAAE,QAAQ,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,KAAK,MAAM,EAC7D,OAAO,CAAC,EAAE;IACN,UAAU,CAAC,EAAE,CAAC,GAAG,EAAE,eAAe,EAAE,KAAK,eAAe,EAAE,CAAC;IAC3D,MAAM,CAAC,EAAE,CAAC,CAAC,EAAE,eAAe,KAAK,OAAO,CAAC;IACzC,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,SAAS,CAAC,EAAE,MAAM,CAAA;CACrB,QA4FJ"}
|
|
@@ -0,0 +1,218 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Generates & exports an HTML radar chart report using plotly JS
|
|
3
|
+
*/
|
|
4
|
+
import { averageAcrossRunners, loadReport } from "./evaluator.js";
|
|
5
|
+
import { writeFileSync, readdirSync } from 'fs';
|
|
6
|
+
import * as path from 'path';
|
|
7
|
+
/**
|
|
8
|
+
* Generates an HTML radar chart from the provided data
|
|
9
|
+
* @param evalResults Evaluator results to chart
|
|
10
|
+
* @param dest Output file to write the chart to
|
|
11
|
+
* @param rFunc polar r function, used to extract the r values from the data
|
|
12
|
+
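* @param preprocess Optional function applied to the results before charting (defaults to averaging across runners); theta values are taken from the keys of the record returned by rFunc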
|
|
13
|
+
*/
|
|
14
|
+
export function generateRadarChart(chartName, evalResults, dest, rFunc, preprocess) {
|
|
15
|
+
// process results first to average out data (either using the user supplied function, or defaulting to average across runners)
|
|
16
|
+
const processedResults = preprocess ? preprocess(evalResults) : averageAcrossRunners(evalResults);
|
|
17
|
+
const data = processedResults.map((result) => {
|
|
18
|
+
const resultData = result.data;
|
|
19
|
+
const rfuncResult = rFunc(resultData, result.metadata);
|
|
20
|
+
const theta = Object.keys(rfuncResult);
|
|
21
|
+
const r = Object.values(rfuncResult);
|
|
22
|
+
return {
|
|
23
|
+
type: 'scatterpolar',
|
|
24
|
+
r,
|
|
25
|
+
theta,
|
|
26
|
+
fill: 'toself',
|
|
27
|
+
name: result.name
|
|
28
|
+
};
|
|
29
|
+
});
|
|
30
|
+
const layout = {
|
|
31
|
+
title: chartName,
|
|
32
|
+
name: chartName,
|
|
33
|
+
polar: {
|
|
34
|
+
radialaxis: {
|
|
35
|
+
visible: true,
|
|
36
|
+
range: [0, 1]
|
|
37
|
+
}
|
|
38
|
+
},
|
|
39
|
+
showlegend: true,
|
|
40
|
+
width: 1000,
|
|
41
|
+
height: 800
|
|
42
|
+
};
|
|
43
|
+
const html = `
|
|
44
|
+
<!DOCTYPE html>
|
|
45
|
+
<html>
|
|
46
|
+
<head>
|
|
47
|
+
<title>${chartName}</title>
|
|
48
|
+
<script src="https://cdn.plot.ly/plotly-2.35.2.min.js" charset="utf-8"></script>
|
|
49
|
+
</head>
|
|
50
|
+
<body>
|
|
51
|
+
<div id="langium-ai-chart" style="width:1000px;height:1000px;margin:8px auto;"></div>
|
|
52
|
+
<script>
|
|
53
|
+
data = ${JSON.stringify(data)};
|
|
54
|
+
layout = ${JSON.stringify(layout)};
|
|
55
|
+
Plotly.newPlot("langium-ai-chart", data, layout);
|
|
56
|
+
</script>
|
|
57
|
+
</body>
|
|
58
|
+
</html>
|
|
59
|
+
`;
|
|
60
|
+
writeFileSync(dest, html);
|
|
61
|
+
console.log(`Radar chart report written to: ${dest}`);
|
|
62
|
+
}
|
|
63
|
+
export function generateHistogram(chartName, evalResults, dest, dataFunc, preprocess) {
|
|
64
|
+
// process results first to average out data (either using the user supplied function, or defaulting to average across runners)
|
|
65
|
+
const processedResults = preprocess ? preprocess(evalResults) : averageAcrossRunners(evalResults);
|
|
66
|
+
const data = processedResults.map((result) => {
|
|
67
|
+
const data = result.data;
|
|
68
|
+
const dd = dataFunc(data, result.metadata);
|
|
69
|
+
const yLabels = Object.keys(dd);
|
|
70
|
+
const xData = Object.values(dd);
|
|
71
|
+
return {
|
|
72
|
+
type: 'bar',
|
|
73
|
+
x: xData,
|
|
74
|
+
y: yLabels,
|
|
75
|
+
orientation: 'h',
|
|
76
|
+
name: result.name
|
|
77
|
+
};
|
|
78
|
+
});
|
|
79
|
+
const layout = {
|
|
80
|
+
title: chartName,
|
|
81
|
+
barmode: 'group',
|
|
82
|
+
showlegend: true,
|
|
83
|
+
width: 1000,
|
|
84
|
+
height: 800
|
|
85
|
+
};
|
|
86
|
+
const html = `
|
|
87
|
+
<!DOCTYPE html>
|
|
88
|
+
<html>
|
|
89
|
+
<head>
|
|
90
|
+
<title>${chartName}</title>
|
|
91
|
+
<script src="https://cdn.plot.ly/plotly-2.35.2.min.js" charset="utf-8"></script>
|
|
92
|
+
</head>
|
|
93
|
+
<body>
|
|
94
|
+
<div id="langium-ai-chart" style="width:1000px;height:1000px;margin:8px auto;"></div>
|
|
95
|
+
<script>
|
|
96
|
+
data = ${JSON.stringify(data)};
|
|
97
|
+
layout = ${JSON.stringify(layout)};
|
|
98
|
+
Plotly.newPlot("langium-ai-chart", data, layout);
|
|
99
|
+
</script>
|
|
100
|
+
</body>
|
|
101
|
+
</html>
|
|
102
|
+
`;
|
|
103
|
+
writeFileSync(dest, html);
|
|
104
|
+
console.log(`Histogram report written to: ${dest}`);
|
|
105
|
+
}
|
|
106
|
+
/**
|
|
107
|
+
* Normalizes all numeric data entries in results (while also retaining non-numeric entries)
|
|
108
|
+
*/
|
|
109
|
+
export function normalizeData(data) {
|
|
110
|
+
const maxValues = new Map();
|
|
111
|
+
for (const result of data) {
|
|
112
|
+
const d = result.data;
|
|
113
|
+
for (const [key, value] of Object.entries(d)) {
|
|
114
|
+
if (typeof value !== 'number') {
|
|
115
|
+
continue;
|
|
116
|
+
}
|
|
117
|
+
const existingMax = maxValues.get(key) ?? 0;
|
|
118
|
+
if (value > existingMax) {
|
|
119
|
+
maxValues.set(key, value);
|
|
120
|
+
}
|
|
121
|
+
}
|
|
122
|
+
}
|
|
123
|
+
for (const result of data) {
|
|
124
|
+
const d = result.data;
|
|
125
|
+
for (const [key, value] of Object.entries(d)) {
|
|
126
|
+
if (typeof value === 'number') {
|
|
127
|
+
const max = maxValues.get(key) ?? 1;
|
|
128
|
+
d[key] = value / max;
|
|
129
|
+
}
|
|
130
|
+
}
|
|
131
|
+
}
|
|
132
|
+
return data;
|
|
133
|
+
}
|
|
134
|
+
/**
|
|
135
|
+
* Generates a historical chart from the provided data, showing runners along the X axis, and their performance over time along the Y axis
|
|
136
|
+
* @param chartName
|
|
137
|
+
* @param folder
|
|
138
|
+
* @param dest
|
|
139
|
+
* @param dataFunc
|
|
140
|
+
* @param options
|
|
141
|
+
*/
|
|
142
|
+
export function generateHistoricalChart(chartName, folder, dest, dataFunc, options) {
|
|
143
|
+
// generate a historical chart by calculating the average for runners in all previous reports, and organizing them in ascending date order
|
|
144
|
+
let files = readdirSync(folder).filter(f => f.endsWith('.json'));
|
|
145
|
+
// map from runner name to that runner's results, collected across the historical reports
|
|
146
|
+
const runnerResultsMap = new Map();
|
|
147
|
+
// take the most recent files if take is set
|
|
148
|
+
if (options?.take) {
|
|
149
|
+
files = files.sort().slice(-options.take);
|
|
150
|
+
}
|
|
151
|
+
for (const file of files) {
|
|
152
|
+
// retrieve results from this file
|
|
153
|
+
const report = loadReport(path.join(folder, file));
|
|
154
|
+
const results = report.results;
|
|
155
|
+
const date = report.date;
|
|
156
|
+
console.log(`Processing historical results from: ${date}`);
|
|
157
|
+
// process results first
|
|
158
|
+
let processedResults = options?.preprocess ? options.preprocess(results) : averageAcrossRunners(results);
|
|
159
|
+
// normalize
|
|
160
|
+
processedResults = normalizeData(processedResults);
|
|
161
|
+
// add to the map based by runner name
|
|
162
|
+
for (const result of processedResults) {
|
|
163
|
+
if (options?.filter && !options.filter(result)) {
|
|
164
|
+
// skip
|
|
165
|
+
continue;
|
|
166
|
+
}
|
|
167
|
+
const name = result.metadata.runner;
|
|
168
|
+
const existingResults = runnerResultsMap.get(name) ?? [];
|
|
169
|
+
const rc = {
|
|
170
|
+
...result
|
|
171
|
+
};
|
|
172
|
+
rc.metadata.date = new Date(date).toISOString();
|
|
173
|
+
existingResults.push(rc);
|
|
174
|
+
runnerResultsMap.set(name, existingResults);
|
|
175
|
+
}
|
|
176
|
+
}
|
|
177
|
+
const allData = [];
|
|
178
|
+
// organize by date in ascending order
|
|
179
|
+
for (let [name, results] of runnerResultsMap) {
|
|
180
|
+
results.sort((a, b) => {
|
|
181
|
+
return new Date(a.metadata.date).getTime() - new Date(b.metadata.date).getTime();
|
|
182
|
+
});
|
|
183
|
+
const runners = results.map(r => r.metadata.runner);
|
|
184
|
+
const data = results.map(r => dataFunc(r.data, r.metadata)).sort();
|
|
185
|
+
allData.push({
|
|
186
|
+
type: options?.chartType ? options.chartType : 'scatter',
|
|
187
|
+
x: runners,
|
|
188
|
+
y: data,
|
|
189
|
+
name
|
|
190
|
+
});
|
|
191
|
+
}
|
|
192
|
+
const layout = {
|
|
193
|
+
title: chartName,
|
|
194
|
+
showlegend: true,
|
|
195
|
+
width: 1000,
|
|
196
|
+
height: 1000
|
|
197
|
+
};
|
|
198
|
+
const html = `
|
|
199
|
+
<!DOCTYPE html>
|
|
200
|
+
<html>
|
|
201
|
+
<head>
|
|
202
|
+
<title>${chartName}</title>
|
|
203
|
+
<script src="https://cdn.plot.ly/plotly-2.35.2.min.js" charset="utf-8"></script>
|
|
204
|
+
</head>
|
|
205
|
+
<body>
|
|
206
|
+
<div id="langium-ai-chart" style="width:1000px;height:1000px;margin:8px auto;"></div>
|
|
207
|
+
<script>
|
|
208
|
+
data = ${JSON.stringify(allData)};
|
|
209
|
+
layout = ${JSON.stringify(layout)};
|
|
210
|
+
Plotly.newPlot("langium-ai-chart", data, layout);
|
|
211
|
+
</script>
|
|
212
|
+
</body>
|
|
213
|
+
</html>
|
|
214
|
+
`;
|
|
215
|
+
writeFileSync(dest, html);
|
|
216
|
+
console.log(`Historical report written to: ${dest}`);
|
|
217
|
+
}
|
|
218
|
+
//# sourceMappingURL=chart.js.map
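
The per-key normalization performed by `normalizeData` above can be illustrated with a small standalone sketch. It works on plain records rather than `EvaluatorResult` objects, and, unlike the function above, it returns copies instead of mutating its input; `normalizeByMax` and the sample rows are assumptions made for this example.

```ts
type SketchData = Record<string, unknown>;

// Divide every numeric entry by the largest value seen for its key; keep other entries as-is.
function normalizeByMax(rows: SketchData[]): SketchData[] {
    const maxValues = new Map<string, number>();
    for (const row of rows) {
        for (const [key, value] of Object.entries(row)) {
            if (typeof value === 'number' && value > (maxValues.get(key) ?? 0)) {
                maxValues.set(key, value);
            }
        }
    }
    return rows.map(row => {
        const out: SketchData = {};
        for (const [key, value] of Object.entries(row)) {
            // keys with no recorded maximum (for example all zeros) are divided by 1
            out[key] = typeof value === 'number' ? value / (maxValues.get(key) ?? 1) : value;
        }
        return out;
    });
}

const normalizedRows = normalizeByMax([
    { errors: 2, tokens: 50, label: 'a' },
    { errors: 4, tokens: 200, label: 'b' }
]);
console.log(normalizedRows);
```

After normalization every numeric column peaks at 1, which matches the fixed `[0, 1]` radial range used by the radar chart layout.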
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"chart.js","sourceRoot":"","sources":["../../src/evaluator/chart.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,OAAO,EAAwC,oBAAoB,EAAE,UAAU,EAAE,MAAM,gBAAgB,CAAC;AACxG,OAAO,EAAE,aAAa,EAAE,WAAW,EAAgB,MAAM,IAAI,CAAC;AAC9D,OAAO,KAAK,IAAI,MAAM,MAAM,CAAC;AAE7B;;;;;;GAMG;AACH,MAAM,UAAU,kBAAkB,CAC9B,SAAiB,EACjB,WAA8B,EAC9B,IAAY,EACZ,KAA2E,EAC3E,UAA0D;IAG1D,+HAA+H;IAC/H,MAAM,gBAAgB,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,WAAW,CAAC,CAAC,CAAC,CAAC,oBAAoB,CAAC,WAAW,CAAC,CAAC;IAElG,MAAM,IAAI,GAAG,gBAAgB,CAAC,GAAG,CAAC,CAAC,MAAM,EAAE,EAAE;QACzC,MAAM,UAAU,GAAG,MAAM,CAAC,IAAS,CAAC;QACpC,MAAM,WAAW,GAAG,KAAK,CAAC,UAAU,EAAE,MAAM,CAAC,QAAQ,CAAC,CAAC;QACvD,MAAM,KAAK,GAAG,MAAM,CAAC,IAAI,CAAC,WAAW,CAAC,CAAC;QACvC,MAAM,CAAC,GAAG,MAAM,CAAC,MAAM,CAAC,WAAW,CAAC,CAAC;QAErC,OAAO;YACH,IAAI,EAAE,cAAc;YACpB,CAAC;YACD,KAAK;YACL,IAAI,EAAE,QAAQ;YACd,IAAI,EAAE,MAAM,CAAC,IAAI;SACpB,CAAC;IACN,CAAC,CAAC,CAAC;IAEH,MAAM,MAAM,GAAG;QACX,KAAK,EAAE,SAAS;QAChB,IAAI,EAAE,SAAS;QACf,KAAK,EAAE;YACH,UAAU,EAAE;gBACR,OAAO,EAAE,IAAI;gBACb,KAAK,EAAE,CAAC,CAAC,EAAE,CAAC,CAAC;aAChB;SACJ;QACD,UAAU,EAAE,IAAI;QAChB,KAAK,EAAE,IAAI;QACX,MAAM,EAAE,GAAG;KACd,CAAC;IAEF,MAAM,IAAI,GAAG;;;;aAIJ,SAAS;;;;;;iBAML,IAAI,CAAC,SAAS,CAAC,IAAI,CAAC;mBAClB,IAAI,CAAC,SAAS,CAAC,MAAM,CAAC;;;;;SAKhC,CAAC;IAEN,aAAa,CAAC,IAAI,EAAE,IAAI,CAAC,CAAC;IAC1B,OAAO,CAAC,GAAG,CAAC,kCAAkC,IAAI,EAAE,CAAC,CAAC;AAC1D,CAAC;AAED,MAAM,UAAU,iBAAiB,CAC7B,SAAiB,EACjB,WAA8B,EAC9B,IAAY,EACZ,QAA8E,EAC9E,UAA0D;IAG1D,+HAA+H;IAC/H,MAAM,gBAAgB,GAAG,UAAU,CAAC,CAAC,CAAC,UAAU,CAAC,WAAW,CAAC,CAAC,CAAC,CAAC,oBAAoB,CAAC,WAAW,CAAC,CAAC;IAElG,MAAM,IAAI,GAAG,gBAAgB,CAAC,GAAG,CAAC,CAAC,MAAM,EAAE,EAAE;QACzC,MAAM,IAAI,GAAG,MAAM,CAAC,IAAS,CAAC;QAC9B,MAAM,EAAE,GAAG,QAAQ,CAAC,IAAI,EAAE,MAAM,CAAC,QAAQ,CAAC,CAAC;QAC3C,MAAM,OAAO,GAAG,MAAM,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;QAChC,MAAM,KAAK,GAAG,MAAM,CAAC,MAAM,CAAC,EAAE,CAAC,CAAC;QAChC,OAAO;YACH,IAAI,EAAE,KAAK;YACX,CAAC,EAAE,KAAK;YACR,CAAC,EAAE,OAAO;YACV,WAAW,EAAE,GAAG;YAChB,IAAI,EAAE,MAAM,CAAC,IAAI;SACpB,CAAC;IACN,CAAC,CAAC,CAAC;IAEH,M
AAM,MAAM,GAAG;QACX,KAAK,EAAE,SAAS;QAChB,OAAO,EAAE,OAAO;QAChB,UAAU,EAAE,IAAI;QAChB,KAAK,EAAE,IAAI;QACX,MAAM,EAAE,GAAG;KACd,CAAC;IAEF,MAAM,IAAI,GAAG;;;;aAIJ,SAAS;;;;;;iBAML,IAAI,CAAC,SAAS,CAAC,IAAI,CAAC;mBAClB,IAAI,CAAC,SAAS,CAAC,MAAM,CAAC;;;;;SAKhC,CAAC;IAEN,aAAa,CAAC,IAAI,EAAE,IAAI,CAAC,CAAC;IAC1B,OAAO,CAAC,GAAG,CAAC,gCAAgC,IAAI,EAAE,CAAC,CAAC;AACxD,CAAC;AAID;;GAEG;AACH,MAAM,UAAU,aAAa,CAAC,IAAuB;IACjD,MAAM,SAAS,GAAG,IAAI,GAAG,EAAkB,CAAC;IAE5C,KAAK,MAAM,MAAM,IAAI,IAAI,EAAE,CAAC;QACxB,MAAM,CAAC,GAAG,MAAM,CAAC,IAA2B,CAAC;QAC7C,KAAK,MAAM,CAAC,GAAG,EAAE,KAAK,CAAC,IAAI,MAAM,CAAC,OAAO,CAAC,CAAC,CAAC,EAAE,CAAC;YAC3C,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;gBAC5B,SAAS;YACb,CAAC;YACD,MAAM,WAAW,GAAG,SAAS,CAAC,GAAG,CAAC,GAAG,CAAC,IAAI,CAAC,CAAC;YAC5C,IAAI,KAAK,GAAG,WAAW,EAAE,CAAC;gBACtB,SAAS,CAAC,GAAG,CAAC,GAAG,EAAE,KAAK,CAAC,CAAC;YAC9B,CAAC;QACL,CAAC;IACL,CAAC;IAED,KAAK,MAAM,MAAM,IAAI,IAAI,EAAE,CAAC;QACxB,MAAM,CAAC,GAAG,MAAM,CAAC,IAA2B,CAAC;QAC7C,KAAK,MAAM,CAAC,GAAG,EAAE,KAAK,CAAC,IAAI,MAAM,CAAC,OAAO,CAAC,CAAC,CAAC,EAAE,CAAC;YAC3C,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;gBAC5B,MAAM,GAAG,GAAG,SAAS,CAAC,GAAG,CAAC,GAAG,CAAC,IAAI,CAAC,CAAC;gBACpC,CAAC,CAAC,GAAG,CAAC,GAAG,KAAK,GAAG,GAAG,CAAC;YACzB,CAAC;QACL,CAAC;IACL,CAAC;IAED,OAAO,IAAI,CAAC;AAChB,CAAC;AAED;;;;;;;GAOG;AACH,MAAM,UAAU,uBAAuB,CACnC,SAAiB,EACjB,MAAc,EACd,IAAY,EACZ,QAA6D,EAC7D,OAKC;IAED,0IAA0I;IAC1I,IAAI,KAAK,GAAG,WAAW,CAAC,MAAM,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,QAAQ,CAAC,OAAO,CAAC,CAAC,CAAC;IAEjE,2HAA2H;IAC3H,MAAM,gBAAgB,GAAuC,IAAI,GAAG,EAAE,CAAC;IAEvE,4CAA4C;IAC5C,IAAI,OAAO,EAAE,IAAI,EAAE,CAAC;QAChB,KAAK,GAAG,KAAK,CAAC,IAAI,EAAE,CAAC,KAAK,CAAC,CAAC,EAAE,OAAO,CAAC,IAAI,CAAC,CAAC;IAChD,CAAC;IAED,KAAK,MAAM,IAAI,IAAI,KAAK,EAAE,CAAC;QACvB,kCAAkC;QAClC,MAAM,MAAM,GAAG,UAAU,CAAC,IAAI,CAAC,IAAI,CAAC,MAAM,EAAE,IAAI,CAAC,CAAC,CAAC;QACnD,MAAM,OAAO,GAAG,MAAM,CAAC,OAAO,CAAC;QAC/B,MAAM,IAAI,GAAW,MAAM,CAAC,IAAI,CAAC;QACjC,OAAO,CAAC,GAAG,CAAC,uCAAuC,IAAI,EAAE,CAAC,CAAC;QAE3D,wBAAwB;QACxB,IAAI,gBAAgB,GAAG,OAAO,EAAE,UAAU,CAAC,CAAC
,CAAC,OAAO,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,CAAC,oBAAoB,CAAC,OAAO,CAAC,CAAC;QACzG,YAAY;QACZ,gBAAgB,GAAG,aAAa,CAAC,gBAAgB,CAAC,CAAC;QAEnD,sCAAsC;QACtC,KAAK,MAAM,MAAM,IAAI,gBAAgB,EAAE,CAAC;YACpC,IAAI,OAAO,EAAE,MAAM,IAAI,CAAC,OAAO,CAAC,MAAM,CAAC,MAAM,CAAC,EAAE,CAAC;gBAC7C,OAAO;gBACP,SAAS;YACb,CAAC;YAED,MAAM,IAAI,GAAG,MAAM,CAAC,QAAQ,CAAC,MAAM,CAAC;YACpC,MAAM,eAAe,GAAG,gBAAgB,CAAC,GAAG,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC;YAEzD,MAAM,EAAE,GAAG;gBACP,GAAG,MAAM;aACZ,CAAC;YACF,EAAE,CAAC,QAAQ,CAAC,IAAI,GAAG,IAAI,IAAI,CAAC,IAAI,CAAC,CAAC,WAAW,EAAE,CAAC;YAEhD,eAAe,CAAC,IAAI,CAAC,MAAM,CAAC,CAAC;YAC7B,gBAAgB,CAAC,GAAG,CAAC,IAAI,EAAE,eAAe,CAAC,CAAC;QAChD,CAAC;IACL,CAAC;IAED,MAAM,OAAO,GAAc,EAAE,CAAC;IAE9B,sCAAsC;IACtC,KAAK,IAAI,CAAC,IAAI,EAAE,OAAO,CAAC,IAAI,gBAAgB,EAAE,CAAC;QAC3C,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,CAAC,EAAE,EAAE;YAClB,OAAO,IAAI,IAAI,CAAC,CAAC,CAAC,QAAQ,CAAC,IAAI,CAAC,CAAC,OAAO,EAAE,GAAG,IAAI,IAAI,CAAC,CAAC,CAAC,QAAQ,CAAC,IAAI,CAAC,CAAC,OAAO,EAAE,CAAC;QACrF,CAAC,CAAC,CAAC;QAEH,MAAM,OAAO,GAAG,OAAO,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,QAAQ,CAAC,MAAM,CAAC,CAAC;QACpD,MAAM,IAAI,GAAG,OAAO,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC,QAAQ,CAAC,CAAC,CAAC,IAAS,EAAE,CAAC,CAAC,QAAQ,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC;QAExE,OAAO,CAAC,IAAI,CAAC;YACT,IAAI,EAAE,OAAO,EAAE,SAAS,CAAC,CAAC,CAAC,OAAO,CAAC,SAAS,CAAC,CAAC,CAAC,SAAS;YACxD,CAAC,EAAE,OAAO;YACV,CAAC,EAAE,IAAI;YACP,IAAI;SACP,CAAC,CAAC;IACP,CAAC;IAED,MAAM,MAAM,GAAG;QACX,KAAK,EAAE,SAAS;QAChB,UAAU,EAAE,IAAI;QAChB,KAAK,EAAE,IAAI;QACX,MAAM,EAAE,IAAI;KACf,CAAC;IAEF,MAAM,IAAI,GAAG;;;;aAIJ,SAAS;;;;;;iBAML,IAAI,CAAC,SAAS,CAAC,OAAO,CAAC;mBACrB,IAAI,CAAC,SAAS,CAAC,MAAM,CAAC;;;;;SAKhC,CAAC;IAEN,aAAa,CAAC,IAAI,EAAE,IAAI,CAAC,CAAC;IAC1B,OAAO,CAAC,GAAG,CAAC,iCAAiC,IAAI,EAAE,CAAC,CAAC;AAEzD,CAAC"}
|
|
@@ -0,0 +1,8 @@
|
|
|
1
|
+
import { Evaluator, EvaluatorResult, EvaluatorResultData } from './evaluator.js';
|
|
2
|
+
export interface EditDistanceEvaluatorResultData extends EvaluatorResultData {
|
|
3
|
+
edit_distance: number;
|
|
4
|
+
}
|
|
5
|
+
export declare class EditDistanceEvaluator extends Evaluator {
|
|
6
|
+
evaluate(response: string, expected_response: string): Promise<Partial<EvaluatorResult>>;
|
|
7
|
+
}
|
|
8
|
+
//# sourceMappingURL=edit-distance-evaluator.d.ts.map
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"edit-distance-evaluator.d.ts","sourceRoot":"","sources":["../../src/evaluator/edit-distance-evaluator.ts"],"names":[],"mappings":"AACA,OAAO,EAAE,SAAS,EAAE,eAAe,EAAE,mBAAmB,EAAE,MAAM,gBAAgB,CAAC;AAEjF,MAAM,WAAW,+BAAgC,SAAQ,mBAAmB;IACxE,aAAa,EAAE,MAAM,CAAC;CACzB;AAED,qBAAa,qBAAsB,SAAQ,SAAS;IAC1C,QAAQ,CAAC,QAAQ,EAAE,MAAM,EAAE,iBAAiB,EAAE,MAAM,GAAG,OAAO,CAAC,OAAO,CAAC,eAAe,CAAC,CAAC;CAQjG"}
|
|
@@ -0,0 +1,13 @@
|
|
|
1
|
+
import { levenshteinEditDistance } from 'levenshtein-edit-distance';
|
|
2
|
+
import { Evaluator } from './evaluator.js';
|
|
3
|
+
export class EditDistanceEvaluator extends Evaluator {
|
|
4
|
+
async evaluate(response, expected_response) {
|
|
5
|
+
const distance = levenshteinEditDistance(response, expected_response);
|
|
6
|
+
return {
|
|
7
|
+
data: {
|
|
8
|
+
edit_distance: distance
|
|
9
|
+
}
|
|
10
|
+
};
|
|
11
|
+
}
|
|
12
|
+
}
|
|
13
|
+
//# sourceMappingURL=edit-distance-evaluator.js.map
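
For readers unfamiliar with the metric, the following is a minimal sketch of Levenshtein edit distance using the classic dynamic-programming recurrence. The evaluator above delegates to the `levenshtein-edit-distance` package; this standalone `editDistance` function is only an illustration of what the score measures, not that library's implementation.

```ts
// Number of single-character insertions, deletions, and substitutions needed to turn a into b.
function editDistance(a: string, b: string): number {
    // previous[j] is the distance between the prefix of a processed so far and the first j chars of b
    let previous: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        const current: number[] = [i]; // turning the first i chars of a into '' takes i deletions
        for (let j = 1; j <= b.length; j++) {
            const substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
            current[j] = Math.min(previous[j] + 1, current[j - 1] + 1, substitution);
        }
        previous = current;
    }
    return previous[b.length];
}

console.log(editDistance('kitten', 'sitting'));
```

A distance of 0 means the response matches the expected response exactly; larger values mean more character-level edits separate the two.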
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"edit-distance-evaluator.js","sourceRoot":"","sources":["../../src/evaluator/edit-distance-evaluator.ts"],"names":[],"mappings":"AAAA,OAAO,EAAC,uBAAuB,EAAC,MAAM,2BAA2B,CAAC;AAClE,OAAO,EAAE,SAAS,EAAwC,MAAM,gBAAgB,CAAC;AAMjF,MAAM,OAAO,qBAAsB,SAAQ,SAAS;IAChD,KAAK,CAAC,QAAQ,CAAC,QAAgB,EAAE,iBAAyB;QACtD,MAAM,QAAQ,GAAG,uBAAuB,CAAC,QAAQ,EAAE,iBAAiB,CAAC,CAAC;QACtE,OAAO;YACH,IAAI,EAAE;gBACF,aAAa,EAAE,QAAQ;aAC1B;SACJ,CAAC;IACN,CAAC;CACJ"}
|
|
@@ -0,0 +1,95 @@
|
|
|
1
|
+
import { Evaluator, EvaluatorResult } from "./evaluator.js";
|
|
2
|
+
/**
|
|
3
|
+
* Configuration for the evaluation matrix
|
|
4
|
+
*/
|
|
5
|
+
export interface EvalMatrixConfig {
|
|
6
|
+
config: {
|
|
7
|
+
/**
|
|
8
|
+
* Name of the evaluation matrix
|
|
9
|
+
*/
|
|
10
|
+
name: string;
|
|
11
|
+
/**
|
|
12
|
+
* Helpful description of the evaluation matrix
|
|
13
|
+
*/
|
|
14
|
+
description: string;
|
|
15
|
+
/**
|
|
16
|
+
* Where to store run history
|
|
17
|
+
*/
|
|
18
|
+
history_folder: string;
|
|
19
|
+
/**
|
|
20
|
+
* The number of runs to perform for each case
|
|
21
|
+
* Note this will trigger evaluation for all registered evaluators for each run
|
|
22
|
+
*/
|
|
23
|
+
num_runs: number;
|
|
24
|
+
};
|
|
25
|
+
/**
|
|
26
|
+
* Runners to evaluate
|
|
27
|
+
*/
|
|
28
|
+
runners: Runner[];
|
|
29
|
+
/**
|
|
30
|
+
* Evaluators to evaluate with
|
|
31
|
+
*/
|
|
32
|
+
evaluators: NamedEvaluator[];
|
|
33
|
+
/**
|
|
34
|
+
* Cases to evaluate
|
|
35
|
+
*/
|
|
36
|
+
cases: Case[];
|
|
37
|
+
}
|
|
38
|
+
/**
|
|
39
|
+
* Evaluation matrix for running multiple runners on multiple cases with multiple evaluators
|
|
40
|
+
*/
|
|
41
|
+
export declare class EvalMatrix {
|
|
42
|
+
private config;
|
|
43
|
+
constructor(config: EvalMatrixConfig);
|
|
44
|
+
/**
|
|
45
|
+
* Run the evaluation matrix, getting all results back
|
|
46
|
+
*/
|
|
47
|
+
run(): Promise<EvaluatorResult[]>;
|
|
48
|
+
}
|
|
49
|
+
/**
|
|
50
|
+
* General format for histories when prompting
|
|
51
|
+
*/
|
|
52
|
+
export interface Message {
|
|
53
|
+
role: 'user' | 'system' | 'assistant';
|
|
54
|
+
content: string;
|
|
55
|
+
}
|
|
56
|
+
/**
|
|
57
|
+
* Runner interface for running a prompt against a model, a service, or something else that provides a response
|
|
58
|
+
*/
|
|
59
|
+
export interface Runner {
|
|
60
|
+
name: string;
|
|
61
|
+
runner: (prompt: string, messages: Message[]) => Promise<string>;
|
|
62
|
+
}
|
|
63
|
+
/**
|
|
64
|
+
* Generic evaluator interface w/ a name to identify it
|
|
65
|
+
*/
|
|
66
|
+
export interface NamedEvaluator {
|
|
67
|
+
name: string;
|
|
68
|
+
eval: Evaluator;
|
|
69
|
+
}
|
|
70
|
+
/**
|
|
71
|
+
* Case interface for defining an evaluation case
|
|
72
|
+
*/
|
|
73
|
+
export interface Case {
|
|
74
|
+
/**
|
|
75
|
+
* Name of the case
|
|
76
|
+
*/
|
|
77
|
+
name: string;
|
|
78
|
+
/**
|
|
79
|
+
* Optional message history, used for system, user & assistant messages
|
|
80
|
+
*/
|
|
81
|
+
history?: Message[];
|
|
82
|
+
/**
|
|
83
|
+
* Core prompt to run with
|
|
84
|
+
*/
|
|
85
|
+
prompt: string;
|
|
86
|
+
/**
|
|
87
|
+
* Context for the prompt, used for RAG applications
|
|
88
|
+
*/
|
|
89
|
+
context: string[];
|
|
90
|
+
/**
|
|
91
|
+
* Expected response
|
|
92
|
+
*/
|
|
93
|
+
expected_response: string;
|
|
94
|
+
}
|
|
95
|
+
//# sourceMappingURL=eval-matrix.d.ts.map
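
The sketch below shows how a `Runner` and a `Case` shaped like the declarations above might fit together. The interfaces are re-declared locally so the example stands alone; `echoRunner`, `sampleCase`, and `runCase` are hypothetical names, and the echo runner is a stand-in for a real model call.

```ts
// Local copies of the shapes declared above, so this sketch needs no imports.
interface Message { role: 'user' | 'system' | 'assistant'; content: string; }
interface Runner { name: string; runner: (prompt: string, messages: Message[]) => Promise<string>; }
interface Case { name: string; history?: Message[]; prompt: string; context: string[]; expected_response: string; }

// Stand-in runner: reports how many history messages it received, then echoes the prompt.
const echoRunner: Runner = {
    name: 'echo',
    runner: async (prompt, messages) => `${messages.length}:${prompt}`
};

const sampleCase: Case = {
    name: 'hello-world',
    history: [{ role: 'system', content: 'You write MyDSL programs.' }],
    prompt: 'Say hello in MyDSL',
    context: [],
    expected_response: 'hello world'
};

// Run one case several times against one runner, collecting the raw responses.
async function runCase(runner: Runner, testCase: Case, numRuns: number): Promise<string[]> {
    const responses: string[] = [];
    for (let i = 0; i < numRuns; i++) {
        // a missing history is passed as an empty message list
        responses.push(await runner.runner(testCase.prompt, testCase.history ?? []));
    }
    return responses;
}

runCase(echoRunner, sampleCase, 2).then(responses => console.log(responses));
```

In a real setup the runner body would call your model or service, and each response would then be passed to every registered evaluator together with the case's `expected_response`.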
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"eval-matrix.d.ts","sourceRoot":"","sources":["../../src/evaluator/eval-matrix.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,SAAS,EAAE,eAAe,EAAE,MAAM,gBAAgB,CAAC;AAI5D;;GAEG;AACH,MAAM,WAAW,gBAAgB;IAC7B,MAAM,EAAE;QACJ;;WAEG;QACH,IAAI,EAAE,MAAM,CAAC;QAEb;;WAEG;QACH,WAAW,EAAE,MAAM,CAAC;QAEpB;;WAEG;QACH,cAAc,EAAE,MAAM,CAAC;QAEvB;;;WAGG;QACH,QAAQ,EAAE,MAAM,CAAC;KACpB,CAAC;IAEF;;OAEG;IACH,OAAO,EAAE,MAAM,EAAE,CAAC;IAElB;;OAEG;IACH,UAAU,EAAE,cAAc,EAAE,CAAC;IAE7B;;OAEG;IACH,KAAK,EAAE,IAAI,EAAE,CAAC;CACjB;AAED;;GAEG;AACH,qBAAa,UAAU;IACnB,OAAO,CAAC,MAAM,CAAmB;gBAErB,MAAM,EAAE,gBAAgB;IAIpC;;OAEG;IACG,GAAG,IAAI,OAAO,CAAC,eAAe,EAAE,CAAC;CA2F1C;AAED;;GAEG;AACH,MAAM,WAAW,OAAO;IACpB,IAAI,EAAE,MAAM,GAAG,QAAQ,GAAG,WAAW,CAAC;IACtC,OAAO,EAAE,MAAM,CAAC;CACnB;AAED;;GAEG;AACH,MAAM,WAAW,MAAM;IACnB,IAAI,EAAE,MAAM,CAAC;IACb,MAAM,EAAE,CAAC,MAAM,EAAE,MAAM,EAAE,QAAQ,EAAE,OAAO,EAAE,KAAK,OAAO,CAAC,MAAM,CAAC,CAAC;CACpE;AAED;;GAEG;AACH,MAAM,WAAW,cAAc;IAC3B,IAAI,EAAE,MAAM,CAAC;IACb,IAAI,EAAE,SAAS,CAAC;CACnB;AAED;;GAEG;AACH,MAAM,WAAW,IAAI;IACjB;;OAEG;IACH,IAAI,EAAE,MAAM,CAAC;IAEb;;OAEG;IACH,OAAO,CAAC,EAAE,OAAO,EAAE,CAAC;IAEpB;;OAEG;IACH,MAAM,EAAE,MAAM,CAAC;IAEf;;OAEG;IACH,OAAO,EAAE,MAAM,EAAE,CAAC;IAElB;;OAEG;IACH,iBAAiB,EAAE,MAAM,CAAC;CAC7B"}
@@ -0,0 +1,87 @@
+import fs from 'fs';
+import * as path from 'path';
+/**
+ * Evaluation matrix for running multiple runners on multiple cases with multiple evaluators
+ */
+export class EvalMatrix {
+    constructor(config) {
+        this.config = config;
+    }
+    /**
+     * Run the evaluation matrix, getting all results back
+     */
+    async run() {
+        // get the current timestamp
+        const start = new Date();
+        const results = [];
+        // verify that all runners have unique names first
+        const runnerNames = this.config.runners.map(r => r.name);
+        const uniqueRunnerNames = new Set();
+        for (const name of runnerNames) {
+            if (uniqueRunnerNames.has(name)) {
+                throw new Error(`Runner names must be unique, found duplicate: ${name}`);
+            }
+            uniqueRunnerNames.add(name);
+        }
+        console.log(`Running evaluation matrix: ${this.config.config.name}`);
+        console.log(`Found ${this.config.runners.length * this.config.cases.length * this.config.evaluators.length} runner-evaluator-case combinations to handle`);
+        // run all runners
+        for (const runner of this.config.runners) {
+            console.log(`* Runner: ${runner.name}`);
+            // run all cases for this runner
+            for (const testCase of this.config.cases) {
+                console.log(` * Case: ${testCase.name}`);
+                const runCount = this.config.config.num_runs ?? 1;
+                for (let iteration = 0; iteration < runCount; iteration++) {
+                    const runnerStartTime = new Date();
+                    const response = await runner.runner(testCase.prompt, testCase.history ?? []);
+                    const runnerEndTime = new Date();
+                    // run all evaluators on this response
+                    for (const evaluator of this.config.evaluators) {
+                        console.log(` * Evaluator: ${evaluator.name} (run ${iteration + 1})`);
+                        const result = await evaluator.eval.evaluate(response, testCase.expected_response);
+                        if (!result.name) {
+                            result.name = `${runner.name} - ${testCase.name} - ${evaluator.name}`;
+                        }
+                        // add runtime there too, so we have access to it
+                        result.data._runtime = (runnerEndTime.getTime() - runnerStartTime.getTime()) / 1000.0; // in seconds
+                        result.metadata = {
+                            runner: runner.name,
+                            evaluator: evaluator.name,
+                            testCase: { ...testCase },
+                            actual_response: response,
+                            duration: (runnerEndTime.getTime() - runnerStartTime.getTime()) / 1000.0, // in seconds
+                            run_count: iteration + 1
+                        };
+                        results.push(result);
+                    }
+                }
+            }
+        }
+        // check if the folder exists first
+        if (!fs.existsSync(this.config.config.history_folder)) {
+            fs.mkdirSync(this.config.config.history_folder);
+        }
+        const dateStr = new Date().toISOString();
+        const sanitizedDateStr = dateStr.replace(/:/g, '-').replace(/\./g, '-');
+        let fileName = `${sanitizedDateStr}-${this.config.config.name.toLowerCase().replace(/\s+/g, '-')}.json`;
+        // escape any slashes too
+        fileName = fileName.replace(/\//g, '-');
+        console.log(`Writing results to file: ${path.join(this.config.config.history_folder, fileName)}`);
+        // run time in seconds
+        const runTime = (new Date().getTime() - start.getTime()) / 1000;
+        console.log(`Evaluation matrix completed in ${runTime} seconds (${runTime / 60} minutes)`);
+        // prepare & write results to file
+        const report = {
+            config: this.config.config,
+            date: dateStr,
+            runTime: `${runTime}s`,
+            results
+        };
+        fs.writeFileSync(path.join(this.config.config.history_folder, fileName), JSON.stringify(report, null, 2));
+        // write the name of this last report into last.txt
+        fs.writeFileSync(path.join(this.config.config.history_folder, 'last.txt'), fileName);
+        return results;
+    }
+}
+//# sourceMappingURL=eval-matrix.js.map
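The report file name built near the end of `run()` can be isolated as a small pure function. The sketch below follows the same replacement steps as the listing above; the function names `reportFileName` and `totalEvaluations` and the sample inputs are hypothetical and not part of the package.

```typescript
// Builds a report file name from a timestamp and a matrix name, following the
// same steps as the listing: ':' and '.' in the ISO date become '-', whitespace
// runs in the lower-cased name become '-', and any '/' is replaced last.
function reportFileName(date: Date, matrixName: string): string {
    const sanitizedDate = date.toISOString().replace(/:/g, '-').replace(/\./g, '-');
    const slug = matrixName.toLowerCase().replace(/\s+/g, '-');
    return `${sanitizedDate}-${slug}.json`.replace(/\//g, '-');
}

// Number of evaluator calls a run performs: every runner-case pair is executed
// numRuns times, and each response is passed to every evaluator.
function totalEvaluations(runners: number, cases: number, evaluators: number, numRuns: number): number {
    return runners * cases * evaluators * numRuns;
}

const exampleName = reportFileName(new Date(Date.UTC(2024, 0, 2, 3, 4, 5, 678)), 'My Eval/Run One');
// exampleName is '2024-01-02T03-04-05-678Z-my-eval-run-one.json'
```

Because the name starts with an ISO timestamp, sorting the file names as plain strings also orders them chronologically, which is what `loadLastResults` appears to rely on when `take` is given.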
@@ -0,0 +1 @@
+
{"version":3,"file":"eval-matrix.js","sourceRoot":"","sources":["../../src/evaluator/eval-matrix.ts"],"names":[],"mappings":"AACA,OAAO,EAAE,MAAM,IAAI,CAAC;AACpB,OAAO,KAAK,IAAI,MAAM,MAAM,CAAC;AA6C7B;;GAEG;AACH,MAAM,OAAO,UAAU;IAGnB,YAAY,MAAwB;QAChC,IAAI,CAAC,MAAM,GAAG,MAAM,CAAC;IACzB,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,GAAG;QAEL,4BAA4B;QAC5B,MAAM,KAAK,GAAG,IAAI,IAAI,EAAE,CAAC;QAEzB,MAAM,OAAO,GAAsB,EAAE,CAAC;QAEtC,kDAAkD;QAClD,MAAM,WAAW,GAAG,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC;QACzD,MAAM,iBAAiB,GAAG,IAAI,GAAG,EAAE,CAAC;QACpC,KAAK,MAAM,IAAI,IAAI,WAAW,EAAE,CAAC;YAC7B,IAAI,iBAAiB,CAAC,GAAG,CAAC,IAAI,CAAC,EAAE,CAAC;gBAC9B,MAAM,IAAI,KAAK,CAAC,iDAAiD,IAAI,EAAE,CAAC,CAAC;YAC7E,CAAC;YACD,iBAAiB,CAAC,GAAG,CAAC,IAAI,CAAC,CAAC;QAChC,CAAC;QAED,OAAO,CAAC,GAAG,CAAC,8BAA8B,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,IAAI,EAAE,CAAC,CAAC;QACrE,OAAO,CAAC,GAAG,CAAC,SAAS,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,MAAM,GAAG,IAAI,CAAC,MAAM,CAAC,KAAK,CAAC,MAAM,GAAG,IAAI,CAAC,MAAM,CAAC,UAAU,CAAC,MAAM,+CAA+C,CAAC,CAAC;QAE3J,kBAAkB;QAClB,KAAK,MAAM,MAAM,IAAI,IAAI,CAAC,MAAM,CAAC,OAAO,EAAE,CAAC;YAEvC,OAAO,CAAC,GAAG,CAAC,aAAa,MAAM,CAAC,IAAI,EAAE,CAAC,CAAC;YAExC,gCAAgC;YAChC,KAAK,MAAM,QAAQ,IAAI,IAAI,CAAC,MAAM,CAAC,KAAK,EAAE,CAAC;gBACvC,OAAO,CAAC,GAAG,CAAC,aAAa,QAAQ,CAAC,IAAI,EAAE,CAAC,CAAC;gBAE1C,MAAM,QAAQ,GAAG,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,QAAQ,IAAI,CAAC,CAAC;gBAClD,KAAK,IAAI,SAAS,GAAG,CAAC,EAAE,SAAS,GAAG,QAAQ,EAAE,SAAS,EAAE,EAAE,CAAC;oBACxD,MAAM,eAAe,GAAG,IAAI,IAAI,EAAE,CAAC;oBACnC,MAAM,QAAQ,GAAG,MAAM,MAAM,CAAC,MAAM,CAAC,QAAQ,CAAC,MAAM,EAAE,QAAQ,CAAC,OAAO,IAAI,EAAE,CAAC,CAAC;oBAC9E,MAAM,aAAa,GAAG,IAAI,IAAI,EAAE,CAAC;oBAEjC,sCAAsC;oBACtC,KAAK,MAAM,SAAS,IAAI,IAAI,CAAC,MAAM,CAAC,UAAU,EAAE,CAAC;wBAC7C,OAAO,CAAC,GAAG,CAAC,oBAAoB,SAAS,CAAC,IAAI,SAAS,SAAS,GAAG,CAAC,GAAG,CAAC,CAAC;wBACzE,MAAM,MAAM,GAAG,MAAM,SAAS,CAAC,IAAI,CAAC,QAAQ,CAAC,QAAQ,EAAE,QAAQ,CAAC,iBAAiB,CAAC,CAAC;wBACnF,IAAI,CAAC,MAAM,CAAC,IAAI,EAAE,CAAC;4BACf,MAAM,CAAC,IAAI,GAAG,GAAG,MAAM,CAAC,IAAI,MAAM,QAAQ,CAAC,IAAI,MAAM,SAAS
,CAAC,IAAI,EAAE,CAAC;wBAC1E,CAAC;wBACD,iDAAiD;wBACjD,MAAM,CAAC,IAAK,CAAC,QAAQ,GAAG,CAAC,aAAa,CAAC,OAAO,EAAE,GAAG,eAAe,CAAC,OAAO,EAAE,CAAC,GAAG,MAAM,CAAC,CAAC,aAAa;wBAErG,MAAM,CAAC,QAAQ,GAAG;4BACd,MAAM,EAAE,MAAM,CAAC,IAAI;4BACnB,SAAS,EAAE,SAAS,CAAC,IAAI;4BACzB,QAAQ,EAAE,EAAE,GAAG,QAAQ,EAAE;4BACzB,eAAe,EAAE,QAAQ;4BACzB,QAAQ,EAAE,CAAC,aAAa,CAAC,OAAO,EAAE,GAAG,eAAe,CAAC,OAAO,EAAE,CAAC,GAAG,MAAM,EAAE,aAAa;4BACvF,SAAS,EAAE,SAAS,GAAG,CAAC;yBAC3B,CAAC;wBAEF,OAAO,CAAC,IAAI,CAAC,MAAyB,CAAC,CAAC;oBAC5C,CAAC;gBACL,CAAC;YACL,CAAC;QACL,CAAC;QAED,mCAAmC;QACnC,IAAI,CAAC,EAAE,CAAC,UAAU,CAAC,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,cAAc,CAAC,EAAE,CAAC;YACpD,EAAE,CAAC,SAAS,CAAC,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,cAAc,CAAC,CAAC;QACpD,CAAC;QAED,MAAM,OAAO,GAAG,IAAI,IAAI,EAAE,CAAC,WAAW,EAAE,CAAC;QACzC,MAAM,gBAAgB,GAAG,OAAO,CAAC,OAAO,CAAC,IAAI,EAAE,GAAG,CAAC,CAAC,OAAO,CAAC,KAAK,EAAE,GAAG,CAAC,CAAC;QACxE,IAAI,QAAQ,GAAG,GAAG,gBAAgB,IAAI,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,IAAI,CAAC,WAAW,EAAE,CAAC,OAAO,CAAC,MAAM,EAAE,GAAG,CAAC,OAAO,CAAC;QACxG,yBAAyB;QACzB,QAAQ,GAAG,QAAQ,CAAC,OAAO,CAAC,KAAK,EAAE,GAAG,CAAC,CAAC;QAExC,OAAO,CAAC,GAAG,CAAC,4BAA4B,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,cAAc,EAAE,QAAQ,CAAC,EAAE,CAAC,CAAC;QAElG,sBAAsB;QACtB,MAAM,OAAO,GAAG,CAAC,IAAI,IAAI,EAAE,CAAC,OAAO,EAAE,GAAG,KAAK,CAAC,OAAO,EAAE,CAAC,GAAG,IAAI,CAAC;QAChE,OAAO,CAAC,GAAG,CAAC,kCAAkC,OAAO,aAAa,OAAO,GAAG,EAAE,WAAW,CAAC,CAAC;QAE3F,kCAAkC;QAClC,MAAM,MAAM,GAAG;YACX,MAAM,EAAE,IAAI,CAAC,MAAM,CAAC,MAAM;YAC1B,IAAI,EAAE,OAAO;YACb,OAAO,EAAE,GAAG,OAAO,GAAG;YACtB,OAAO;SACV,CAAC;QACF,EAAE,CAAC,aAAa,CAAC,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,cAAc,EAAE,QAAQ,CAAC,EAAE,IAAI,CAAC,SAAS,CAAC,MAAM,EAAE,IAAI,EAAE,CAAC,CAAC,CAAC,CAAC;QAE1G,mDAAmD;QACnD,EAAE,CAAC,aAAa,CAAC,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,MAAM,CAAC,MAAM,CAAC,cAAc,EAAE,UAAU,CAAC,EAAE,QAAQ,CAAC,CAAC;QAErF,OAAO,OAAO,CAAC;IACnB,CAAC;CACJ"}
@@ -0,0 +1,64 @@
+/**
+ * Baseline Validator Class
+ */
+export type EvaluatorResultData = Record<string, unknown> & {
+    _runtime?: number;
+};
+/**
+ * Evaluator result type
+ */
+export type EvaluatorResult = {
+    /**
+     * Name of this evaluation
+     */
+    name: string;
+    /**
+     * Optional metadata, can be used to store additional information
+     */
+    metadata: Record<string, any>;
+    /**
+     * Data for this evaluation
+     */
+    data: EvaluatorResultData;
+};
+/**
+ * Helper to process a set of results, averaging all runs of each runner-evaluator-case combination
+ */
+export declare function averageAcrossCases(results: EvaluatorResult[]): EvaluatorResult[];
+/**
+ * Averages all results across runners at the highest level, to get a single result for each runner
+ */
+export declare function averageAcrossRunners(results: EvaluatorResult[]): EvaluatorResult[];
+/**
+ * Report
+ */
+export interface Report {
+    config: {
+        name: string;
+        description: string;
+        history_folder: string;
+        num_runs: number;
+    };
+    date: string;
+    runTime: string;
+    results: EvaluatorResult[];
+}
+/**
+ * Loads a specific report, containing evaluator results from a file & returns it
+ */
+export declare function loadReport(file: string): Report;
+/**
+ * Attempts to load the most recent evaluator results from the given file
+ */
+export declare function loadLastResults(dir: string, take?: number): EvaluatorResult[];
+/**
+ * Evaluator class for evaluating agent responses
+ */
+export declare abstract class Evaluator {
+    /**
+     * Validate some agent response
+     */
+    abstract evaluate(response: string, expected_response: string): Promise<Partial<EvaluatorResult>>;
+}
+export declare function mergeEvaluators(...evaluators: Evaluator[]): Evaluator;
+//# sourceMappingURL=evaluator.d.ts.map
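A custom evaluator only has to implement the abstract `evaluate` method declared above. The following is a rough sketch under stated assumptions: the types are re-declared locally so the snippet stands alone, and `exactMatchScore` and `ExactMatchEvaluator` are hypothetical names that do not exist in the package.

```typescript
// Local stand-ins mirroring the declarations above, so the sketch compiles on its own.
type EvaluatorResultData = Record<string, unknown> & { _runtime?: number };
type EvaluatorResult = {
    name: string;
    metadata: Record<string, any>;
    data: EvaluatorResultData;
};

abstract class Evaluator {
    abstract evaluate(response: string, expected_response: string): Promise<Partial<EvaluatorResult>>;
}

// Hypothetical scoring helper: 1 when the trimmed strings are identical, else 0.
function exactMatchScore(response: string, expected: string): number {
    return response.trim() === expected.trim() ? 1 : 0;
}

// Hypothetical evaluator (not shipped with the package) built on the helper above.
// It returns only `data`; name and metadata are left for the caller to fill in.
class ExactMatchEvaluator extends Evaluator {
    async evaluate(response: string, expected_response: string): Promise<Partial<EvaluatorResult>> {
        return {
            data: {
                exact_match: exactMatchScore(response, expected_response),
                response_length: response.length
            }
        };
    }
}
```

Judging by the `evaluator.eval.evaluate(...)` call in `eval-matrix.js`, an instance would be registered as something like `{ name: 'exact-match', eval: new ExactMatchEvaluator() }`. Numeric fields such as `exact_match` are the ones the averaging helpers pick up.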
@@ -0,0 +1 @@
+
{"version":3,"file":"evaluator.d.ts","sourceRoot":"","sources":["../../src/evaluator/evaluator.ts"],"names":[],"mappings":"AAAA;;GAEG;AAKH,MAAM,MAAM,mBAAmB,GAAG,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,GAAG;IACxD,QAAQ,CAAC,EAAE,MAAM,CAAC;CACrB,CAAC;AAEF;;GAEG;AACH,MAAM,MAAM,eAAe,GAAG;IAC1B;;OAEG;IACH,IAAI,EAAE,MAAM,CAAC;IAEb;;OAEG;IACH,QAAQ,EAAE,MAAM,CAAC,MAAM,EAAE,GAAG,CAAC,CAAC;IAE9B;;OAEG;IACH,IAAI,EAAE,mBAAmB,CAAC;CAE7B,CAAC;AAEF;;GAEG;AACH,wBAAgB,kBAAkB,CAAC,OAAO,EAAE,eAAe,EAAE,GAAG,eAAe,EAAE,CA4ChF;AAED;;GAEG;AACH,wBAAgB,oBAAoB,CAAC,OAAO,EAAE,eAAe,EAAE,GAAG,eAAe,EAAE,CAiDlF;AAED;;GAEG;AACH,MAAM,WAAW,MAAM;IACnB,MAAM,EAAE;QACJ,IAAI,EAAE,MAAM,CAAC;QACb,WAAW,EAAE,MAAM,CAAC;QACpB,cAAc,EAAE,MAAM,CAAC;QACvB,QAAQ,EAAE,MAAM,CAAC;KACpB,CAAC;IACF,IAAI,EAAE,MAAM,CAAC;IACb,OAAO,EAAE,MAAM,CAAC;IAChB,OAAO,EAAE,eAAe,EAAE,CAAC;CAC9B;AAED;;GAEG;AACH,wBAAgB,UAAU,CAAC,IAAI,EAAE,MAAM,GAAG,MAAM,CAE/C;AAED;;GAEG;AACH,wBAAgB,eAAe,CAAC,GAAG,EAAE,MAAM,EAAE,IAAI,CAAC,EAAE,MAAM,GAAG,eAAe,EAAE,CAmC7E;AAED;;GAEG;AACH,8BAAsB,SAAS;IAC3B;;OAEG;IACH,QAAQ,CAAC,QAAQ,CAAC,QAAQ,EAAE,MAAM,EAAE,iBAAiB,EAAE,MAAM,GAAG,OAAO,CAAC,OAAO,CAAC,eAAe,CAAC,CAAC;CAEpG;AAED,wBAAgB,eAAe,CAAC,GAAG,UAAU,EAAE,SAAS,EAAE,GAAG,SAAS,CAGrE"}
@@ -0,0 +1,162 @@
+/**
+ * Baseline Validator Class
+ */
+import { readFileSync, existsSync, readdirSync } from 'fs';
+import * as path from 'path';
+/**
+ * Helper to process a set of results, averaging all runs of each runner-evaluator-case combination
+ */
+export function averageAcrossCases(results) {
+    const mappedResults = new Map();
+    const averagedResults = [];
+    // collect like-results
+    for (const result of results) {
+        // add this result to the map (grouping by runner & case)
+        const name = result.name;
+        const existingResult = mappedResults.get(name) ?? [];
+        existingResult.push(result);
+        mappedResults.set(name, existingResult);
+    }
+    // average the results
+    for (const [_key, groupedResults] of mappedResults) {
+        const avgData = groupedResults[0].data;
+        // sum all results except the first
+        for (const result of groupedResults.slice(1)) {
+            const resultData = result.data;
+            for (const [key, value] of Object.entries(resultData)) {
+                if (typeof value === 'number') {
+                    avgData[key] = (avgData[key] ?? 0) + value;
+                }
+            }
+        }
+        // lastly, divide each entry by the number of 'groupedResults'
+        for (const [key, value] of Object.entries(avgData)) {
+            if (typeof value === 'number') {
+                avgData[key] = value / groupedResults.length;
+                // round to 2 decimal places
+                avgData[key] = Math.round(avgData[key] * 100) / 100;
+            }
+        }
+        averagedResults.push({
+            name: groupedResults[0].name,
+            metadata: groupedResults[0].metadata,
+            data: avgData
+        });
+    }
+    return averagedResults;
+}
+/**
+ * Averages all results across runners at the highest level, to get a single result for each runner
+ */
+export function averageAcrossRunners(results) {
+    // first average across runs
+    const processedResults = averageAcrossCases(results);
+    // now average across runners
+    const mappedResults = new Map();
+    const averagedResults = [];
+    // collect like-results
+    for (const result of processedResults) {
+        // add this result to the map (grouping by runner)
+        const name = result.metadata.runner;
+        const existingResult = mappedResults.get(name) ?? [];
+        existingResult.push(result);
+        mappedResults.set(name, existingResult);
+    }
+    // average the results
+    for (const [_key, groupedResults] of mappedResults) {
+        const avgData = groupedResults[0].data;
+        // sum all results except the first
+        for (const result of groupedResults.slice(1)) {
+            const resultData = result.data;
+            for (const [key, value] of Object.entries(resultData)) {
+                if (typeof value === 'number') {
+                    avgData[key] = (avgData[key] ?? 0) + value;
+                }
+            }
+        }
+        // lastly, divide each entry by the number of 'groupedResults'
+        for (const [key, value] of Object.entries(avgData)) {
+            if (typeof value === 'number') {
+                avgData[key] = value / groupedResults.length;
+                // round to 2 decimal places
+                avgData[key] = Math.round(avgData[key] * 100) / 100;
+            }
+        }
+        averagedResults.push({
+            name: groupedResults[0].metadata.runner,
+            metadata: groupedResults[0].metadata,
+            data: avgData
+        });
+    }
+    return averagedResults;
+}
+/**
+ * Loads a specific report, containing evaluator results from a file & returns it
+ */
+export function loadReport(file) {
+    return JSON.parse(readFileSync(file, 'utf-8'));
+}
+/**
+ * Attempts to load the most recent evaluator results from the given file
+ */
+export function loadLastResults(dir, take) {
+    if (!existsSync(dir)) {
+        throw new Error(`Directory does not exist: ${dir}`);
+    }
+    let files = readdirSync(dir).filter(f => f.endsWith('.json'));
+    if (!take) {
+        const lastFile = path.join(dir, 'last.txt');
+        if (!existsSync(lastFile)) {
+            throw new Error(`Last file does not exist in directory: ${dir}. Try running an evaluation matrix first.`);
+        }
+        // read name from last file
+        const lastFileName = readFileSync(lastFile).toString();
+        files.push(lastFileName);
+    }
+    else {
+        // read the most recent files
+        files = files.sort().reverse().slice(0, take);
+    }
+    const results = [];
+    for (const file of files) {
+        const report = loadReport(path.join(dir, file));
+        results.push(...report.results);
+    }
+    // find the most recently created file in the path & read it
+    // const lastFileName = readFileSync(lastFile).toString();
+    // return loadReport(path.join(dir, lastFileName)).results;
+    return results;
+}
+/**
+ * Evaluator class for evaluating agent responses
+ */
+export class Evaluator {
+}
+export function mergeEvaluators(...evaluators) {
+    // merge evaluators in sequence
+    return evaluators.reduce((acc, val) => mergeEvaluatorsInternal(acc, val));
+}
+/**
+ * Merges two evaluators together in sequence, such that results of a are combined with b (b takes precedence in key overrides)
+ * @param a First evaluator to merge
+ * @param b Second evaluator to merge
+ */
+function mergeEvaluatorsInternal(a, b) {
+    return {
+        async evaluate(response, expected_response) {
+            const r1 = await a.evaluate(response, expected_response);
+            const r2 = await b.evaluate(response, expected_response);
+            return {
+                metadata: {
+                    ...r1.metadata,
+                    ...r2.metadata
+                },
+                data: {
+                    ...r1.data,
+                    ...r2.data
+                }
+            };
+        }
+    };
+}
+//# sourceMappingURL=evaluator.js.map
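The averaging step shared by `averageAcrossCases` and `averageAcrossRunners` sums the numeric fields of a group, divides by the group size, and rounds to two decimals. The sketch below shows that step in isolation. It is a variant rather than the package's code: the helper name `averageNumericFields` is hypothetical, and unlike the listing, which accumulates directly into `groupedResults[0].data`, it works on a copy so the first result is left unchanged.

```typescript
type ResultData = Record<string, unknown>;

// Averages every numeric field across a group of result data objects.
// Non-numeric fields keep the value from the first entry.
function averageNumericFields(group: ResultData[]): ResultData {
    if (group.length === 0) {
        return {};
    }
    // Start from a shallow copy of the first entry so the input is not mutated.
    const avg: ResultData = { ...group[0] };
    // Sum all entries except the first.
    for (const data of group.slice(1)) {
        for (const key of Object.keys(data)) {
            const value = data[key];
            if (typeof value === 'number') {
                const current = avg[key];
                avg[key] = (typeof current === 'number' ? current : 0) + value;
            }
        }
    }
    // Divide by the group size and round to 2 decimal places.
    for (const key of Object.keys(avg)) {
        const value = avg[key];
        if (typeof value === 'number') {
            avg[key] = Math.round((value / group.length) * 100) / 100;
        }
    }
    return avg;
}

const firstRun: ResultData = { errors: 1, _runtime: 1.234, note: 'a' };
const averaged = averageNumericFields([firstRun, { errors: 2, _runtime: 2.0 }, { errors: 0, _runtime: 3.0 }]);
```

Working on a copy matters mainly when the same result array is averaged more than once, for example first per case and then per runner.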
@@ -0,0 +1 @@
+
{"version":3,"file":"evaluator.js","sourceRoot":"","sources":["../../src/evaluator/evaluator.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,OAAO,EAAE,YAAY,EAAE,UAAU,EAAE,WAAW,EAAE,MAAM,IAAI,CAAC;AAC3D,OAAO,KAAK,IAAI,MAAM,MAAM,CAAC;AA2B7B;;GAEG;AACH,MAAM,UAAU,kBAAkB,CAAC,OAA0B;IACzD,MAAM,aAAa,GAAmC,IAAI,GAAG,EAAE,CAAC;IAEhE,MAAM,eAAe,GAAsB,EAAE,CAAC;IAE9C,uBAAuB;IACvB,KAAK,MAAM,MAAM,IAAI,OAAO,EAAE,CAAC;QAC3B,yDAAyD;QACzD,MAAM,IAAI,GAAG,MAAM,CAAC,IAAI,CAAC;QACzB,MAAM,cAAc,GAAG,aAAa,CAAC,GAAG,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC;QACrD,cAAc,CAAC,IAAI,CAAC,MAAM,CAAC,CAAC;QAC5B,aAAa,CAAC,GAAG,CAAC,IAAI,EAAE,cAAc,CAAC,CAAC;IAC5C,CAAC;IAED,sBAAsB;IACtB,KAAK,MAAM,CAAC,IAAI,EAAE,cAAc,CAAC,IAAI,aAAa,EAAE,CAAC;QACjD,MAAM,OAAO,GAAG,cAAc,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;QAEvC,mCAAmC;QACnC,KAAK,MAAM,MAAM,IAAI,cAAc,CAAC,KAAK,CAAC,CAAC,CAAC,EAAE,CAAC;YAC3C,MAAM,UAAU,GAAG,MAAM,CAAC,IAAI,CAAC;YAC/B,KAAK,MAAM,CAAC,GAAG,EAAE,KAAK,CAAC,IAAI,MAAM,CAAC,OAAO,CAAC,UAAU,CAAC,EAAE,CAAC;gBACpD,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;oBAC5B,OAAO,CAAC,GAAG,CAAC,GAAG,CAAC,OAAO,CAAC,GAAG,CAAW,IAAI,CAAC,CAAC,GAAG,KAAK,CAAC;gBACzD,CAAC;YACL,CAAC;QACL,CAAC;QAED,8DAA8D;QAC9D,KAAK,MAAM,CAAC,GAAG,EAAE,KAAK,CAAC,IAAI,MAAM,CAAC,OAAO,CAAC,OAAO,CAAC,EAAE,CAAC;YACjD,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;gBAC5B,OAAO,CAAC,GAAG,CAAC,GAAG,KAAK,GAAG,cAAc,CAAC,MAAM,CAAC;gBAC7C,4BAA4B;gBAC5B,OAAO,CAAC,GAAG,CAAC,GAAG,IAAI,CAAC,KAAK,CAAE,OAAO,CAAC,GAAG,CAAY,GAAG,GAAG,CAAC,GAAG,GAAG,CAAC;YACpE,CAAC;QACL,CAAC;QAED,eAAe,CAAC,IAAI,CAAC;YACjB,IAAI,EAAE,cAAc,CAAC,CAAC,CAAC,CAAC,IAAI;YAC5B,QAAQ,EAAE,cAAc,CAAC,CAAC,CAAC,CAAC,QAAQ;YACpC,IAAI,EAAE,OAAO;SAChB,CAAC,CAAC;IACP,CAAC;IACD,OAAO,eAAe,CAAC;AAC3B,CAAC;AAED;;GAEG;AACH,MAAM,UAAU,oBAAoB,CAAC,OAA0B;IAC3D,4BAA4B;IAC5B,MAAM,gBAAgB,GAAG,kBAAkB,CAAC,OAAO,CAAC,CAAC;IAErD,6BAA6B;IAC7B,MAAM,aAAa,GAAmC,IAAI,GAAG,EAAE,CAAC;IAEhE,MAAM,eAAe,GAAsB,EAAE,CAAC;IAE9C,uBAAuB;IACvB,KAAK,MAAM,MAAM,IAAI,gBAAgB,EAAE,CAAC;QACpC,kDAAkD;QAClD,MAAM,IAAI,GAAG,MAAM,CAAC,QAAQ,CAAC,MAAM,CAAC;QACpC,MAAM,cAAc,GAAG,aAAa,CAAC,G
AAG,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC;QACrD,cAAc,CAAC,IAAI,CAAC,MAAM,CAAC,CAAC;QAC5B,aAAa,CAAC,GAAG,CAAC,IAAI,EAAE,cAAc,CAAC,CAAC;IAC5C,CAAC;IAED,sBAAsB;IACtB,KAAK,MAAM,CAAC,IAAI,EAAE,cAAc,CAAC,IAAI,aAAa,EAAE,CAAC;QACjD,MAAM,OAAO,GAAG,cAAc,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;QAEvC,mCAAmC;QACnC,KAAK,MAAM,MAAM,IAAI,cAAc,CAAC,KAAK,CAAC,CAAC,CAAC,EAAE,CAAC;YAC3C,MAAM,UAAU,GAAG,MAAM,CAAC,IAAI,CAAC;YAC/B,KAAK,MAAM,CAAC,GAAG,EAAE,KAAK,CAAC,IAAI,MAAM,CAAC,OAAO,CAAC,UAAU,CAAC,EAAE,CAAC;gBACpD,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;oBAC5B,OAAO,CAAC,GAAG,CAAC,GAAG,CAAC,OAAO,CAAC,GAAG,CAAW,IAAI,CAAC,CAAC,GAAG,KAAK,CAAC;gBACzD,CAAC;YACL,CAAC;QACL,CAAC;QAED,8DAA8D;QAC9D,KAAK,MAAM,CAAC,GAAG,EAAE,KAAK,CAAC,IAAI,MAAM,CAAC,OAAO,CAAC,OAAO,CAAC,EAAE,CAAC;YACjD,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;gBAC5B,OAAO,CAAC,GAAG,CAAC,GAAG,KAAK,GAAG,cAAc,CAAC,MAAM,CAAC;gBAC7C,4BAA4B;gBAC5B,OAAO,CAAC,GAAG,CAAC,GAAG,IAAI,CAAC,KAAK,CAAE,OAAO,CAAC,GAAG,CAAY,GAAG,GAAG,CAAC,GAAG,GAAG,CAAC;YACpE,CAAC;QACL,CAAC;QAED,eAAe,CAAC,IAAI,CAAC;YACjB,IAAI,EAAE,cAAc,CAAC,CAAC,CAAC,CAAC,QAAQ,CAAC,MAAM;YACvC,QAAQ,EAAE,cAAc,CAAC,CAAC,CAAC,CAAC,QAAQ;YACpC,IAAI,EAAE,OAAO;SAChB,CAAC,CAAC;IACP,CAAC;IAED,OAAO,eAAe,CAAC;AAC3B,CAAC;AAiBD;;GAEG;AACH,MAAM,UAAU,UAAU,CAAC,IAAY;IACnC,OAAO,IAAI,CAAC,KAAK,CAAC,YAAY,CAAC,IAAI,EAAE,OAAO,CAAC,CAAW,CAAC;AAC7D,CAAC;AAED;;GAEG;AACH,MAAM,UAAU,eAAe,CAAC,GAAW,EAAE,IAAa;IACtD,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC,EAAE,CAAC;QACnB,MAAM,IAAI,KAAK,CAAC,6BAA6B,GAAG,EAAE,CAAC,CAAC;IACxD,CAAC;IAED,IAAI,KAAK,GAAG,WAAW,CAAC,GAAG,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,QAAQ,CAAC,OAAO,CAAC,CAAC,CAAC;IAE9D,IAAI,CAAC,IAAI,EAAE,CAAC;QACR,MAAM,QAAQ,GAAG,IAAI,CAAC,IAAI,CAAC,GAAG,EAAE,UAAU,CAAC,CAAC;QAE5C,IAAI,CAAC,UAAU,CAAC,QAAQ,CAAC,EAAE,CAAC;YACxB,MAAM,IAAI,KAAK,CAAC,0CAA0C,GAAG,2CAA2C,CAAC,CAAC;QAC9G,CAAC;QACD,2BAA2B;QAC3B,MAAM,YAAY,GAAG,YAAY,CAAC,QAAQ,CAAC,CAAC,QAAQ,EAAE,CAAC;QAEvD,KAAK,CAAC,IAAI,CAAC,YAAY,CAAC,CAAC;IAE7B,CAAC;SAAM,CAAC;QACJ,6BAA6B;QAC7B,KAAK,GAAG,KAAK,CAAC,IAAI,EAAE,CAAC,OAAO,EAAE,CAAC,KAAK,CAAC,CAA
C,EAAE,IAAI,CAAC,CAAC;IAElD,CAAC;IAED,MAAM,OAAO,GAAsB,EAAE,CAAC;IAEtC,KAAK,MAAM,IAAI,IAAI,KAAK,EAAE,CAAC;QACvB,MAAM,MAAM,GAAG,UAAU,CAAC,IAAI,CAAC,IAAI,CAAC,GAAG,EAAE,IAAI,CAAC,CAAC,CAAC;QAChD,OAAO,CAAC,IAAI,CAAC,GAAG,MAAM,CAAC,OAAO,CAAC,CAAC;IACpC,CAAC;IAED,4DAA4D;IAC5D,0DAA0D;IAC1D,2DAA2D;IAC3D,OAAO,OAAO,CAAC;AACnB,CAAC;AAED;;GAEG;AACH,MAAM,OAAgB,SAAS;CAM9B;AAED,MAAM,UAAU,eAAe,CAAC,GAAG,UAAuB;IACtD,+BAA+B;IAC/B,OAAO,UAAU,CAAC,MAAM,CAAC,CAAC,GAAG,EAAE,GAAG,EAAE,EAAE,CAAC,uBAAuB,CAAC,GAAG,EAAE,GAAG,CAAC,CAAC,CAAC;AAC9E,CAAC;AAED;;;;GAIG;AACH,SAAS,uBAAuB,CAAC,CAAY,EAAE,CAAY;IACvD,OAAO;QACH,KAAK,CAAC,QAAQ,CAAC,QAAgB,EAAE,iBAAyB;YACtD,MAAM,EAAE,GAAG,MAAM,CAAC,CAAC,QAAQ,CAAC,QAAQ,EAAE,iBAAiB,CAAC,CAAC;YACzD,MAAM,EAAE,GAAG,MAAM,CAAC,CAAC,QAAQ,CAAC,QAAQ,EAAE,iBAAiB,CAAC,CAAC;YACzD,OAAO;gBACH,QAAQ,EAAE;oBACN,GAAG,EAAE,CAAC,QAAQ;oBACd,GAAG,EAAE,CAAC,QAAQ;iBACjB;gBACD,IAAI,EAAE;oBACF,GAAG,EAAE,CAAC,IAAI;oBACV,GAAG,EAAE,CAAC,IAAI;iBACb;aACJ,CAAC;QACN,CAAC;KACJ,CAAC;AACN,CAAC"}
@@ -0,0 +1 @@
+
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/evaluator/index.ts"],"names":[],"mappings":"AAAA,cAAc,gBAAgB,CAAC;AAC/B,cAAc,wBAAwB,CAAC;AACvC,cAAc,8BAA8B,CAAC;AAC7C,cAAc,kBAAkB,CAAC;AACjC,cAAc,YAAY,CAAC"}
@@ -0,0 +1 @@
+
{"version":3,"file":"index.js","sourceRoot":"","sources":["../../src/evaluator/index.ts"],"names":[],"mappings":"AAAA,cAAc,gBAAgB,CAAC;AAC/B,cAAc,wBAAwB,CAAC;AACvC,cAAc,8BAA8B,CAAC;AAC7C,cAAc,kBAAkB,CAAC;AACjC,cAAc,YAAY,CAAC"}
@@ -0,0 +1,55 @@
+/**
+ * Base Langium DSL validator (taps into Langium's validator messages to provide better results)
+ */
+import { LangiumServices } from "langium/lsp";
+import { Diagnostic } from "vscode-languageserver-types";
+import { Evaluator, EvaluatorResult, EvaluatorResultData } from "./evaluator.js";
+/**
+ * Langium-specific evaluator result data
+ */
+export interface LangiumEvaluatorResultData extends EvaluatorResultData {
+    /**
+     * Number of validation failures
+     */
+    failures: number;
+    /**
+     * Number of errors
+     */
+    errors: number;
+    /**
+     * Number of warnings
+     */
+    warnings: number;
+    /**
+     * Number of infos
+     */
+    infos: number;
+    /**
+     * Number of hints
+     */
+    hints: number;
+    /**
+     * Number of unassigned diagnostics
+     */
+    unassigned: number;
+    /**
+     * Length of the response in chars
+     */
+    response_length: number;
+    /**
+     * Raw diagnostic data, same which is used to compute the other values above
+     */
+    diagnostics: Diagnostic[];
+}
+export declare class LangiumEvaluator<T extends LangiumServices> extends Evaluator {
+    /**
+     * Services to use for evaluation
+     */
+    protected services: T;
+    constructor(services: T);
+    /**
+     * Validate an agent response as if it's a langium program. If we can parse it, we attempt to validate it.
+     */
+    evaluate(response: string): Promise<Partial<EvaluatorResult>>;
+}
+//# sourceMappingURL=langium-evaluator.d.ts.map
@@ -0,0 +1 @@
+
{"version":3,"file":"langium-evaluator.d.ts","sourceRoot":"","sources":["../../src/evaluator/langium-evaluator.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,OAAO,EAAE,eAAe,EAAE,MAAM,aAAa,CAAC;AAC9C,OAAO,EAAE,UAAU,EAAE,MAAM,6BAA6B,CAAC;AACzD,OAAO,EAAE,SAAS,EAAE,eAAe,EAAE,mBAAmB,EAAE,MAAM,gBAAgB,CAAC;AAGjF;;GAEG;AACH,MAAM,WAAW,0BAA2B,SAAQ,mBAAmB;IAEnE;;OAEG;IACH,QAAQ,EAAE,MAAM,CAAC;IAEjB;;OAEG;IACH,MAAM,EAAE,MAAM,CAAC;IAEf;;OAEG;IACH,QAAQ,EAAE,MAAM,CAAC;IAEjB;;OAEG;IACH,KAAK,EAAE,MAAM,CAAC;IAEd;;OAEG;IACH,KAAK,EAAE,MAAM,CAAC;IAEd;;OAEG;IACH,UAAU,EAAE,MAAM,CAAC;IAEnB;;OAEG;IACH,eAAe,EAAE,MAAM,CAAC;IAExB;;OAEG;IACH,WAAW,EAAE,UAAU,EAAE,CAAC;CAC7B;AAED,qBAAa,gBAAgB,CAAC,CAAC,SAAS,eAAe,CAAE,SAAQ,SAAS;IAEtE;;OAEG;IACH,SAAS,CAAC,QAAQ,EAAE,CAAC,CAAC;gBAEV,QAAQ,EAAE,CAAC;IAKvB;;OAEG;IACG,QAAQ,CAAC,QAAQ,EAAE,MAAM,GAAG,OAAO,CAAC,OAAO,CAAC,eAAe,CAAC,CAAC;CAqEtE"}
@@ -0,0 +1,78 @@
+/**
+ * Base Langium DSL validator (taps into Langium's validator messages to provide better results)
+ */
+import { Evaluator } from "./evaluator.js";
+import { URI } from "langium";
+export class LangiumEvaluator extends Evaluator {
+    constructor(services) {
+        super();
+        this.services = services;
+    }
+    /**
+     * Validate an agent response as if it's a langium program. If we can parse it, we attempt to validate it.
+     */
+    async evaluate(response) {
+        if (response.includes('```')) {
+            // take the first code block instead, if present (assuming it's a langium grammar)
+            const codeBlock = response.split(/```[a-z-]*/)[1];
+            response = codeBlock;
+        }
+        const doc = this.services.shared.workspace.LangiumDocumentFactory.fromString(response, URI.parse('memory://test.langium'));
+        try {
+            await this.services.shared.workspace.DocumentBuilder.build([doc], { validation: true });
+            const validationResults = doc.diagnostics ?? [];
+            // count the number of each type of diagnostic
+            let evalData = {
+                failures: 0,
+                errors: 0,
+                warnings: 0,
+                infos: 0,
+                hints: 0,
+                unassigned: 0,
+                // include length of the response for checking
+                response_length: response.length,
+                // include the diagnostics for debugging if desired
+                diagnostics: validationResults
+            };
+            for (const diagnostic of validationResults) {
+                if (diagnostic.severity) {
+                    switch (diagnostic.severity) {
+                        case 1:
+                            evalData.errors++;
+                            break;
+                        case 2:
+                            evalData.warnings++;
+                            break;
+                        case 3:
+                            evalData.infos++;
+                            break;
+                        case 4:
+                            evalData.hints++;
+                            break;
+                        default:
+                            evalData.unassigned++;
+                            break;
+                    }
+                }
+            }
+            return {
+                data: evalData
+            };
+        }
+        catch (e) {
+            console.error('Error during evaluation: ', e);
+            return {
+                data: {
+                    failures: 1,
+                    errors: 0,
+                    warnings: 0,
+                    infos: 0,
+                    hints: 0,
+                    unassigned: 0,
+                    response_length: response.length
+                }
+            };
+        }
+    }
+}
+//# sourceMappingURL=langium-evaluator.js.map
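Two steps of `evaluate` above can be tried without any Langium services: pulling the first fenced block out of a response, and tallying diagnostics by severity. The sketch below is an approximation under stated assumptions. `firstCodeBlock`, `countBySeverity`, and `DiagnosticLike` are hypothetical names, the severity numbers 1 to 4 follow the Language Server Protocol's `DiagnosticSeverity` (Error, Warning, Information, Hint), and, unlike the listing, a diagnostic with no severity is counted as `unassigned` rather than skipped.

```typescript
// Built from single backticks so this snippet never contains a literal fence.
const FENCE = '`' + '`' + '`';

// Returns the text between the first pair of fences (language tag dropped),
// or the whole response when it contains no fence at all.
function firstCodeBlock(response: string): string {
    if (response.indexOf(FENCE) === -1) {
        return response;
    }
    const parts = response.split(new RegExp(FENCE + '[a-z-]*'));
    return parts[1] ?? response;
}

// Minimal stand-in for the Diagnostic type from vscode-languageserver-types.
interface DiagnosticLike {
    severity?: number;
    message: string;
}

interface SeverityCounts {
    errors: number;
    warnings: number;
    infos: number;
    hints: number;
    unassigned: number;
}

// Tallies diagnostics per LSP severity; anything else lands in `unassigned`.
function countBySeverity(diagnostics: DiagnosticLike[]): SeverityCounts {
    const counts: SeverityCounts = { errors: 0, warnings: 0, infos: 0, hints: 0, unassigned: 0 };
    for (const diagnostic of diagnostics) {
        switch (diagnostic.severity) {
            case 1: counts.errors++; break;
            case 2: counts.warnings++; break;
            case 3: counts.infos++; break;
            case 4: counts.hints++; break;
            default: counts.unassigned++; break;
        }
    }
    return counts;
}
```

Note that the extracted block keeps its surrounding newlines, as in the listing, so `response_length` includes them.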
@@ -0,0 +1 @@
+
{"version":3,"file":"langium-evaluator.js","sourceRoot":"","sources":["../../src/evaluator/langium-evaluator.ts"],"names":[],"mappings":"AAAA;;GAEG;AAIH,OAAO,EAAE,SAAS,EAAwC,MAAM,gBAAgB,CAAC;AACjF,OAAO,EAAE,GAAG,EAAE,MAAM,SAAS,CAAC;AAgD9B,MAAM,OAAO,gBAA4C,SAAQ,SAAS;IAOtE,YAAY,QAAW;QACnB,KAAK,EAAE,CAAC;QACR,IAAI,CAAC,QAAQ,GAAG,QAAQ,CAAC;IAC7B,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,QAAQ,CAAC,QAAgB;QAE3B,IAAI,QAAQ,CAAC,QAAQ,CAAC,KAAK,CAAC,EAAE,CAAC;YAC3B,kFAAkF;YAClF,MAAM,SAAS,GAAG,QAAQ,CAAC,KAAK,CAAC,YAAY,CAAC,CAAC,CAAC,CAAC,CAAC;YAClD,QAAQ,GAAG,SAAS,CAAC;QACzB,CAAC;QAED,MAAM,GAAG,GAAG,IAAI,CAAC,QAAQ,CAAC,MAAM,CAAC,SAAS,CAAC,sBAAsB,CAAC,UAAU,CAAC,QAAQ,EAAE,GAAG,CAAC,KAAK,CAAC,uBAAuB,CAAC,CAAC,CAAC;QAE3H,IAAI,CAAC;YACD,MAAM,IAAI,CAAC,QAAQ,CAAC,MAAM,CAAC,SAAS,CAAC,eAAe,CAAC,KAAK,CAAC,CAAC,GAAG,CAAC,EAAE,EAAE,UAAU,EAAE,IAAI,EAAE,CAAC,CAAC;YACxF,MAAM,iBAAiB,GAAG,GAAG,CAAC,WAAW,IAAI,EAAE,CAAC;YAEhD,8CAA8C;YAC9C,IAAI,QAAQ,GAA+B;gBACvC,QAAQ,EAAE,CAAC;gBACX,MAAM,EAAE,CAAC;gBACT,QAAQ,EAAE,CAAC;gBACX,KAAK,EAAE,CAAC;gBACR,KAAK,EAAE,CAAC;gBACR,UAAU,EAAE,CAAC;gBACb,8CAA8C;gBAC9C,eAAe,EAAE,QAAQ,CAAC,MAAM;gBAChC,mDAAmD;gBACnD,WAAW,EAAE,iBAAiB;aACjC,CAAC;YAEF,KAAK,MAAM,UAAU,IAAI,iBAAiB,EAAE,CAAC;gBACzC,IAAI,UAAU,CAAC,QAAQ,EAAE,CAAC;oBACtB,QAAQ,UAAU,CAAC,QAAQ,EAAE,CAAC;wBAC1B,KAAK,CAAC;4BACF,QAAQ,CAAC,MAAM,EAAE,CAAC;4BAClB,MAAM;wBACV,KAAK,CAAC;4BACF,QAAQ,CAAC,QAAQ,EAAE,CAAC;4BACpB,MAAM;wBACV,KAAK,CAAC;4BACF,QAAQ,CAAC,KAAK,EAAE,CAAC;4BACjB,MAAM;wBACV,KAAK,CAAC;4BACF,QAAQ,CAAC,KAAK,EAAE,CAAC;4BACjB,MAAM;wBACV;4BACI,QAAQ,CAAC,UAAU,EAAE,CAAC;4BACtB,MAAM;oBACd,CAAC;gBACL,CAAC;YACL,CAAC;YAED,OAAO;gBACH,IAAI,EAAE,QAAQ;aACjB,CAAC;QAEN,CAAC;QAAC,OAAO,CAAC,EAAE,CAAC;YACT,OAAO,CAAC,KAAK,CAAC,2BAA2B,EAAE,CAAC,CAAC,CAAC;YAC9C,OAAO;gBACH,IAAI,EAAE;oBACF,QAAQ,EAAE,CAAC;oBACX,MAAM,EAAE,CAAC;oBACT,QAAQ,EAAE,CAAC;oBACX,KAAK,EAAE,CAAC;oBACR,KAAK,EAAE,CAAC;oBACR,UAAU,EAAE,CAAC;oBACb,eAAe,EAAE,QAAQ,CAAC,MAAM;iBACL;aAClC,CAAC;QACN,CAAC;IACL,CAAC;CACJ"}
package/dist/index.d.ts
ADDED
@@ -0,0 +1 @@
+
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,cAAc,sBAAsB,CAAC;AACrC,cAAc,qBAAqB,CAAC"}
package/dist/index.js
ADDED
@@ -0,0 +1 @@
+
{"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,cAAc,sBAAsB,CAAC;AACrC,cAAc,qBAAqB,CAAC"}
@@ -0,0 +1 @@
+
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/splitter/index.ts"],"names":[],"mappings":"AAAA,cAAc,eAAe,CAAC"}
@@ -0,0 +1 @@
+
{"version":3,"file":"index.js","sourceRoot":"","sources":["../../src/splitter/index.ts"],"names":[],"mappings":"AAAA,cAAc,eAAe,CAAC"}
@@ -0,0 +1,21 @@
import { AstNode } from "langium";
import { LangiumServices } from "langium/lsp";
interface SplitterOptions {
    /**
     * List of comment rule names to include in the chunk.
     * If not provided comments are ignored.
     * Default: ['ML_COMMENT', 'SL_COMMENT']
     */
    commentRuleNames?: string[];
}
/**
 * Splitter function that splits a single text document into 1 or more chunks based on a splitting strategy
 * @param document - The text document to be split.
 * @param nodePredicates - The predicates to determine the nodes for splitting.
 * @param services - The Langium services used for parsing the document.
 * @param options - The splitter configuration. See {@link SplitterOptions}.
 * @returns The chunks of the split document.
 */
export declare function splitByNode(document: string, nodePredicates: Array<(node: AstNode) => boolean> | ((node: AstNode) => boolean), services: LangiumServices, options?: SplitterOptions): string[];
export {};
//# sourceMappingURL=splitter.d.ts.map
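
The `nodePredicates` parameter in the declaration above accepts either a single predicate or an array of predicates. The following is a minimal sketch of that calling convention, not code from the package: `NodeLike` is a hypothetical stand-in for Langium's `AstNode` (only `$type` is modeled), and `normalizePredicates`, `isSplitPoint`, `isPerson`, and `isGreeting` are illustrative names for an imaginary grammar.

```typescript
// Hypothetical stand-in for langium's AstNode; only the $type discriminator is modeled.
interface NodeLike { $type: string; }
type NodePredicate = (node: NodeLike) => boolean;

// Accept a single predicate or an array of predicates and normalize to an array.
function normalizePredicates(input: NodePredicate | NodePredicate[]): NodePredicate[] {
    return Array.isArray(input) ? input : [input];
}

// A node is selected for splitting when at least one predicate matches it.
function isSplitPoint(node: NodeLike, input: NodePredicate | NodePredicate[]): boolean {
    return normalizePredicates(input).some(p => p(node));
}

// Hypothetical predicates for an imaginary grammar with Person and Greeting rules.
const isPerson: NodePredicate = node => node.$type === 'Person';
const isGreeting: NodePredicate = node => node.$type === 'Greeting';
```

Under these assumptions, `isSplitPoint(node, isPerson)` and `isSplitPoint(node, [isPerson, isGreeting])` are both valid calls, mirroring the union type in the `splitByNode` signature.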
@@ -0,0 +1 @@
{"version":3,"file":"splitter.d.ts","sourceRoot":"","sources":["../../src/splitter/splitter.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,OAAO,EAAiB,MAAM,SAAS,CAAC;AACjD,OAAO,EAAE,eAAe,EAAE,MAAM,aAAa,CAAC;AAG9C,UAAU,eAAe;IACrB;;;;MAIE;IACF,gBAAgB,CAAC,EAAE,MAAM,EAAE,CAAA;CAC9B;AAED;;;;;;;GAOG;AACH,wBAAgB,WAAW,CACvB,QAAQ,EAAE,MAAM,EAChB,cAAc,EAAE,KAAK,CAAC,CAAC,IAAI,EAAE,OAAO,KAAK,OAAO,CAAC,GAAG,CAAC,CAAC,IAAI,EAAE,OAAO,KAAK,OAAO,CAAC,EAChF,QAAQ,EAAE,eAAe,EACzB,OAAO,GAAE,eAAoE,GAAG,MAAM,EAAE,CA2D3F"}
@@ -0,0 +1,59 @@
import { CstUtils, URI } from "langium";
import { AstUtils } from 'langium';
/**
 * Splitter function that splits a single text document into 1 or more chunks based on a splitting strategy
 * @param document - The text document to be split.
 * @param nodePredicates - The predicates to determine the nodes for splitting.
 * @param services - The Langium services used for parsing the document.
 * @param options - The splitter configuration. See {@link SplitterOptions}.
 * @returns The chunks of the split document.
 */
export function splitByNode(document, nodePredicates, services, options = { commentRuleNames: ['ML_COMMENT', 'SL_COMMENT'] }) {
    // 1. parse the document into an AST
    // 2. verify that we parsed the document correctly
    // 3. split the document into chunks based on the node
    // 4. using the corresponding CST offsets from those nodes, split the document into chunks
    // 5. return the chunks
    if (document.trim() === '') {
        return [];
    }
    const langiumDoc = services.shared.workspace.LangiumDocumentFactory.fromString(document, URI.parse('memory://document.langium'));
    // not checking for lexer or parser errors here...
    const txtDoc = langiumDoc.textDocument;
    const chunks = [];
    const predicates = Array.isArray(nodePredicates) ? nodePredicates : [nodePredicates];
    // selectively stream nodes from the ast in langium
    const stream = AstUtils.streamAst(langiumDoc.parseResult.value);
    for (const node of stream) {
        if (predicates.some(p => p(node))) {
            // get the starting point of this node
            let start = node.$cstNode?.range.start;
            if (options?.commentRuleNames) {
                // include comments in the chunk
                const cstNode = node.$cstNode;
                const commentNode = CstUtils.findCommentNode(cstNode, options.commentRuleNames);
                if (commentNode) {
                    // adjust start to include the comment
                    start = commentNode.range.start;
                }
            }
            const end = node.$cstNode?.range.end;
            // add a chunk from the last offset to the start of this node
            const chunk = txtDoc.getText({
                start: {
                    line: start?.line || 0,
                    character: start?.character || 0
                },
                end: {
                    line: end?.line || 0,
                    character: end?.character || 0
                }
            });
            if (chunk.trim().length > 0) {
                chunks.push(chunk);
            }
        }
    }
    return chunks;
}
//# sourceMappingURL=splitter.js.map
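
Step 4 in the comments above, extracting text by CST range, can be illustrated without Langium. The sketch below is not the package's code: `Position`, `Range`, `offsetAt`, and `chunksFromRanges` are hypothetical stand-ins for the LSP-style positions and the `getText` call that `splitByNode` uses, and the ranges are written by hand rather than read from `$cstNode`. As in the function above, each chunk runs from a node's start (or the start of its preceding comment) to the node's end, and whitespace-only chunks are dropped.

```typescript
// Hypothetical stand-ins for LSP-style zero-based positions; not imported from langium.
interface Position { line: number; character: number; }
interface Range { start: Position; end: Position; }

// Convert a zero-based line/character position into an absolute string offset.
function offsetAt(text: string, pos: Position): number {
    let offset = 0;
    for (let line = 0; line < pos.line; line++) {
        const newline = text.indexOf('\n', offset);
        if (newline === -1) {
            return text.length; // position lies past the last line
        }
        offset = newline + 1;
    }
    return Math.min(offset + pos.character, text.length);
}

// Extract one chunk per range and skip chunks that contain only whitespace.
function chunksFromRanges(text: string, ranges: Range[]): string[] {
    const chunks: string[] = [];
    for (const range of ranges) {
        const chunk = text.slice(offsetAt(text, range.start), offsetAt(text, range.end));
        if (chunk.trim().length > 0) {
            chunks.push(chunk);
        }
    }
    return chunks;
}

// Hand-written ranges for an imaginary DSL document.
const text = '// greet\nperson Alice\n\nperson Bob\n';
const chunks = chunksFromRanges(text, [
    // node on line 1, with start moved up to the comment on line 0
    { start: { line: 0, character: 0 }, end: { line: 1, character: 12 } },
    // empty range, dropped by the whitespace check
    { start: { line: 2, character: 0 }, end: { line: 2, character: 0 } },
    // node on line 3, no preceding comment
    { start: { line: 3, character: 0 }, end: { line: 3, character: 10 } },
]);
// chunks: ['// greet\nperson Alice', 'person Bob']
```

One consequence of this range-based approach, visible in both the sketch and `splitByNode`, is that text lying between matched nodes is not included in any chunk.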
@@ -0,0 +1 @@
{"version":3,"file":"splitter.js","sourceRoot":"","sources":["../../src/splitter/splitter.ts"],"names":[],"mappings":"AAAA,OAAO,EAAW,QAAQ,EAAE,GAAG,EAAE,MAAM,SAAS,CAAC;AAEjD,OAAO,EAAE,QAAQ,EAAE,MAAM,SAAS,CAAC;AAWnC;;;;;;;GAOG;AACH,MAAM,UAAU,WAAW,CACvB,QAAgB,EAChB,cAAgF,EAChF,QAAyB,EACzB,UAA2B,EAAE,gBAAgB,EAAE,CAAC,YAAY,EAAE,YAAY,CAAC,EAAE;IAC7E,oCAAoC;IACpC,kDAAkD;IAClD,sDAAsD;IACtD,0FAA0F;IAC1F,uBAAuB;IAEvB,IAAI,QAAQ,CAAC,IAAI,EAAE,KAAK,EAAE,EAAE,CAAC;QACzB,OAAO,EAAE,CAAC;IACd,CAAC;IAED,MAAM,UAAU,GAAG,QAAQ,CAAC,MAAM,CAAC,SAAS,CAAC,sBAAsB,CAAC,UAAU,CAAC,QAAQ,EAAE,GAAG,CAAC,KAAK,CAAC,2BAA2B,CAAC,CAAC,CAAC;IAEjI,kDAAkD;IAElD,MAAM,MAAM,GAAG,UAAU,CAAC,YAAY,CAAC;IAEvC,MAAM,MAAM,GAAa,EAAE,CAAC;IAE5B,MAAM,UAAU,GAAG,KAAK,CAAC,OAAO,CAAC,cAAc,CAAC,CAAC,CAAC,CAAC,cAAc,CAAC,CAAC,CAAC,CAAC,cAAc,CAAC,CAAC;IAErF,mDAAmD;IACnD,MAAM,MAAM,GAAG,QAAQ,CAAC,SAAS,CAAC,UAAU,CAAC,WAAW,CAAC,KAAK,CAAC,CAAC;IAChE,KAAK,MAAM,IAAI,IAAI,MAAM,EAAE,CAAC;QACxB,IAAI,UAAU,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,EAAE,CAAC;YAChC,sCAAsC;YACtC,IAAI,KAAK,GAAG,IAAI,CAAC,QAAQ,EAAE,KAAK,CAAC,KAAK,CAAC;YAEvC,IAAI,OAAO,EAAE,gBAAgB,EAAE,CAAC;gBAC5B,gCAAgC;gBAChC,MAAM,OAAO,GAAG,IAAI,CAAC,QAAQ,CAAC;gBAC9B,MAAM,WAAW,GAAG,QAAQ,CAAC,eAAe,CAAC,OAAO,EAAE,OAAO,CAAC,gBAAgB,CAAC,CAAC;gBAChF,IAAI,WAAW,EAAE,CAAC;oBACd,sCAAsC;oBACtC,KAAK,GAAG,WAAW,CAAC,KAAK,CAAC,KAAK,CAAC;gBACpC,CAAC;YACL,CAAC;YAED,MAAM,GAAG,GAAG,IAAI,CAAC,QAAQ,EAAE,KAAK,CAAC,GAAG,CAAC;YACrC,6DAA6D;YAC7D,MAAM,KAAK,GAAG,MAAM,CAAC,OAAO,CAAC;gBACzB,KAAK,EAAE;oBACH,IAAI,EAAE,KAAK,EAAE,IAAI,IAAI,CAAC;oBACtB,SAAS,EAAE,KAAK,EAAE,SAAS,IAAI,CAAC;iBACnC;gBACD,GAAG,EAAE;oBACD,IAAI,EAAE,GAAG,EAAE,IAAI,IAAI,CAAC;oBACpB,SAAS,EAAE,GAAG,EAAE,SAAS,IAAI,CAAC;iBACjC;aACJ,CAAC,CAAC;YAEH,IAAI,KAAK,CAAC,IAAI,EAAE,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;gBAC1B,MAAM,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;YACvB,CAAC;QACL,CAAC;IACL,CAAC;IAED,OAAO,MAAM,CAAC;AAElB,CAAC"}
package/package.json
ADDED
@@ -0,0 +1,61 @@
{
  "name": "langium-ai-tools",
  "version": "0.0.1",
  "description": "Tooling for building AI Applications that leverage Langium DSLs",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/eclipse-langium/langium-ai.git",
    "directory": "packages/langium-ai-tools"
  },
  "bugs": "https://github.com/eclipse-langium/langium-ai/issues",
  "type": "module",
  "main": "dist/index.js",
  "private": false,
  "files": [
    "dist"
  ],
  "exports": {
    ".": {
      "import": "./dist/index.js",
      "types": "./dist/index.d.ts"
    },
    "./splitter": {
      "import": "./dist/splitter/index.js",
      "types": "./dist/splitter/index.d.ts"
    },
    "./evaluator": {
      "import": "./dist/evaluator/index.js",
      "types": "./dist/evaluator/index.d.ts"
    }
  },
  "scripts": {
    "build": "tsc",
    "watch": "tsc -w",
    "test": "vitest run",
    "clean": "rimraf ./dist"
  },
  "author": {
    "name": "TypeFox",
    "url": "https://www.typefox.io"
  },
  "keywords": [
    "langium",
    "ai",
    "tools",
    "llm"
  ],
  "license": "MIT",
  "dependencies": {
    "langium": "~3.4.0",
    "levenshtein-edit-distance": "^3.0.1"
  },
  "volta": {
    "node": "20.10.0",
    "npm": "10.2.3"
  },
  "devDependencies": {
    "typescript": "^5.4.5",
    "vitest": "^3.0.9",
    "rimraf": "^6.0.1"
  }
}