@exulu/backend 1.46.0 → 1.47.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (151)
  1. package/.agents/skills/mintlify/SKILL.md +347 -0
  2. package/.editorconfig +15 -0
  3. package/.eslintrc.json +52 -0
  4. package/.jscpd.json +18 -0
  5. package/.prettierignore +5 -0
  6. package/.prettierrc.json +12 -0
  7. package/CHANGELOG.md +11 -4
  8. package/README.md +747 -0
  9. package/SECURITY.md +5 -0
  10. package/dist/index.cjs +12015 -10506
  11. package/dist/index.d.cts +725 -667
  12. package/dist/index.d.ts +725 -667
  13. package/dist/index.js +12034 -10518
  14. package/ee/LICENSE.md +62 -0
  15. package/ee/agentic-retrieval/index.ts +1109 -0
  16. package/ee/documents/THIRD_PARTY_LICENSES/docling.txt +31 -0
  17. package/ee/documents/processing/build_pdf_processor.sh +35 -0
  18. package/ee/documents/processing/chunk_markdown.py +263 -0
  19. package/ee/documents/processing/doc_processor.ts +635 -0
  20. package/ee/documents/processing/pdf_processor.spec +115 -0
  21. package/ee/documents/processing/pdf_to_markdown.py +420 -0
  22. package/ee/documents/processing/requirements.txt +4 -0
  23. package/ee/entitlements.ts +49 -0
  24. package/ee/markdown.ts +686 -0
  25. package/ee/queues/decorator.ts +140 -0
  26. package/ee/queues/queues.ts +156 -0
  27. package/ee/queues/server.ts +6 -0
  28. package/ee/rbac-resolver.ts +51 -0
  29. package/ee/rbac-update.ts +111 -0
  30. package/ee/schemas.ts +347 -0
  31. package/ee/tokenizer.ts +80 -0
  32. package/ee/workers.ts +1423 -0
  33. package/eslint.config.js +88 -0
  34. package/jest.config.ts +25 -0
  35. package/license.md +73 -49
  36. package/mintlify-docs/.mintignore +7 -0
  37. package/mintlify-docs/AGENTS.md +33 -0
  38. package/mintlify-docs/CLAUDE.MD +50 -0
  39. package/mintlify-docs/CONTRIBUTING.md +32 -0
  40. package/mintlify-docs/LICENSE +21 -0
  41. package/mintlify-docs/README.md +55 -0
  42. package/mintlify-docs/ai-tools/claude-code.mdx +43 -0
  43. package/mintlify-docs/ai-tools/cursor.mdx +39 -0
  44. package/mintlify-docs/ai-tools/windsurf.mdx +39 -0
  45. package/mintlify-docs/api-reference/core-types/agent-types.mdx +110 -0
  46. package/mintlify-docs/api-reference/core-types/analytics-types.mdx +95 -0
  47. package/mintlify-docs/api-reference/core-types/configuration-types.mdx +83 -0
  48. package/mintlify-docs/api-reference/core-types/evaluation-types.mdx +106 -0
  49. package/mintlify-docs/api-reference/core-types/job-types.mdx +135 -0
  50. package/mintlify-docs/api-reference/core-types/overview.mdx +73 -0
  51. package/mintlify-docs/api-reference/core-types/prompt-types.mdx +102 -0
  52. package/mintlify-docs/api-reference/core-types/rbac-types.mdx +163 -0
  53. package/mintlify-docs/api-reference/core-types/session-types.mdx +77 -0
  54. package/mintlify-docs/api-reference/core-types/user-management.mdx +112 -0
  55. package/mintlify-docs/api-reference/core-types/workflow-types.mdx +88 -0
  56. package/mintlify-docs/api-reference/core-types.mdx +585 -0
  57. package/mintlify-docs/api-reference/dynamic-types.mdx +851 -0
  58. package/mintlify-docs/api-reference/endpoint/create.mdx +4 -0
  59. package/mintlify-docs/api-reference/endpoint/delete.mdx +4 -0
  60. package/mintlify-docs/api-reference/endpoint/get.mdx +4 -0
  61. package/mintlify-docs/api-reference/endpoint/webhook.mdx +4 -0
  62. package/mintlify-docs/api-reference/introduction.mdx +661 -0
  63. package/mintlify-docs/api-reference/mutations.mdx +1012 -0
  64. package/mintlify-docs/api-reference/openapi.json +217 -0
  65. package/mintlify-docs/api-reference/queries.mdx +1154 -0
  66. package/mintlify-docs/backend/introduction.mdx +218 -0
  67. package/mintlify-docs/changelog.mdx +293 -0
  68. package/mintlify-docs/community-edition.mdx +304 -0
  69. package/mintlify-docs/core/exulu-agent/api-reference.mdx +894 -0
  70. package/mintlify-docs/core/exulu-agent/configuration.mdx +690 -0
  71. package/mintlify-docs/core/exulu-agent/introduction.mdx +552 -0
  72. package/mintlify-docs/core/exulu-app/api-reference.mdx +481 -0
  73. package/mintlify-docs/core/exulu-app/configuration.mdx +319 -0
  74. package/mintlify-docs/core/exulu-app/introduction.mdx +117 -0
  75. package/mintlify-docs/core/exulu-authentication.mdx +810 -0
  76. package/mintlify-docs/core/exulu-chunkers/api-reference.mdx +1011 -0
  77. package/mintlify-docs/core/exulu-chunkers/configuration.mdx +596 -0
  78. package/mintlify-docs/core/exulu-chunkers/introduction.mdx +403 -0
  79. package/mintlify-docs/core/exulu-context/api-reference.mdx +911 -0
  80. package/mintlify-docs/core/exulu-context/configuration.mdx +648 -0
  81. package/mintlify-docs/core/exulu-context/introduction.mdx +394 -0
  82. package/mintlify-docs/core/exulu-database.mdx +811 -0
  83. package/mintlify-docs/core/exulu-default-agents.mdx +545 -0
  84. package/mintlify-docs/core/exulu-eval/api-reference.mdx +772 -0
  85. package/mintlify-docs/core/exulu-eval/configuration.mdx +680 -0
  86. package/mintlify-docs/core/exulu-eval/introduction.mdx +459 -0
  87. package/mintlify-docs/core/exulu-logging.mdx +464 -0
  88. package/mintlify-docs/core/exulu-otel.mdx +670 -0
  89. package/mintlify-docs/core/exulu-queues/api-reference.mdx +648 -0
  90. package/mintlify-docs/core/exulu-queues/configuration.mdx +650 -0
  91. package/mintlify-docs/core/exulu-queues/introduction.mdx +474 -0
  92. package/mintlify-docs/core/exulu-reranker/api-reference.mdx +630 -0
  93. package/mintlify-docs/core/exulu-reranker/configuration.mdx +663 -0
  94. package/mintlify-docs/core/exulu-reranker/introduction.mdx +516 -0
  95. package/mintlify-docs/core/exulu-tool/api-reference.mdx +723 -0
  96. package/mintlify-docs/core/exulu-tool/configuration.mdx +805 -0
  97. package/mintlify-docs/core/exulu-tool/introduction.mdx +539 -0
  98. package/mintlify-docs/core/exulu-variables/api-reference.mdx +699 -0
  99. package/mintlify-docs/core/exulu-variables/configuration.mdx +736 -0
  100. package/mintlify-docs/core/exulu-variables/introduction.mdx +511 -0
  101. package/mintlify-docs/development.mdx +94 -0
  102. package/mintlify-docs/docs.json +248 -0
  103. package/mintlify-docs/enterprise-edition.mdx +538 -0
  104. package/mintlify-docs/essentials/code.mdx +35 -0
  105. package/mintlify-docs/essentials/images.mdx +59 -0
  106. package/mintlify-docs/essentials/markdown.mdx +88 -0
  107. package/mintlify-docs/essentials/navigation.mdx +87 -0
  108. package/mintlify-docs/essentials/reusable-snippets.mdx +110 -0
  109. package/mintlify-docs/essentials/settings.mdx +318 -0
  110. package/mintlify-docs/favicon.svg +3 -0
  111. package/mintlify-docs/frontend/introduction.mdx +39 -0
  112. package/mintlify-docs/getting-started.mdx +267 -0
  113. package/mintlify-docs/guides/custom-agent.mdx +608 -0
  114. package/mintlify-docs/guides/first-agent.mdx +315 -0
  115. package/mintlify-docs/images/admin_ui.png +0 -0
  116. package/mintlify-docs/images/contexts.png +0 -0
  117. package/mintlify-docs/images/create_agents.png +0 -0
  118. package/mintlify-docs/images/evals.png +0 -0
  119. package/mintlify-docs/images/graphql.png +0 -0
  120. package/mintlify-docs/images/graphql_api.png +0 -0
  121. package/mintlify-docs/images/hero-dark.png +0 -0
  122. package/mintlify-docs/images/hero-light.png +0 -0
  123. package/mintlify-docs/images/hero.png +0 -0
  124. package/mintlify-docs/images/knowledge_sources.png +0 -0
  125. package/mintlify-docs/images/mcp.png +0 -0
  126. package/mintlify-docs/images/scaling.png +0 -0
  127. package/mintlify-docs/index.mdx +411 -0
  128. package/mintlify-docs/logo/dark.svg +9 -0
  129. package/mintlify-docs/logo/light.svg +9 -0
  130. package/mintlify-docs/partners.mdx +558 -0
  131. package/mintlify-docs/products.mdx +77 -0
  132. package/mintlify-docs/snippets/snippet-intro.mdx +4 -0
  133. package/mintlify-docs/styles.css +207 -0
  134. package/{documentation → old-documentation}/logging.md +3 -3
  135. package/package.json +35 -4
  136. package/skills-lock.json +10 -0
  137. package/types/context-processor.ts +45 -0
  138. package/types/exulu-table-definition.ts +79 -0
  139. package/types/file-types.ts +18 -0
  140. package/types/models/agent.ts +10 -12
  141. package/types/models/exulu-agent-tool-config.ts +11 -0
  142. package/types/models/rate-limiter-rules.ts +7 -0
  143. package/types/provider-config.ts +21 -0
  144. package/types/queue-config.ts +16 -0
  145. package/types/rbac-rights-modes.ts +1 -0
  146. package/types/statistics.ts +20 -0
  147. package/types/workflow.ts +31 -0
  148. package/changelog-backend-10.11.2025_03.12.2025.md +0 -316
  149. package/types/models/agent-backend.ts +0 -15
  150. /package/{documentation → old-documentation}/otel.md +0 -0
  151. /package/{patch-older-releases-readme.md → old-documentation/patch-older-releases.md} +0 -0
@@ -0,0 +1,459 @@
---
title: "Overview"
description: "Custom evaluation functions to measure and score agent performance"
---

## Overview

`ExuluEval` is a class for creating custom evaluation functions that measure and score agent performance against test cases. Evaluations allow you to systematically test your agents, track quality over time, and identify areas for improvement.

## What is ExuluEval?

ExuluEval provides a framework for defining evaluation logic that:

- **Scores agent responses**: Returns a score from 0-100 based on custom criteria
- **Runs against test cases**: Evaluates agent behavior using structured test inputs
- **Supports any evaluation method**: Custom logic, LLM-as-judge, regex matching, or any scoring approach
- **Integrates with queues**: Can be run as background jobs using ExuluQueues
- **Enables A/B testing**: Compare different agent configurations, prompts, or models

<CardGroup cols={2}>
  <Card title="Custom scoring logic" icon="code">
    Write any evaluation function in TypeScript
  </Card>
  <Card title="Test cases" icon="clipboard-check">
    Structured inputs with expected outputs
  </Card>
  <Card title="LLM-as-judge" icon="scale-balanced">
    Use LLMs to evaluate response quality
  </Card>
  <Card title="Queue integration" icon="layer-group">
    Run evaluations as background jobs
  </Card>
</CardGroup>

## Why use evaluations?

Evaluations help you:

<AccordionGroup>
  <Accordion title="Measure quality">
    Quantify agent performance with consistent scoring criteria across all responses
  </Accordion>

  <Accordion title="Prevent regressions">
    Catch performance degradation when updating prompts, models, or tools
  </Accordion>

  <Accordion title="Compare configurations">
    A/B test different agent setups to find the best-performing configuration
  </Accordion>

  <Accordion title="Track improvements">
    Monitor evaluation scores over time to verify that changes improve quality
  </Accordion>

  <Accordion title="Automate testing">
    Build CI/CD pipelines that fail if evaluation scores drop below thresholds
  </Accordion>
</AccordionGroup>

## Quick start

```typescript
import { ExuluEval } from "@exulu/backend";

// Create an evaluation function
const exactMatchEval = new ExuluEval({
  id: "exact_match",
  name: "Exact Match",
  description: "Checks if response exactly matches expected output",
  llm: false, // Not using LLM-as-judge
  execute: async ({ messages, testCase }) => {
    const lastMessage = messages[messages.length - 1];
    const response = lastMessage?.content || "";

    return response === testCase.expected_output ? 100 : 0;
  }
});

// Run against a test case
const score = await exactMatchEval.run(
  agent,    // Agent database record
  backend,  // ExuluAgent instance
  testCase, // Test case with inputs and expected output
  messages  // Conversation messages
);

console.log(`Score: ${score}/100`);
```

## Evaluation types

### Custom logic evaluations

Write any scoring logic in TypeScript:

```typescript
const containsKeywordEval = new ExuluEval({
  id: "contains_keyword",
  name: "Contains Keyword",
  description: "Checks if response contains required keywords",
  llm: false,
  execute: async ({ messages, testCase, config }) => {
    const lastMessage = messages[messages.length - 1];
    const response = lastMessage?.content?.toLowerCase() || "";

    const keywords = config?.keywords || [];
    if (keywords.length === 0) return 100; // avoid division by zero when no keywords are configured

    const foundKeywords = keywords.filter(kw => response.includes(kw.toLowerCase()));

    return (foundKeywords.length / keywords.length) * 100;
  },
  config: [
    {
      name: "keywords",
      description: "List of keywords that should appear in the response"
    }
  ]
});
```

### LLM-as-judge evaluations

Use an LLM to evaluate response quality:

```typescript
const llmJudgeEval = new ExuluEval({
  id: "llm_judge_quality",
  name: "LLM Judge - Quality",
  description: "Uses an LLM to evaluate response quality",
  llm: true, // Using LLM-as-judge
  execute: async ({ backend, messages, testCase, config }) => {
    const lastMessage = messages[messages.length - 1];
    const response = lastMessage?.content || "";

    const prompt = `
You are an expert evaluator. Rate the quality of this response on a scale of 0-100.

Test Case: ${testCase.name}
Expected: ${testCase.expected_output}
Actual Response: ${response}

Consider:
- Accuracy: Does it match the expected output?
- Completeness: Does it address all aspects?
- Clarity: Is it well-structured and clear?

Respond with ONLY a number from 0-100.
`;

    const result = await backend.generateSync({
      prompt,
      agentInstance: await loadAgent(config?.judgeAgentId),
      statistics: { label: "eval", trigger: "system" }
    });

    const score = parseInt(result.text, 10);
    return Number.isNaN(score) ? 0 : Math.max(0, Math.min(100, score));
  },
  config: [
    {
      name: "judgeAgentId",
      description: "Agent ID to use as judge"
    }
  ]
});
```

### Tool usage evaluations

Check if the agent used the correct tools:

```typescript
const toolUsageEval = new ExuluEval({
  id: "tool_usage",
  name: "Tool Usage",
  description: "Checks if agent used expected tools",
  llm: false,
  execute: async ({ messages, testCase }) => {
    // Extract tool calls from messages
    const toolCalls = messages
      .flatMap(msg => msg.toolInvocations || [])
      .map(inv => inv.toolName);

    const expectedTools = testCase.expected_tools || [];

    if (expectedTools.length === 0) return 100;

    const usedExpected = expectedTools.filter(tool => toolCalls.includes(tool));

    return (usedExpected.length / expectedTools.length) * 100;
  }
});
```

### Similarity evaluations

Use embeddings to measure semantic similarity:

```typescript
import { ExuluEval, ExuluEmbedder, ExuluVariables } from "@exulu/backend";

const similarityEval = new ExuluEval({
  id: "semantic_similarity",
  name: "Semantic Similarity",
  description: "Measures semantic similarity between response and expected output",
  llm: false,
  execute: async ({ messages, testCase }) => {
    const lastMessage = messages[messages.length - 1];
    const response = lastMessage?.content || "";

    const embedder = new ExuluEmbedder({
      id: "eval_embedder",
      name: "Eval Embedder",
      provider: "openai",
      model: "text-embedding-3-small",
      vectorDimensions: 1536,
      authenticationInformation: await ExuluVariables.get("openai_api_key")
    });

    const [responseEmb, expectedEmb] = await embedder.generate([
      response,
      testCase.expected_output
    ]);

    // Cosine similarity, scaled to 0-100
    const similarity = cosineSimilarity(responseEmb, expectedEmb);

    return similarity * 100;
  }
});

function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}
```

## Test cases

Test cases define the inputs and expected outputs for evaluations:

```typescript
interface TestCase {
  id: string;
  name: string;
  description?: string;
  inputs: UIMessage[];                   // Input conversation
  expected_output: string;               // Expected response
  expected_tools?: string[];             // Expected tool calls
  expected_knowledge_sources?: string[]; // Expected contexts used
  expected_agent_tools?: string[];       // Expected agent tools
  createdAt: string;
  updatedAt: string;
}
```

**Example test case:**

```typescript
const testCase: TestCase = {
  id: "tc_001",
  name: "Weather query",
  description: "User asks about weather",
  inputs: [
    {
      role: "user",
      content: "What's the weather like in San Francisco?"
    }
  ],
  expected_output: "Based on current data, it's 68°F and sunny in San Francisco.",
  expected_tools: ["get_weather"],
  expected_knowledge_sources: [],
  expected_agent_tools: [],
  createdAt: "2025-01-15T10:00:00Z",
  updatedAt: "2025-01-15T10:00:00Z"
};
```

## Running evaluations

### Basic evaluation run

```typescript
import { ExuluEval, ExuluAgent } from "@exulu/backend";

// Note: `eval` is a reserved word in JavaScript/TypeScript, so pick another name
const myEval = new ExuluEval({
  id: "my_eval",
  name: "My Evaluation",
  description: "Custom evaluation",
  llm: false,
  execute: async ({ messages, testCase }) => {
    // Your scoring logic
    return 85; // Score from 0-100
  }
});

// Run evaluation
const score = await myEval.run(
  agent,    // Agent DB record
  backend,  // ExuluAgent instance
  testCase, // TestCase
  messages, // UIMessage[]
  config    // Optional config
);

console.log(`Score: ${score}/100`);
```

### Batch evaluation

Run multiple evaluations on a test suite:

```typescript
async function runEvaluations(
  agent: Agent,
  backend: ExuluAgent,
  testCases: TestCase[],
  evals: ExuluEval[]
) {
  const results = [];

  for (const testCase of testCases) {
    // Generate response
    const response = await backend.generateSync({
      prompt: testCase.inputs[testCase.inputs.length - 1].content,
      agentInstance: await loadAgent(agent.id),
      statistics: { label: "eval", trigger: "test" }
    });

    const messages = [
      ...testCase.inputs,
      { role: "assistant", content: response.text }
    ];

    // Run all evals on this test case
    // (`eval` is reserved, so the loop variable is named `evaluation`)
    for (const evaluation of evals) {
      const score = await evaluation.run(agent, backend, testCase, messages);

      results.push({
        testCaseId: testCase.id,
        testCaseName: testCase.name,
        evalId: evaluation.id,
        evalName: evaluation.name,
        score
      });
    }
  }

  return results;
}
```
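The flat result list produced by a batch run like the one above is easy to summarize per evaluation. A minimal sketch; the `EvalResult` shape mirrors the objects pushed above, and `averageByEval` is an illustrative helper, not part of the library:

```typescript
// Illustrative result shape, matching the objects collected by a batch run.
interface EvalResult {
  testCaseId: string;
  evalId: string;
  score: number;
}

// Compute the average score per evaluation id.
function averageByEval(results: EvalResult[]): Record<string, number> {
  const sums: Record<string, { total: number; count: number }> = {};
  for (const r of results) {
    sums[r.evalId] ??= { total: 0, count: 0 };
    sums[r.evalId].total += r.score;
    sums[r.evalId].count += 1;
  }
  const averages: Record<string, number> = {};
  for (const [id, { total, count }] of Object.entries(sums)) {
    averages[id] = total / count;
  }
  return averages;
}
```

Per-evaluation averages make regressions visible at a glance when the same suite is re-run after a prompt or model change.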

## Integration with ExuluQueues

Run evaluations as background jobs:

```typescript
import { ExuluEval, ExuluQueues } from "@exulu/backend";

// Create an eval with a queue config. `queue` takes a promise, so the
// connection can be resolved without requiring top-level await.
const backgroundEval = new ExuluEval({
  id: "background_eval",
  name: "Background Evaluation",
  description: "Runs as background job",
  llm: true,
  execute: async ({ backend, messages, testCase }) => {
    // Evaluation logic
    return 90;
  },
  queue: ExuluQueues.getConnection().then(connection => ({
    connection,
    name: "evaluations",
    prefix: "{exulu}",
    defaultJobOptions: {
      attempts: 3,
      backoff: { type: "exponential", delay: 2000 }
    }
  }))
});

// Queue the evaluation job
// (Implementation depends on your worker setup)
```

## Best practices

<Tip>
  **Start simple**: Begin with basic evaluations (exact match, keyword presence) before building complex LLM-as-judge evaluations.
</Tip>

<Note>
  **Multiple evaluations**: Use multiple evaluation functions to assess different aspects (accuracy, tone, tool usage, etc.).
</Note>

<Warning>
  **Score range**: Evaluation functions must return a score between 0 and 100. Scores outside this range will throw an error.
</Warning>

<Info>
  **Test case quality**: Good test cases are specific, representative of real usage, and have clear expected outputs.
</Info>
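The score-range requirement can be enforced defensively inside your own `execute` functions. A minimal sketch; the `clampScore` helper is hypothetical, not part of the ExuluEval API:

```typescript
// Hypothetical helper: clamp a raw score into the required 0-100 range
// before returning it from an evaluation's `execute` function.
function clampScore(raw: number): number {
  if (Number.isNaN(raw)) return 0; // treat unparseable judge output as a failing score
  return Math.max(0, Math.min(100, raw));
}
```

Clamping avoids runtime errors from out-of-range scores, at the cost of hiding scoring bugs; in CI you may prefer to let an out-of-range score fail loudly instead.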

## When to use ExuluEval

<AccordionGroup>
  <Accordion title="Agent development">
    Test agent behavior during development to catch issues early
  </Accordion>

  <Accordion title="Prompt engineering">
    Compare prompt variations to find the best-performing instructions
  </Accordion>

  <Accordion title="Model comparison">
    Evaluate the same agent with different LLM models (GPT-4 vs Claude vs Gemini)
  </Accordion>

  <Accordion title="CI/CD pipelines">
    Automated testing in deployment pipelines to prevent regressions
  </Accordion>

  <Accordion title="Quality monitoring">
    Continuous evaluation in production to track performance over time
  </Accordion>
</AccordionGroup>

## Evaluation workflow

```mermaid
graph TD
    A[Create Test Cases] --> B[Define Evaluation Functions]
    B --> C[Generate Agent Response]
    C --> D[Run Evaluation]
    D --> E{Score >= Threshold?}
    E -->|Yes| F[Pass]
    E -->|No| G[Fail]
    F --> H[Deploy / Continue]
    G --> I[Fix Agent]
    I --> C
```
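The threshold decision in the workflow diagram can be a few lines in a CI script. A sketch, assuming you already have average scores per evaluation; the `passesGate` helper is hypothetical:

```typescript
// Hypothetical CI gate: pass only when every evaluation's average score
// meets the threshold. In a CI script, exit non-zero on failure.
function passesGate(averages: Record<string, number>, threshold: number): boolean {
  return Object.values(averages).every(score => score >= threshold);
}

// In a CI script:
// if (!passesGate(averages, 80)) process.exit(1);
```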

## Next steps

<CardGroup cols={2}>
  <Card title="Configuration" icon="gear" href="/core/exulu-eval/configuration">
    Learn about evaluation configuration
  </Card>
  <Card title="API reference" icon="code" href="/core/exulu-eval/api-reference">
    Explore methods and properties
  </Card>
  <Card title="ExuluAgent" icon="robot" href="/core/exulu-agent/introduction">
    Create agents to evaluate
  </Card>
  <Card title="ExuluQueues" icon="layer-group" href="/core/exulu-queues/introduction">
    Run evaluations as background jobs
  </Card>
</CardGroup>