@eva-llm/eva-judge 1.0.3 → 1.0.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -2,11 +2,15 @@
2
2
 
3
3
  A TypeScript/Node.js library for automated text evaluation with AI analysis through **LLM-Rubric**, **G-Eval**, or **B-Eval** (Binary G-Eval).
4
4
 
5
+ ---
6
+
5
7
  ## Project Inspiration & Attribution
6
8
 
7
9
  This project is inspired by [promptfoo](https://github.com/promptfoo/promptfoo), including [author's work](https://github.com/promptfoo/promptfoo/issues?q=state%3Amerged%20is%3Apr%20author%3A%40schipiga) on the [G-Eval](https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded/g-eval/) framework there.<br />
8
10
  The LLM-as-a-Judge prompts are copied from promptfoo and adapted for project-specific issues.
9
11
 
12
+ ---
13
+
10
14
  ## Quick Start
11
15
 
12
16
  ```bash
@@ -35,9 +39,16 @@ await bEval({ query, answer }, 'answer is coherent to question', 'openai', 'gpt-
35
39
  // { score: 1, reason: 'The answer is definitely coherent to the question' }
36
40
  ```
37
41
 
38
- **NOTE!** For better judging the factual standard is temperature=0
42
+ **NOTE!** For robust judging the factual standard is `temperature=0`
43
+
44
+ ---
39
45
 
40
46
  ## API
47
+
48
+ Judge `options` forward any Vercel AI SDK [generateText options](https://ai-sdk.dev/docs/reference/ai-sdk-core/generate-text#api-signature).
49
+
50
+ > NOTE! Internal values such as `model`, `system`, and `prompt` are managed by the Judge and will override corresponding values in the `options` object to ensure evaluation integrity.
51
+
41
52
  ### llmRubric
42
53
 
43
54
  Evaluates an output against a rubric using an LLM. Returns a reason, pass/fail, and normalized score.
@@ -48,7 +59,7 @@ const result = await llmRubric(
48
59
  rubric, // string: the rubric to use
49
60
  provider, // string: LLM provider name
50
61
  model, // string: LLM model name
51
- options // optional: { temperature, providerOptions }
62
+ options // optional: { Vercel ai-sdk options }
52
63
  );
53
64
  // result: { reason: string, pass: boolean, score: number }
54
65
  ```
@@ -63,7 +74,7 @@ const result = await gEval(
63
74
  criteria, // string: evaluation criteria
64
75
  provider, // string: LLM provider name
65
76
  model, // string: LLM model name
66
- options // optional: { temperature, providerOptions }
77
+ options // optional: { Vercel ai-sdk options }
67
78
  );
68
79
  // result: { reason: string, score: number }
69
80
  ```
@@ -78,22 +89,22 @@ const result = await bEval(
78
89
  criteria, // string: evaluation criteria
79
90
  provider, // string: LLM provider name
80
91
  model, // string: LLM model name
81
- options // optional: { temperature, providerOptions }
92
+ options // optional: { Vercel ai-sdk options }
82
93
  );
83
94
  // result: { reason: string, score: number } // score will be 0 or 1
84
95
  ```
85
96
 
86
97
  ---
87
98
 
88
- ### G-Eval vs B-Eval
99
+ ## G-Eval vs B-Eval
89
100
  The divergence between **G-Eval** and **B-Eval** reveals a critical **'Judgement Gap'**:
90
101
 
91
- * **G-Eval (The Auditor):** Scoring on a `0.0-1.0` scale allows the model to stay in a 'comfort zone', smoothing over internal contradictions.
92
- * **B-Eval (The Judge):** A binary `0|1` choice forces **Adjudication**. This 'forced choice' triggers the **Alignment Paradox**, exposing the struggle between **RLHF training** and objective facts.
102
+ - **G-Eval (The Auditor):** Scoring on a `0.0-1.0` scale allows the model to stay in a 'comfort zone', smoothing over internal contradictions.
103
+ - **B-Eval (The Judge):** A binary `0|1` choice forces **Adjudication**. This 'forced choice' triggers the **Alignment Paradox**, exposing the struggle between **RLHF training** and objective facts.
93
104
 
94
- **Conclusion:** **B-Eval** is a superior stress-test for **Epistemic Honesty**. By stripping away the safety net of grey-zone scoring, it reveals exactly where logic breaks under the weight of normative priors.
105
+ **B-Eval** is a superior stress-test for **Epistemic Honesty**. By stripping away the safety net of grey-zone scoring, it reveals exactly where logic breaks under the weight of normative priors.
95
106
 
96
- More details in EVA-LLM [Dark Teaming Manifesto](https://eva-llm.github.io/dark-teaming).
107
+ More details in [Dark Teaming Manifesto](https://eva-llm.github.io/dark-teaming).
97
108
 
98
109
  ---
99
110
 
@@ -116,6 +127,8 @@ Specify the provider name and model name in `llmRubric`, `gEval`, or `bEval`.
116
127
 
117
128
  > **Note:** Each provider integration is based on its respective ai-sdk package. Be sure to follow the provider's documentation for setup and authentication. Most providers require you to export an API key or token as an environment variable (e.g., `export OPENAI_API_KEY=...`).
118
129
 
130
+ ---
131
+
119
132
  ## Enterprise
120
133
  ### LLM Judge Hooks
121
134
 
@@ -147,7 +160,7 @@ Config.enableStepsCache();
147
160
  Config.disableStepsCache();
148
161
  ```
149
162
 
150
- ### G-Eval/B-Eval Evaluation Steps Persistent Storage
163
+ ### G-Eval / B-Eval Evaluation Steps Persistent Storage
151
164
 
152
165
  For advanced use, you can implement your own cache storage for evaluation steps (e.g., using Redis or another backend) by providing a custom cache via `setStepsCache()`:
153
166
 
@@ -160,7 +173,3 @@ class RedisCache implements IStepsCache {
160
173
 
161
174
  Config.setStepsCache(RedisCache);
162
175
  ```
163
-
164
- ## License
165
-
166
- MIT
package/dst/config.d.ts CHANGED
@@ -1,6 +1,6 @@
1
1
  import { LRUCache } from 'lru-cache';
2
2
  import { type LanguageModel } from 'ai';
3
- import { type EvaHooks, type IStepsCache } from './types';
3
+ import { type IJudgeHooks, type IStepsCache } from './types';
4
4
  declare const _default: {
5
5
  gevalMaxScore: number;
6
6
  isModelCached: boolean;
@@ -14,7 +14,7 @@ declare const _default: {
14
14
  disableModelCache(): void;
15
15
  enableStepsCache(): void;
16
16
  disableStepsCache(): void;
17
- hooks: EvaHooks;
18
- setHooks(hooks: EvaHooks): void;
17
+ hooks: IJudgeHooks;
18
+ setHooks(hooks: IJudgeHooks): void;
19
19
  };
20
20
  export default _default;
package/dst/config.js.map CHANGED
@@ -1 +1 @@
1
- {"version":3,"file":"config.js","sourceRoot":"","sources":["../src/config.ts"],"names":[],"mappings":";;AAAA,yCAAqC;AASrC,MAAM,kBAAkB;IACd,KAAK,CAA6B;IAM1C,YAAY,IAAY;QACtB,IAAI,CAAC,KAAK,GAAG,IAAI,oBAAQ,CAAC,EAAE,GAAG,EAAE,IAAI,EAAE,CAAC,CAAC;IAC3C,CAAC;IAMD,KAAK,CAAC,GAAG,CAAC,GAAW,EAAE,KAAe;QACpC,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,GAAG,EAAE,KAAK,CAAC,CAAC;IAC7B,CAAC;IAMD,KAAK,CAAC,GAAG,CAAC,GAAW;QACnB,OAAO,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,GAAG,CAAC,CAAC;IAC7B,CAAC;CACF;AAOD,kBAAe;IAIb,aAAa,EAAE,EAAE;IAIjB,aAAa,EAAE,IAAI;IAInB,aAAa,EAAE,IAAI;IAInB,UAAU,EAAE,IAAI,oBAAQ,CAAwB,EAAE,GAAG,EAAE,GAAG,EAAE,CAAC;IAI7D,UAAU,EAAE,IAAI,kBAAkB,CAAC,GAAG,CAAgB;IAKtD,iBAAiB,CAAC,OAAe,GAAG;QAClC,IAAI,CAAC,UAAU,GAAG,IAAI,oBAAQ,CAAwB,EAAE,GAAG,EAAE,IAAI,EAAE,CAAC,CAAC;IACvE,CAAC;IAKD,iBAAiB,CAAC,OAAe,GAAG;QAClC,IAAI,CAAC,UAAU,GAAG,IAAI,kBAAkB,CAAC,IAAI,CAAgB,CAAC;IAChE,CAAC;IAKD,aAAa,CAAC,KAAkB;QAC9B,IAAI,CAAC,UAAU,GAAG,KAAK,CAAC;IAC1B,CAAC;IAID,gBAAgB;QACd,IAAI,CAAC,aAAa,GAAG,IAAI,CAAC;IAC5B,CAAC;IAID,iBAAiB;QACf,IAAI,CAAC,aAAa,GAAG,KAAK,CAAC;IAC7B,CAAC;IAID,gBAAgB;QACd,IAAI,CAAC,aAAa,GAAG,IAAI,CAAC;IAC5B,CAAC;IAID,iBAAiB;QACf,IAAI,CAAC,aAAa,GAAG,KAAK,CAAC;IAC7B,CAAC;IAID,KAAK,EAAE,EAAc;IAKrB,QAAQ,CAAC,KAAe;QACtB,IAAI,CAAC,KAAK,GAAG,KAAK,CAAC;IACrB,CAAC;CACF,CAAC"}
1
+ {"version":3,"file":"config.js","sourceRoot":"","sources":["../src/config.ts"],"names":[],"mappings":";;AAAA,yCAAqC;AAYrC,MAAM,kBAAkB;IACd,KAAK,CAA6B;IAM1C,YAAY,IAAY;QACtB,IAAI,CAAC,KAAK,GAAG,IAAI,oBAAQ,CAAC,EAAE,GAAG,EAAE,IAAI,EAAE,CAAC,CAAC;IAC3C,CAAC;IAMD,KAAK,CAAC,GAAG,CAAC,GAAW,EAAE,KAAe;QACpC,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,GAAG,EAAE,KAAK,CAAC,CAAC;IAC7B,CAAC;IAMD,KAAK,CAAC,GAAG,CAAC,GAAW;QACnB,OAAO,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,GAAG,CAAC,CAAC;IAC7B,CAAC;CACF;AAOD,kBAAe;IAIb,aAAa,EAAE,EAAE;IAIjB,aAAa,EAAE,IAAI;IAInB,aAAa,EAAE,IAAI;IAInB,UAAU,EAAE,IAAI,oBAAQ,CAAwB,EAAE,GAAG,EAAE,GAAG,EAAE,CAAC;IAI7D,UAAU,EAAE,IAAI,kBAAkB,CAAC,GAAG,CAAgB;IAKtD,iBAAiB,CAAC,OAAe,GAAG;QAClC,IAAI,CAAC,UAAU,GAAG,IAAI,oBAAQ,CAAwB,EAAE,GAAG,EAAE,IAAI,EAAE,CAAC,CAAC;IACvE,CAAC;IAKD,iBAAiB,CAAC,OAAe,GAAG;QAClC,IAAI,CAAC,UAAU,GAAG,IAAI,kBAAkB,CAAC,IAAI,CAAgB,CAAC;IAChE,CAAC;IAKD,aAAa,CAAC,KAAkB;QAC9B,IAAI,CAAC,UAAU,GAAG,KAAK,CAAC;IAC1B,CAAC;IAID,gBAAgB;QACd,IAAI,CAAC,aAAa,GAAG,IAAI,CAAC;IAC5B,CAAC;IAID,iBAAiB;QACf,IAAI,CAAC,aAAa,GAAG,KAAK,CAAC;IAC7B,CAAC;IAID,gBAAgB;QACd,IAAI,CAAC,aAAa,GAAG,IAAI,CAAC;IAC5B,CAAC;IAID,iBAAiB;QACf,IAAI,CAAC,aAAa,GAAG,KAAK,CAAC;IAC7B,CAAC;IAID,KAAK,EAAE,EAAiB;IAKxB,QAAQ,CAAC,KAAkB;QACzB,IAAI,CAAC,KAAK,GAAG,KAAK,CAAC;IACrB,CAAC;CACF,CAAC"}
package/dst/index.d.ts CHANGED
@@ -1,23 +1,7 @@
1
- import z from 'zod';
2
- import { type EvalOptions, type GEvalInput } from './types';
1
+ import { type TVercelOptions, type TGevalInput, type TRubricResult, type TGevalEvaluateResult } from './types';
3
2
  export * from './config';
4
3
  export { default } from './config';
5
4
  export * from './types';
6
- export declare const RubricResultSchema: z.ZodObject<{
7
- reason: z.ZodString;
8
- pass: z.ZodBoolean;
9
- score: z.ZodNumber;
10
- }, z.core.$strip>;
11
- export type RubricResult = z.infer<typeof RubricResultSchema>;
12
- export declare const GevalStepsResultSchema: z.ZodObject<{
13
- steps: z.ZodArray<z.ZodString>;
14
- }, z.core.$strip>;
15
- export type GevalStepsResult = z.infer<typeof GevalStepsResultSchema>;
16
- export declare const GevalEvaluateResultSchema: z.ZodObject<{
17
- reason: z.ZodString;
18
- score: z.ZodNumber;
19
- }, z.core.$strip>;
20
- export type GevalEvaluateResult = z.infer<typeof GevalEvaluateResultSchema>;
21
- export declare const llmRubric: (output: string, rubric: string, providerName: string, modelName: string, options?: EvalOptions) => Promise<RubricResult>;
22
- export declare const gEval: (input: GEvalInput, criteria: string, providerName: string, modelName: string, options?: EvalOptions) => Promise<GevalEvaluateResult>;
23
- export declare const bEval: (input: GEvalInput, criteria: string, providerName: string, modelName: string, options?: EvalOptions) => Promise<GevalEvaluateResult>;
5
+ export declare const llmRubric: (output: string, rubric: string, providerName: string, modelName: string, options?: TVercelOptions) => Promise<TRubricResult>;
6
+ export declare const gEval: (input: TGevalInput, criteria: string, providerName: string, modelName: string, options?: TVercelOptions) => Promise<TGevalEvaluateResult>;
7
+ export declare const bEval: (input: TGevalInput, criteria: string, providerName: string, modelName: string, options?: TVercelOptions) => Promise<TGevalEvaluateResult>;
package/dst/index.js CHANGED
@@ -39,43 +39,31 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
39
39
  return (mod && mod.__esModule) ? mod : { "default": mod };
40
40
  };
41
41
  Object.defineProperty(exports, "__esModule", { value: true });
42
- exports.bEval = exports.gEval = exports.llmRubric = exports.GevalEvaluateResultSchema = exports.GevalStepsResultSchema = exports.RubricResultSchema = exports.default = void 0;
42
+ exports.bEval = exports.gEval = exports.llmRubric = exports.default = void 0;
43
43
  const node_crypto_1 = __importDefault(require("node:crypto"));
44
- const ai_1 = require("ai");
45
44
  const Mustache = __importStar(require("mustache"));
46
- const zod_1 = __importDefault(require("zod"));
45
+ const ai_1 = require("ai");
46
+ const config_1 = __importDefault(require("./config"));
47
47
  const prompt_1 = require("./prompt");
48
48
  const registry_1 = require("./registry");
49
- const config_1 = __importDefault(require("./config"));
49
+ const types_1 = require("./types");
50
50
  __exportStar(require("./config"), exports);
51
51
  var config_2 = require("./config");
52
52
  Object.defineProperty(exports, "default", { enumerable: true, get: function () { return __importDefault(config_2).default; } });
53
53
  __exportStar(require("./types"), exports);
54
- exports.RubricResultSchema = zod_1.default.object({
55
- reason: zod_1.default.string().describe('Detailed explanation of the score based on the rubric'),
56
- pass: zod_1.default.boolean().describe('Whether the output satisfies the minimum requirements'),
57
- score: zod_1.default.number().min(0).max(1).describe('Numeric representation of quality'),
58
- });
59
- exports.GevalStepsResultSchema = zod_1.default.object({
60
- steps: zod_1.default.array(zod_1.default.string()).describe('List of concise evaluation steps derived from the criteria'),
61
- });
62
- exports.GevalEvaluateResultSchema = zod_1.default.object({
63
- reason: zod_1.default.string().describe('Detailed explanation of the score based on the rubric'),
64
- score: zod_1.default.number().min(0).describe('Numeric representation of quality'),
65
- });
66
54
  const getHashId = () => node_crypto_1.default.randomBytes(16).toString('hex');
67
55
  const llmRubric = async (output, rubric, providerName, modelName, options = {}) => {
68
56
  const start = Date.now();
69
57
  try {
70
58
  const userPrompt = Mustache.render(prompt_1.LLM_RUBRIC_USER_PROMPT, { output, rubric });
71
59
  const { output: result } = await (0, ai_1.generateText)({
60
+ ...options,
72
61
  model: (0, registry_1.getModel)(providerName, modelName),
73
62
  system: Mustache.render(prompt_1.LLM_RUBRIC_SYSTEM_PROMPT, { hash_id: getHashId() }),
74
63
  prompt: userPrompt,
75
64
  output: ai_1.Output.object({
76
- schema: exports.RubricResultSchema,
65
+ schema: types_1.RubricResultSchema,
77
66
  }),
78
- ...options,
79
67
  });
80
68
  config_1.default.hooks.onSuccess?.({
81
69
  method: 'llmRubric',
@@ -107,18 +95,18 @@ const _gEval = async (input, criteria, providerName, modelName, maxScore, method
107
95
  if (!steps) {
108
96
  const stepsPrompt = Mustache.render(prompt_1.GEVAL_STEPS_PROMPT, { criteria });
109
97
  const { output: stepsResult } = await (0, ai_1.generateText)({
98
+ ...options,
99
+ system: undefined,
110
100
  model,
111
101
  prompt: stepsPrompt,
112
102
  output: ai_1.Output.object({
113
- schema: exports.GevalStepsResultSchema,
103
+ schema: types_1.GevalStepsResultSchema,
114
104
  }),
115
- ...options,
116
105
  });
117
106
  steps = stepsResult.steps;
118
107
  (0, registry_1.setSteps)(criteria, stepsResult.steps);
119
108
  }
120
109
  const evaluationPrompt = Mustache.render(query ? prompt_1.GEVAL_EVALUATE_PROMPT : prompt_1.GEVAL_EVALUATE_REPLY_PROMPT, {
121
- hash_id: getHashId(),
122
110
  criteria,
123
111
  steps: steps.join('\n- '),
124
112
  input: query,
@@ -126,10 +114,12 @@ const _gEval = async (input, criteria, providerName, modelName, maxScore, method
126
114
  maxScore,
127
115
  });
128
116
  const { output: evalResult } = await (0, ai_1.generateText)({
117
+ ...options,
129
118
  model,
119
+ system: Mustache.render(prompt_1.GEVAL_SYSTEM_PROMPT, { hash_id: getHashId() }),
130
120
  prompt: evaluationPrompt,
131
121
  output: ai_1.Output.object({
132
- schema: exports.GevalEvaluateResultSchema,
122
+ schema: types_1.GevalEvaluateResultSchema,
133
123
  }),
134
124
  ...options,
135
125
  });
package/dst/index.js.map CHANGED
@@ -1 +1 @@
1
- {"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":";;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;AAAA,8DAAiC;AACjC,2BAA0C;AAC1C,mDAAqC;AACrC,8CAAoB;AAEpB,qCAMkB;AAClB,yCAA0D;AAC1D,sDAA4B;AAO5B,2CAAyB;AACzB,mCAAmC;AAA1B,kHAAA,OAAO,OAAA;AAChB,0CAAwB;AAMX,QAAA,kBAAkB,GAAG,aAAC,CAAC,MAAM,CAAC;IAEzC,MAAM,EAAE,aAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,uDAAuD,CAAC;IAEpF,IAAI,EAAE,aAAC,CAAC,OAAO,EAAE,CAAC,QAAQ,CAAC,uDAAuD,CAAC;IAEnF,KAAK,EAAE,aAAC,CAAC,MAAM,EAAE,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,QAAQ,CAAC,mCAAmC,CAAC;CAC9E,CAAC,CAAC;AAUU,QAAA,sBAAsB,GAAG,aAAC,CAAC,MAAM,CAAC;IAE7C,KAAK,EAAE,aAAC,CAAC,KAAK,CAAC,aAAC,CAAC,MAAM,EAAE,CAAC,CAAC,QAAQ,CAAC,4DAA4D,CAAC;CAClG,CAAC,CAAC;AAWU,QAAA,yBAAyB,GAAG,aAAC,CAAC,MAAM,CAAC;IAEhD,MAAM,EAAE,aAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,uDAAuD,CAAC;IAEpF,KAAK,EAAE,aAAC,CAAC,MAAM,EAAE,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,QAAQ,CAAC,mCAAmC,CAAC;CACvE,CAAC,CAAC;AAMH,MAAM,SAAS,GAAG,GAAG,EAAE,CAAC,qBAAM,CAAC,WAAW,CAAC,EAAE,CAAC,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAC;AAYxD,MAAM,SAAS,GAAG,KAAK,EAC5B,MAAc,EACd,MAAc,EACd,YAAoB,EACpB,SAAiB,EACjB,UAAuB,EAAE,EACF,EAAE;IACzB,MAAM,KAAK,GAAG,IAAI,CAAC,GAAG,EAAE,CAAC;IACzB,IAAI,CAAC;QACH,MAAM,UAAU,GAAG,QAAQ,CAAC,MAAM,CAAC,+BAAsB,EAAE,EAAE,MAAM,EAAE,MAAM,EAAE,CAAC,CAAC;QAE/E,MAAM,EAAE,MAAM,EAAE,MAAM,EAAE,GAAG,MAAM,IAAA,iBAAY,EAAC;YAC5C,KAAK,EAAE,IAAA,mBAAQ,EAAC,YAAY,EAAE,SAAS,CAAC;YACxC,MAAM,EAAE,QAAQ,CAAC,MAAM,CAAC,iCAAwB,EAAE,EAAE,OAAO,EAAE,SAAS,EAAE,EAAE,CAAC;YAC3E,MAAM,EAAE,UAAU;YAClB,MAAM,EAAE,WAAM,CAAC,MAAM,CAAC;gBACpB,MAAM,EAAE,0BAAkB;aAC3B,CAAC;YACF,GAAG,OAAO;SACX,CAAC,CAAC;QAEH,gBAAI,CAAC,KAAK,CAAC,SAAS,EAAE,CAAC;YACrB,MAAM,EAAE,WAAW;YACnB,MAAM,EAAE,EAAE,MAAM,EAAE,MAAM,EAAE,YAAY,EAAE,SAAS,EAAE,OAAO,EAAE;YAC5D,MAAM;YACN,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,OAAO,MAAM,CAAC;IAChB,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QAEf,gBAAI,CAAC,KAAK,CAAC,OAAO,EAAE,CAAC;YACnB,MAAM,EAAE,WAAW;YACnB,KAAK;YACL,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,MAAM,KAAK,CAAC;IACd,CAAC;AACH,CAAC,CAAA;AAvCY,QAAA,SAAS,aAuCrB;AAED,MAAM,MAAM,GAAG,KAAK,EAClB,KAAiB,EACjB,QAAgB,EAChB,YAAoB,EACpB,SAAiB,EACjB,QAAgB,EAChB,UAAsB,EACtB,UAAuB,EAAE,EACK,EAAE;IAChC,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;QAC9B,KAAK,GAAG,EAAE,KAAK,EAAE,EAAE,EAAE,MAAM,EAAE,KAAK,EAAE,CAAC;IACvC,CAAC;IACD,MAAM,EAAE,KAAK,EAAE,MAAM,EAAE,GAAG,KAAK,CAAC;IAEhC,MAAM,KAAK,GAAG,IAAI,CAAC,GAAG,EAAE,CAAC;IAEzB,IAAI,CAAC;QACH,MAAM,KAAK,GAAG,IAAA,mBAAQ,EAAC,YAAY,EAAE,SAAS,CAAC,CAAC;QAChD,IAAI,KAAK,GAAG,MAAM,IAAA,mBAAQ,EAAC,QAAQ,CAAC,CAAC;QAErC,IAAI,CAAC,KAAK,EAAE,CAAC;YACX,MAAM,WAAW,GAAG,QAAQ,CAAC,MAAM,CAAC,2BAAkB,EAAE,EAAE,QAAQ,EAAE,CAAC,CAAC;YAEtE,MAAM,EAAE,MAAM,EAAE,WAAW,EAAE,GAAG,MAAM,IAAA,iBAAY,EAAC;gBACjD,KAAK;gBACL,MAAM,EAAE,WAAW;gBACnB,MAAM,EAAE,WAAM,CAAC,MAAM,CAAC;oBACpB,MAAM,EAAE,8BAAsB;iBAC/B,CAAC;gBACF,GAAG,OAAO;aACX,CAAC,CAAC;YAEH,KAAK,GAAG,WAAW,CAAC,KAAK,CAAC;YAE1B,IAAA,mBAAQ,EAAC,QAAQ,EAAE,WAAW,CAAC,KAAK,CAAC,CAAC;QACxC,CAAC;QAED,MAAM,gBAAgB,GAAG,QAAQ,CAAC,MAAM,CACtC,KAAK,CAAC,CAAC,CAAC,8BAAqB,CAAC,CAAC,CAAC,oCAA2B,EAC3D;YACE,OAAO,EAAE,SAAS,EAAE;YACpB,QAAQ;YACR,KAAK,EAAE,KAAK,CAAC,IAAI,CAAC,MAAM,CAAC;YACzB,KAAK,EAAE,KAAK;YACZ,MAAM,EAAE,MAAM;YACd,QAAQ;SACT,CAAC,CAAC;QAEL,MAAM,EAAE,MAAM,EAAE,UAAU,EAAE,GAAG,MAAM,IAAA,iBAAY,EAAC;YAChD,KAAK;YACL,MAAM,EAAE,gBAAgB;YACxB,MAAM,EAAE,WAAM,CAAC,MAAM,CAAC;gBACpB,MAAM,EAAE,iCAAyB;aAClC,CAAC;YACF,GAAG,OAAO;SACX,CAAC,CAAC;QAEH,MAAM,MAAM,GAAG;YACb,MAAM,EAAE,UAAU,CAAC,MAAM;YACzB,KAAK,EAAE,UAAU,CAAC,KAAK,GAAG,QAAQ;SACnC,CAAC;QAEF,gBAAI,CAAC,KAAK,CAAC,SAAS,EAAE,CAAC;YACrB,MAAM,EAAE,UAAU;YAClB,MAAM,EAAE,EAAE,KAAK,EAAE,MAAM,EAAE,QAAQ,EAAE,YAAY,EAAE,SAAS,EAAE,OAAO,EAAE;YACrE,MAAM;YACN,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,OAAO,MAAM,CAAC;IAChB,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QAEf,gBAAI,CAAC,KAAK,CAAC,OAAO,EAAE,CAAC;YACnB,MAAM,EAAE,UAAU;YAClB,KAAK;YACL,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,MAAM,KAAK,CAAC;IACd,CAAC;AACH,CAAC,CAAA;AAYM,MAAM,KAAK,GAAG,KAAK,EACxB,KAAiB,EACjB,QAAgB,EAChB,YAAoB,EACpB,SAAiB,EACjB,UAAuB,EAAE,EACK,EAAE,CAAC,MAAM,CACvC,KAAK,EACL,QAAQ,EACR,YAAY,EACZ,SAAS,EACT,gBAAI,CAAC,aAAa,EAClB,OAAO,EACP,OAAO,CACR,CAAC;AAdW,QAAA,KAAK,SAchB;AAYK,MAAM,KAAK,GAAG,KAAK,EACxB,KAAiB,EACjB,QAAgB,EAChB,YAAoB,EACpB,SAAiB,EACjB,UAAuB,EAAE,EACK,EAAE,CAAC,MAAM,CACvC,KAAK,EACL,QAAQ,EACR,YAAY,EACZ,SAAS,EACT,CAAC,EACD,OAAO,EACP,OAAO,CACR,CAAC;AAdW,QAAA,KAAK,SAchB"}
1
+ {"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":";;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;AAAA,8DAAiC;AACjC,mDAAqC;AACrC,2BAGY;AAEZ,sDAA4B;AAC5B,qCAOkB;AAClB,yCAIoB;AACpB,mCASiB;AAEjB,2CAAyB;AACzB,mCAAmC;AAA1B,kHAAA,OAAO,OAAA;AAChB,0CAAwB;AAExB,MAAM,SAAS,GAAG,GAAG,EAAE,CAAC,qBAAM,CAAC,WAAW,CAAC,EAAE,CAAC,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAC;AAYxD,MAAM,SAAS,GAAG,KAAK,EAC5B,MAAc,EACd,MAAc,EACd,YAAoB,EACpB,SAAiB,EACjB,UAA0B,EAAE,EACJ,EAAE;IAC1B,MAAM,KAAK,GAAG,IAAI,CAAC,GAAG,EAAE,CAAC;IACzB,IAAI,CAAC;QACH,MAAM,UAAU,GAAG,QAAQ,CAAC,MAAM,CAAC,+BAAsB,EAAE,EAAE,MAAM,EAAE,MAAM,EAAE,CAAC,CAAC;QAE/E,MAAM,EAAE,MAAM,EAAE,MAAM,EAAE,GAAG,MAAM,IAAA,iBAAY,EAAC;YAC7C,GAAG,OAAO;YACT,KAAK,EAAE,IAAA,mBAAQ,EAAC,YAAY,EAAE,SAAS,CAAC;YACxC,MAAM,EAAE,QAAQ,CAAC,MAAM,CAAC,iCAAwB,EAAE,EAAE,OAAO,EAAE,SAAS,EAAE,EAAE,CAAC;YAC3E,MAAM,EAAE,UAAU;YAClB,MAAM,EAAE,WAAM,CAAC,MAAM,CAAC;gBACpB,MAAM,EAAE,0BAAkB;aAC3B,CAAC;SACJ,CAAC,CAAC;QAEF,gBAAI,CAAC,KAAK,CAAC,SAAS,EAAE,CAAC;YACrB,MAAM,EAAE,WAAW;YACnB,MAAM,EAAE,EAAE,MAAM,EAAE,MAAM,EAAE,YAAY,EAAE,SAAS,EAAE,OAAO,EAAE;YAC5D,MAAM;YACN,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,OAAO,MAAM,CAAC;IAChB,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QAEf,gBAAI,CAAC,KAAK,CAAC,OAAO,EAAE,CAAC;YACnB,MAAM,EAAE,WAAW;YACnB,KAAK;YACL,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,MAAM,KAAK,CAAC;IACd,CAAC;AACH,CAAC,CAAA;AAvCY,QAAA,SAAS,aAuCrB;AAED,MAAM,MAAM,GAAG,KAAK,EAClB,KAAkB,EAClB,QAAgB,EAChB,YAAoB,EACpB,SAAiB,EACjB,QAAgB,EAChB,UAAwB,EACxB,UAA0B,EAAE,EACG,EAAE;IACjC,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;QAC9B,KAAK,GAAG,EAAE,KAAK,EAAE,EAAE,EAAE,MAAM,EAAE,KAAK,EAAE,CAAC;IACvC,CAAC;IACD,MAAM,EAAE,KAAK,EAAE,MAAM,EAAE,GAAG,KAAK,CAAC;IAEhC,MAAM,KAAK,GAAG,IAAI,CAAC,GAAG,EAAE,CAAC;IAEzB,IAAI,CAAC;QACH,MAAM,KAAK,GAAG,IAAA,mBAAQ,EAAC,YAAY,EAAE,SAAS,CAAC,CAAC;QAChD,IAAI,KAAK,GAAG,MAAM,IAAA,mBAAQ,EAAC,QAAQ,CAAC,CAAC;QAErC,IAAI,CAAC,KAAK,EAAE,CAAC;YACX,MAAM,WAAW,GAAG,QAAQ,CAAC,MAAM,CAAC,2BAAkB,EAAE,EAAE,QAAQ,EAAE,CAAC,CAAC;YAEtE,MAAM,EAAE,MAAM,EAAE,WAAW,EAAE,GAAG,MAAM,IAAA,iBAAY,EAAC;gBACjD,GAAG,OAAO;gBACV,MAAM,EAAE,SAAS;gBACjB,KAAK;gBACL,MAAM,EAAE,WAAW;gBACnB,MAAM,EAAE,WAAM,CAAC,MAAM,CAAC;oBACpB,MAAM,EAAE,8BAAsB;iBAC/B,CAAC;aACH,CAAC,CAAC;YAEH,KAAK,GAAG,WAAW,CAAC,KAAK,CAAC;YAE1B,IAAA,mBAAQ,EAAC,QAAQ,EAAE,WAAW,CAAC,KAAK,CAAC,CAAC;QACxC,CAAC;QAED,MAAM,gBAAgB,GAAG,QAAQ,CAAC,MAAM,CACtC,KAAK,CAAC,CAAC,CAAC,8BAAqB,CAAC,CAAC,CAAC,oCAA2B,EAC3D;YACE,QAAQ;YACR,KAAK,EAAE,KAAK,CAAC,IAAI,CAAC,MAAM,CAAC;YACzB,KAAK,EAAE,KAAK;YACZ,MAAM,EAAE,MAAM;YACd,QAAQ;SACT,CAAC,CAAC;QAEL,MAAM,EAAE,MAAM,EAAE,UAAU,EAAE,GAAG,MAAM,IAAA,iBAAY,EAAC;YAChD,GAAG,OAAO;YACV,KAAK;YACL,MAAM,EAAE,QAAQ,CAAC,MAAM,CAAC,4BAAmB,EAAE,EAAE,OAAO,EAAE,SAAS,EAAE,EAAE,CAAC;YACtE,MAAM,EAAE,gBAAgB;YACxB,MAAM,EAAE,WAAM,CAAC,MAAM,CAAC;gBACpB,MAAM,EAAE,iCAAyB;aAClC,CAAC;YACF,GAAG,OAAO;SACX,CAAC,CAAC;QAEH,MAAM,MAAM,GAAG;YACb,MAAM,EAAE,UAAU,CAAC,MAAM;YACzB,KAAK,EAAE,UAAU,CAAC,KAAK,GAAG,QAAQ;SACnC,CAAC;QAEF,gBAAI,CAAC,KAAK,CAAC,SAAS,EAAE,CAAC;YACrB,MAAM,EAAE,UAAU;YAClB,MAAM,EAAE,EAAE,KAAK,EAAE,MAAM,EAAE,QAAQ,EAAE,YAAY,EAAE,SAAS,EAAE,OAAO,EAAE;YACrE,MAAM;YACN,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,OAAO,MAAM,CAAC;IAChB,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QAEf,gBAAI,CAAC,KAAK,CAAC,OAAO,EAAE,CAAC;YACnB,MAAM,EAAE,UAAU;YAClB,KAAK;YACL,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,MAAM,KAAK,CAAC;IACd,CAAC;AACH,CAAC,CAAA;AAYM,MAAM,KAAK,GAAG,KAAK,EACxB,KAAkB,EAClB,QAAgB,EAChB,YAAoB,EACpB,SAAiB,EACjB,UAA0B,EAAE,EACG,EAAE,CAAC,MAAM,CACxC,KAAK,EACL,QAAQ,EACR,YAAY,EACZ,SAAS,EACT,gBAAI,CAAC,aAAa,EAClB,OAAO,EACP,OAAO,CACR,CAAC;AAdW,QAAA,KAAK,SAchB;AAYK,MAAM,KAAK,GAAG,KAAK,EACxB,KAAkB,EAClB,QAAgB,EAChB,YAAoB,EACpB,SAAiB,EACjB,UAA0B,EAAE,EACG,EAAE,CAAC,MAAM,CACxC,KAAK,EACL,QAAQ,EACR,YAAY,EACZ,SAAS,EACT,CAAC,EACD,OAAO,EACP,OAAO,CACR,CAAC;AAdW,QAAA,KAAK,SAchB"}
package/dst/prompt.d.ts CHANGED
@@ -4,6 +4,7 @@
4
4
  */
5
5
  export declare const LLM_RUBRIC_SYSTEM_PROMPT = "\nInstruction #{{hash_id}}.\n\nYou are grading output according to a user-specified rubric. If the statement in the rubric is true, then the output passes the test. And score 1.0 indicates full compliance with the rubric, but 0.0 indicates no compliance at all. You respond with a JSON object with this structure: {reason: string, pass: boolean, score: number}\n\nExamples:\n\n<Output>Hello world</Output>\n<Rubric>Content contains a greeting</Rubric>\n{\"reason\": \"the content contains the word 'Hello'\", \"pass\": true, \"score\": 1.0}\n\n<Output>Avast ye swabs, repel the invaders!</Output>\n<Rubric>Does not speak like a pirate</Rubric>\n{\"reason\": \"'avast ye' is a common pirate term\", \"pass\": false, \"score\": 0.0}\n";
6
6
  export declare const LLM_RUBRIC_USER_PROMPT = "<Output>\n{{output}}\n</Output>\n<Rubric>\n{{rubric}}\n</Rubric>";
7
+ export declare const GEVAL_SYSTEM_PROMPT = "\nInstruction #{{hash_id}}.\n\nYou are an impartial Judge. Your role is to perform an independent audit according to provided criteria.\n";
7
8
  export declare const GEVAL_STEPS_PROMPT = "\nGiven an evaluation criteria which outlines how you should judge a piece of text, generate 3-4 concise evaluation steps applicable to any text based on the criteria below and designed to confirm the criteria.\n\n**EVALUATION CRITERIA**\n<Criteria>\n{{criteria}}\n</Criteria>\n\n**OUTPUT FORMAT**\nIMPORTANT:\n- Return output ONLY as a minified JSON object (no code fences).\n- The JSON object must contain a single key, \"steps\", whose value is a list of strings.\n- Each string must represent one evaluation step.\n- Do NOT include any explanations, commentary, extra text, or additional formatting.\n\nFormat:\n{\"steps\": <list_of_strings>}\n\nExample:\n{\"steps\":[\"<Evaluation Step 1>\",\"<Evaluation Step 2>\",\"<Evaluation Step 3>\",\"<Evaluation Step 4>\"]}\n\nHere are the 3-4 concise evaluation steps, formatted as required in a minified JSON:\nJSON:\n";
8
- export declare const GEVAL_EVALUATE_REPLY_PROMPT = "\nInstruction #{{hash_id}}.\n\nYou will be given one Reply below. Your task is to rate the Reply on one metric.\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n\n**Evaluation Criteria**\n<Criteria>\n{{criteria}}\n</Criteria>\n\n**Evaluation Steps**\n- {{steps}}\nGiven the evaluation steps, return a JSON with two keys: \n 1) a \"score\" key that MUST be an integer from 0 to {{maxScore}}, where {{maxScore}} indicates that the Evaluation Criteria is fully and clearly present in the Reply according to the Evaluation Steps, and 0 indicates the total absence of the Evaluation Criteria;\n 2) a \"reason\" key, a reason for the given score, but DO NOT QUOTE THE SCORE in your reason. Please mention specific information from Reply in your reason, but be very concise with it!\n\n**Reply**\n<Reply>\n{{output}}\n</Reply>\n\n**OUTPUT FORMAT**\nIMPORTANT: \n- Return output ONLY as a minified JSON object (no code fences).\n- The JSON object must contain exactly two keys: \"score\" and \"reason\".\n- No additional words, explanations, or formatting are needed.\n- Absolutely no additional text, explanations, line breaks, or formatting outside the JSON object are allowed.\n\nExample JSON:\n{\"score\":0,\"reason\":\"The text of Reply does not follow the evaluation criteria provided.\"}\n\nHere is the final evaluation in the required minified JSON format:\nJSON:\n";
9
- export declare const GEVAL_EVALUATE_PROMPT = "\nInstruction #{{hash_id}}.\n\nYou will be given one Reply for a Prompt below. Your task is to rate the Reply on one metric.\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n\n**Evaluation Criteria**\n<Criteria>\n{{criteria}}\n</Criteria>\n\n**Evaluation Steps**\n- {{steps}}\nGiven the evaluation steps, return a JSON with two keys: \n 1) a \"score\" key that MUST be an integer from 0 to {{maxScore}}, where {{maxScore}} indicates that the Evaluation Criteria is fully and clearly present in the Reply according to the Evaluation Steps, and 0 indicates the total absence of the Evaluation Criteria;\n 2) a \"reason\" key, a reason for the given score, but DO NOT QUOTE THE SCORE in your reason. Please mention specific information from Prompt and Reply in your reason, but be very concise with it!\n\n**Prompt**\n<Prompt>\n{{input}}\n</Prompt>\n\n**Reply**\n<Reply>\n{{output}}\n</Reply>\n\n**OUTPUT FORMAT**\nIMPORTANT: \n- Return output ONLY as a minified JSON object (no code fences).\n- The JSON object must contain exactly two keys: \"score\" and \"reason\".\n- No additional words, explanations, or formatting are needed.\n- Absolutely no additional text, explanations, line breaks, or formatting outside the JSON object are allowed.\n\nExample JSON:\n{\"score\":0,\"reason\":\"The text of Reply does not follow the evaluation criteria provided.\"}\n\nHere is the final evaluation in the required minified JSON format:\nJSON:\n";
9
+ export declare const GEVAL_EVALUATE_REPLY_PROMPT = "\nYou will be given one Reply below. Your task is to rate the Reply on one metric.\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n\n**Evaluation Criteria**\n<Criteria>\n{{criteria}}\n</Criteria>\n\n**Evaluation Steps**\n- {{steps}}\nGiven the evaluation steps, return a JSON with two keys: \n 1) a \"score\" key that MUST be an integer from 0 to {{maxScore}}, where {{maxScore}} indicates that the Evaluation Criteria is fully and clearly present in the Reply according to the Evaluation Steps, and 0 indicates the total absence of the Evaluation Criteria;\n 2) a \"reason\" key, a reason for the given score, but DO NOT QUOTE THE SCORE in your reason. Please mention specific information from Reply in your reason, but be very concise with it!\n\n**Reply**\n<Reply>\n{{output}}\n</Reply>\n\n**OUTPUT FORMAT**\nIMPORTANT: \n- Return output ONLY as a minified JSON object (no code fences).\n- The JSON object must contain exactly two keys: \"score\" and \"reason\".\n- No additional words, explanations, or formatting are needed.\n- Absolutely no additional text, explanations, line breaks, or formatting outside the JSON object are allowed.\n\nExample JSON:\n{\"score\":0,\"reason\":\"The text of Reply does not follow the evaluation criteria provided.\"}\n\nHere is the final evaluation in the required minified JSON format:\nJSON:\n";
10
+ export declare const GEVAL_EVALUATE_PROMPT = "\nYou will be given one Reply for a Prompt below. Your task is to rate the Reply on one metric.\nPlease make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.\n\n**Evaluation Criteria**\n<Criteria>\n{{criteria}}\n</Criteria>\n\n**Evaluation Steps**\n- {{steps}}\nGiven the evaluation steps, return a JSON with two keys: \n 1) a \"score\" key that MUST be an integer from 0 to {{maxScore}}, where {{maxScore}} indicates that the Evaluation Criteria is fully and clearly present in the Reply according to the Evaluation Steps, and 0 indicates the total absence of the Evaluation Criteria;\n 2) a \"reason\" key, a reason for the given score, but DO NOT QUOTE THE SCORE in your reason. Please mention specific information from Prompt and Reply in your reason, but be very concise with it!\n\n**Prompt**\n<Prompt>\n{{input}}\n</Prompt>\n\n**Reply**\n<Reply>\n{{output}}\n</Reply>\n\n**OUTPUT FORMAT**\nIMPORTANT: \n- Return output ONLY as a minified JSON object (no code fences).\n- The JSON object must contain exactly two keys: \"score\" and \"reason\".\n- No additional words, explanations, or formatting are needed.\n- Absolutely no additional text, explanations, line breaks, or formatting outside the JSON object are allowed.\n\nExample JSON:\n{\"score\":0,\"reason\":\"The text of Reply does not follow the evaluation criteria provided.\"}\n\nHere is the final evaluation in the required minified JSON format:\nJSON:\n";
package/dst/prompt.js CHANGED
@@ -4,7 +4,7 @@
4
4
  * Copyright (c) 2025 Promptfoo
5
5
  */
6
6
  Object.defineProperty(exports, "__esModule", { value: true });
7
- exports.GEVAL_EVALUATE_PROMPT = exports.GEVAL_EVALUATE_REPLY_PROMPT = exports.GEVAL_STEPS_PROMPT = exports.LLM_RUBRIC_USER_PROMPT = exports.LLM_RUBRIC_SYSTEM_PROMPT = void 0;
7
+ exports.GEVAL_EVALUATE_PROMPT = exports.GEVAL_EVALUATE_REPLY_PROMPT = exports.GEVAL_STEPS_PROMPT = exports.GEVAL_SYSTEM_PROMPT = exports.LLM_RUBRIC_USER_PROMPT = exports.LLM_RUBRIC_SYSTEM_PROMPT = void 0;
8
8
  exports.LLM_RUBRIC_SYSTEM_PROMPT = `
9
9
  Instruction #{{hash_id}}.
10
10
 
@@ -21,6 +21,11 @@ Examples:
21
21
  {"reason": "'avast ye' is a common pirate term", "pass": false, "score": 0.0}
22
22
  `;
23
23
  exports.LLM_RUBRIC_USER_PROMPT = '<Output>\n{{output}}\n</Output>\n<Rubric>\n{{rubric}}\n</Rubric>';
24
+ exports.GEVAL_SYSTEM_PROMPT = `
25
+ Instruction #{{hash_id}}.
26
+
27
+ You are an impartial Judge. Your role is to perform an independent audit according to provided criteria.
28
+ `;
24
29
  exports.GEVAL_STEPS_PROMPT = `
25
30
  Given an evaluation criteria which outlines how you should judge a piece of text, generate 3-4 concise evaluation steps applicable to any text based on the criteria below and designed to confirm the criteria.
26
31
 
@@ -46,8 +51,6 @@ Here are the 3-4 concise evaluation steps, formatted as required in a minified J
46
51
  JSON:
47
52
  `;
48
53
  exports.GEVAL_EVALUATE_REPLY_PROMPT = `
49
- Instruction #{{hash_id}}.
50
-
51
54
  You will be given one Reply below. Your task is to rate the Reply on one metric.
52
55
  Please make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.
53
56
 
@@ -81,8 +84,6 @@ Here is the final evaluation in the required minified JSON format:
81
84
  JSON:
82
85
  `;
83
86
  exports.GEVAL_EVALUATE_PROMPT = `
84
- Instruction #{{hash_id}}.
85
-
86
87
  You will be given one Reply for a Prompt below. Your task is to rate the Reply on one metric.
87
88
  Please make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.
88
89
 
package/dst/prompt.js.map CHANGED
@@ -1 +1 @@
1
- {"version":3,"file":"prompt.js","sourceRoot":"","sources":["../src/prompt.ts"],"names":[],"mappings":";AAAA;;;GAGG;;;AAKU,QAAA,wBAAwB,GAAG;;;;;;;;;;;;;;CAcvC,CAAC;AAKW,QAAA,sBAAsB,GAAG,kEAAkE,CAAC;AAK5F,QAAA,kBAAkB,GAAG;;;;;;;;;;;;;;;;;;;;;;;CAuBjC,CAAC;AAKW,QAAA,2BAA2B,GAAG;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAkC1C,CAAC;AAKW,QAAA,qBAAqB,GAAG;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAuCpC,CAAC"}
1
+ {"version":3,"file":"prompt.js","sourceRoot":"","sources":["../src/prompt.ts"],"names":[],"mappings":";AAAA;;;GAGG;;;AAKU,QAAA,wBAAwB,GAAG;;;;;;;;;;;;;;CAcvC,CAAC;AAKW,QAAA,sBAAsB,GAAG,kEAAkE,CAAC;AAE5F,QAAA,mBAAmB,GAAG;;;;CAIlC,CAAC;AAKW,QAAA,kBAAkB,GAAG;;;;;;;;;;;;;;;;;;;;;;;CAuBjC,CAAC;AAKW,QAAA,2BAA2B,GAAG;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAgC1C,CAAC;AAKW,QAAA,qBAAqB,GAAG;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAqCpC,CAAC"}
package/dst/types.d.ts CHANGED
@@ -1,5 +1,6 @@
1
- export type EvalMethod = 'bEval' | 'gEval' | 'llmRubric';
2
- export type GEvalInput = string | {
1
+ import { z } from 'zod';
2
+ export type TJudgeMethod = 'bEval' | 'gEval' | 'llmRubric';
3
+ export type TGevalInput = string | {
3
4
  query: string;
4
5
  answer: string;
5
6
  };
@@ -7,20 +8,32 @@ export interface IStepsCache {
7
8
  set(key: string, value: string[]): Promise<void>;
8
9
  get(key: string): Promise<string[] | undefined>;
9
10
  }
10
- export interface EvalOptions {
11
- temperature?: number;
12
- providerOptions?: Record<string, any>;
13
- }
14
- export interface EvaHooks {
11
+ export type TVercelOptions = Record<string, any>;
12
+ export interface IJudgeHooks {
15
13
  onSuccess?: (data: {
16
- method: EvalMethod;
14
+ method: TJudgeMethod;
17
15
  params: any;
18
16
  result: any;
19
17
  duration: number;
20
18
  }) => void;
21
19
  onError?: (data: {
22
- method: EvalMethod;
20
+ method: TJudgeMethod;
23
21
  error: any;
24
22
  duration: number;
25
23
  }) => void;
26
24
  }
25
+ export declare const RubricResultSchema: z.ZodObject<{
26
+ reason: z.ZodString;
27
+ pass: z.ZodBoolean;
28
+ score: z.ZodNumber;
29
+ }, z.core.$strip>;
30
+ export type TRubricResult = z.infer<typeof RubricResultSchema>;
31
+ export declare const GevalStepsResultSchema: z.ZodObject<{
32
+ steps: z.ZodArray<z.ZodString>;
33
+ }, z.core.$strip>;
34
+ export type TGevalStepsResult = z.infer<typeof GevalStepsResultSchema>;
35
+ export declare const GevalEvaluateResultSchema: z.ZodObject<{
36
+ reason: z.ZodString;
37
+ score: z.ZodNumber;
38
+ }, z.core.$strip>;
39
+ export type TGevalEvaluateResult = z.infer<typeof GevalEvaluateResultSchema>;
package/dst/types.js CHANGED
@@ -1,3 +1,17 @@
1
1
  "use strict";
2
2
  Object.defineProperty(exports, "__esModule", { value: true });
3
+ exports.GevalEvaluateResultSchema = exports.GevalStepsResultSchema = exports.RubricResultSchema = void 0;
4
+ const zod_1 = require("zod");
5
+ exports.RubricResultSchema = zod_1.z.object({
6
+ reason: zod_1.z.string().describe('Detailed explanation of the score based on the rubric'),
7
+ pass: zod_1.z.boolean().describe('Whether the output satisfies the minimum requirements'),
8
+ score: zod_1.z.number().min(0).max(1).describe('Numeric representation of quality'),
9
+ });
10
+ exports.GevalStepsResultSchema = zod_1.z.object({
11
+ steps: zod_1.z.array(zod_1.z.string()).describe('List of concise evaluation steps derived from the criteria'),
12
+ });
13
+ exports.GevalEvaluateResultSchema = zod_1.z.object({
14
+ reason: zod_1.z.string().describe('Detailed explanation of the score based on the rubric'),
15
+ score: zod_1.z.number().min(0).describe('Numeric representation of quality'),
16
+ });
3
17
  //# sourceMappingURL=types.js.map
package/dst/types.js.map CHANGED
@@ -1 +1 @@
1
- {"version":3,"file":"types.js","sourceRoot":"","sources":["../src/types.ts"],"names":[],"mappings":""}
1
+ {"version":3,"file":"types.js","sourceRoot":"","sources":["../src/types.ts"],"names":[],"mappings":";;;AAAA,6BAAwB;AA2DX,QAAA,kBAAkB,GAAG,OAAC,CAAC,MAAM,CAAC;IAEzC,MAAM,EAAE,OAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,uDAAuD,CAAC;IAEpF,IAAI,EAAE,OAAC,CAAC,OAAO,EAAE,CAAC,QAAQ,CAAC,uDAAuD,CAAC;IAEnF,KAAK,EAAE,OAAC,CAAC,MAAM,EAAE,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,QAAQ,CAAC,mCAAmC,CAAC;CAC9E,CAAC,CAAC;AAUU,QAAA,sBAAsB,GAAG,OAAC,CAAC,MAAM,CAAC;IAE7C,KAAK,EAAE,OAAC,CAAC,KAAK,CAAC,OAAC,CAAC,MAAM,EAAE,CAAC,CAAC,QAAQ,CAAC,4DAA4D,CAAC;CAClG,CAAC,CAAC;AAUU,QAAA,yBAAyB,GAAG,OAAC,CAAC,MAAM,CAAC;IAEhD,MAAM,EAAE,OAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,uDAAuD,CAAC;IAEpF,KAAK,EAAE,OAAC,CAAC,MAAM,EAAE,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,QAAQ,CAAC,mCAAmC,CAAC;CACvE,CAAC,CAAC"}
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@eva-llm/eva-judge",
3
- "version": "1.0.3",
3
+ "version": "1.0.5",
4
4
  "description": "LLM-as-a-Judge abstraction layer using ai-sdk and plugins",
5
5
  "main": "dst/index.js",
6
6
  "types": "dst/index.d.ts",
@@ -48,6 +48,7 @@
48
48
  "scripts": {
49
49
  "build": "tsc",
50
50
  "example": "ts-node scripts/example.ts",
51
- "test": "jest"
51
+ "test": "jest",
52
+ "test:coverage": "jest --coverage"
52
53
  }
53
54
  }