npm - @eva-llm/eva-judge - Versions diffs - 1.0.4 → 1.0.5 - Mend

@eva-llm/eva-judge 1.0.4 → 1.0.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/README.md CHANGED

@@ -2,11 +2,15 @@
 A TypeScript/Node.js library for automated text evaluation with AI analysis through **LLM-Rubric**, **G-Eval**, or **B-Eval** (Binary G-Eval).
+---
 ## Project Inspiration & Attribution
 This project is inspired by [promptfoo](https://github.com/promptfoo/promptfoo), including [author's work](https://github.com/promptfoo/promptfoo/issues?q=state%3Amerged%20is%3Apr%20author%3A%40schipiga) on the [G-Eval](https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded/g-eval/) framework there.<br />
 The LLM-as-a-Judge prompts are copied from promptfoo and adapted for project-specific issues.
+---
 ## Quick Start
 ```bash
@@ -35,9 +39,16 @@ await bEval({ query, answer }, 'answer is coherent to question', 'openai', 'gpt-
 // { score: 1, reason: 'The answer is definitely coherent to the question' }
 ```
-**NOTE!** For better judging the factual standard is temperature=0
+**NOTE!** For robust judging the factual standard is `temperature=0`
+---
 ## API
+Judge `options` forward any Vercel AI SDK [generateText options](https://ai-sdk.dev/docs/reference/ai-sdk-core/generate-text#api-signature).
+> NOTE! Internal values such as `model`, `system`, and `prompt` are managed by the Judge and will override corresponding values in the `options` object to ensure evaluation integrity.
 ### llmRubric
 Evaluates an output against a rubric using an LLM. Returns a reason, pass/fail, and normalized score.
@@ -48,7 +59,7 @@ const result = await llmRubric(
   rubric,      // string: the rubric to use
   provider,    // string: LLM provider name
   model,       // string: LLM model name
-  options      // optional: { temperature, providerOptions }
+  options      // optional: { Vercel ai-sdk options }
 );
 // result: { reason: string, pass: boolean, score: number }
 ```
@@ -63,7 +74,7 @@ const result = await gEval(
   criteria,    // string: evaluation criteria
   provider,    // string: LLM provider name
   model,       // string: LLM model name
-  options      // optional: { temperature, providerOptions }
+  options      // optional: { Vercel ai-sdk options }
 );
 // result: { reason: string, score: number }
 ```
@@ -78,22 +89,22 @@ const result = await bEval(
   criteria,    // string: evaluation criteria
   provider,    // string: LLM provider name
   model,       // string: LLM model name
-  options      // optional: { temperature, providerOptions }
+  options      // optional: { Vercel ai-sdk options }
 );
 // result: { reason: string, score: number } // score will be 0 or 1
 ```
 ---
-### G-Eval vs B-Eval
+## G-Eval vs B-Eval
 The divergence between **G-Eval** and **B-Eval** reveals a critical **'Judgement Gap'**:
-* **G-Eval (The Auditor):** Scoring on a `0.0-1.0` scale allows the model to stay in a 'comfort zone', smoothing over internal contradictions.
-* **B-Eval (The Judge):** A binary `0|1` choice forces **Adjudication**. This 'forced choice' triggers the **Alignment Paradox**, exposing the struggle between **RLHF training** and objective facts.
+- **G-Eval (The Auditor):** Scoring on a `0.0-1.0` scale allows the model to stay in a 'comfort zone', smoothing over internal contradictions.
+- **B-Eval (The Judge):** A binary `0|1` choice forces **Adjudication**. This 'forced choice' triggers the **Alignment Paradox**, exposing the struggle between **RLHF training** and objective facts.
-**Conclusion:** **B-Eval** is a superior stress-test for **Epistemic Honesty**. By stripping away the safety net of grey-zone scoring, it reveals exactly where logic breaks under the weight of normative priors.
+**B-Eval** is a superior stress-test for **Epistemic Honesty**. By stripping away the safety net of grey-zone scoring, it reveals exactly where logic breaks under the weight of normative priors.
-More details in EVA-LLM [Dark Teaming Manifesto](https://eva-llm.github.io/dark-teaming).
+More details in [Dark Teaming Manifesto](https://eva-llm.github.io/dark-teaming).
 ---
@@ -116,6 +127,8 @@ Specify the provider name and model name in `llmRubric`, `gEval`, or `bEval`.
 > **Note:** Each provider integration is based on its respective ai-sdk package. Be sure to follow the provider's documentation for setup and authentication. Most providers require you to export an API key or token as an environment variable (e.g., `export OPENAI_API_KEY=...`).
+---
 ## Enterprise
 ### LLM Judge Hooks
@@ -147,7 +160,7 @@ Config.enableStepsCache();
 Config.disableStepsCache();
 ```
-### G-Eval/B-Eval Evaluation Steps Persistent Storage
+### G-Eval / B-Eval Evaluation Steps Persistent Storage
 For advanced use, you can implement your own cache storage for evaluation steps (e.g., using Redis or another backend) by providing a custom cache via `setStepsCache()`:
@@ -160,7 +173,3 @@ class RedisCache implements IStepsCache {
 Config.setStepsCache(RedisCache);
 ```
-## License
-MIT
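
The README hunk above mentions supplying a custom evaluation-steps cache through `setStepsCache()`. As a rough illustration, here is a minimal sketch of such a cache, assuming only the `IStepsCache` shape that appears in the `types.d.ts` diff (`set` and `get`, both returning promises). The interface is redeclared locally so the block stands alone; the `MapStepsCache` name and the in-memory `Map` backend are hypothetical stand-ins for Redis or another store.

```typescript
// Shape copied from the types.d.ts diff; redeclared here so the sketch
// compiles without installing the package.
interface IStepsCache {
  set(key: string, value: string[]): Promise<void>;
  get(key: string): Promise<string[] | undefined>;
}

// Hypothetical in-memory backend; a real deployment might wrap Redis instead.
class MapStepsCache implements IStepsCache {
  private readonly store = new Map<string, string[]>();

  async set(key: string, value: string[]): Promise<void> {
    // Copy the array so later mutation by the caller does not alter the cache.
    this.store.set(key, [...value]);
  }

  async get(key: string): Promise<string[] | undefined> {
    const steps = this.store.get(key);
    // Return a copy on a hit, undefined on a miss.
    return steps ? [...steps] : undefined;
  }
}
```

The README snippet passes `RedisCache` to `Config.setStepsCache()`; whether an instance or the class itself is expected is not visible in this diff, so the package typings should be checked before wiring a cache like this in.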

package/dst/config.d.ts CHANGED

@@ -1,6 +1,6 @@
 import { LRUCache } from 'lru-cache';
 import { type LanguageModel } from 'ai';
-import { type EvaHooks, type IStepsCache } from './types';
+import { type IJudgeHooks, type IStepsCache } from './types';
 declare const _default: {
     gevalMaxScore: number;
     isModelCached: boolean;
@@ -14,7 +14,7 @@ declare const _default: {
     disableModelCache(): void;
     enableStepsCache(): void;
     disableStepsCache(): void;
-    hooks: EvaHooks;
-    setHooks(hooks: EvaHooks): void;
+    hooks: IJudgeHooks;
+    setHooks(hooks: IJudgeHooks): void;
 };
 export default _default;

package/dst/config.js.map CHANGED

	@@ -1 +1 @@
1	- {"version":3,"file":"config.js","sourceRoot":"","sources":["../src/config.ts"],"names":[],"mappings":";;AAAA,yCAAqC;~~AASrC~~,MAAM,kBAAkB;IACd,KAAK,CAA6B;IAM1C,YAAY,IAAY;QACtB,IAAI,CAAC,KAAK,GAAG,IAAI,oBAAQ,CAAC,EAAE,GAAG,EAAE,IAAI,EAAE,CAAC,CAAC;IAC3C,CAAC;IAMD,KAAK,CAAC,GAAG,CAAC,GAAW,EAAE,KAAe;QACpC,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,GAAG,EAAE,KAAK,CAAC,CAAC;IAC7B,CAAC;IAMD,KAAK,CAAC,GAAG,CAAC,GAAW;QACnB,OAAO,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,GAAG,CAAC,CAAC;IAC7B,CAAC;CACF;AAOD,kBAAe;IAIb,aAAa,EAAE,EAAE;IAIjB,aAAa,EAAE,IAAI;IAInB,aAAa,EAAE,IAAI;IAInB,UAAU,EAAE,IAAI,oBAAQ,CAAwB,EAAE,GAAG,EAAE,GAAG,EAAE,CAAC;IAI7D,UAAU,EAAE,IAAI,kBAAkB,CAAC,GAAG,CAAgB;IAKtD,iBAAiB,CAAC,OAAe,GAAG;QAClC,IAAI,CAAC,UAAU,GAAG,IAAI,oBAAQ,CAAwB,EAAE,GAAG,EAAE,IAAI,EAAE,CAAC,CAAC;IACvE,CAAC;IAKD,iBAAiB,CAAC,OAAe,GAAG;QAClC,IAAI,CAAC,UAAU,GAAG,IAAI,kBAAkB,CAAC,IAAI,CAAgB,CAAC;IAChE,CAAC;IAKD,aAAa,CAAC,KAAkB;QAC9B,IAAI,CAAC,UAAU,GAAG,KAAK,CAAC;IAC1B,CAAC;IAID,gBAAgB;QACd,IAAI,CAAC,aAAa,GAAG,IAAI,CAAC;IAC5B,CAAC;IAID,iBAAiB;QACf,IAAI,CAAC,aAAa,GAAG,KAAK,CAAC;IAC7B,CAAC;IAID,gBAAgB;QACd,IAAI,CAAC,aAAa,GAAG,IAAI,CAAC;IAC5B,CAAC;IAID,iBAAiB;QACf,IAAI,CAAC,aAAa,GAAG,KAAK,CAAC;IAC7B,CAAC;IAID,KAAK,EAAE,~~EAAc~~;~~IAKrB~~,QAAQ,CAAC,~~KAAe~~;~~QACtB~~,IAAI,CAAC,KAAK,GAAG,KAAK,CAAC;IACrB,CAAC;CACF,CAAC"}
1	+ {"version":3,"file":"config.js","sourceRoot":"","sources":["../src/config.ts"],"names":[],"mappings":";;AAAA,yCAAqC;AAYrC,MAAM,kBAAkB;IACd,KAAK,CAA6B;IAM1C,YAAY,IAAY;QACtB,IAAI,CAAC,KAAK,GAAG,IAAI,oBAAQ,CAAC,EAAE,GAAG,EAAE,IAAI,EAAE,CAAC,CAAC;IAC3C,CAAC;IAMD,KAAK,CAAC,GAAG,CAAC,GAAW,EAAE,KAAe;QACpC,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,GAAG,EAAE,KAAK,CAAC,CAAC;IAC7B,CAAC;IAMD,KAAK,CAAC,GAAG,CAAC,GAAW;QACnB,OAAO,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,GAAG,CAAC,CAAC;IAC7B,CAAC;CACF;AAOD,kBAAe;IAIb,aAAa,EAAE,EAAE;IAIjB,aAAa,EAAE,IAAI;IAInB,aAAa,EAAE,IAAI;IAInB,UAAU,EAAE,IAAI,oBAAQ,CAAwB,EAAE,GAAG,EAAE,GAAG,EAAE,CAAC;IAI7D,UAAU,EAAE,IAAI,kBAAkB,CAAC,GAAG,CAAgB;IAKtD,iBAAiB,CAAC,OAAe,GAAG;QAClC,IAAI,CAAC,UAAU,GAAG,IAAI,oBAAQ,CAAwB,EAAE,GAAG,EAAE,IAAI,EAAE,CAAC,CAAC;IACvE,CAAC;IAKD,iBAAiB,CAAC,OAAe,GAAG;QAClC,IAAI,CAAC,UAAU,GAAG,IAAI,kBAAkB,CAAC,IAAI,CAAgB,CAAC;IAChE,CAAC;IAKD,aAAa,CAAC,KAAkB;QAC9B,IAAI,CAAC,UAAU,GAAG,KAAK,CAAC;IAC1B,CAAC;IAID,gBAAgB;QACd,IAAI,CAAC,aAAa,GAAG,IAAI,CAAC;IAC5B,CAAC;IAID,iBAAiB;QACf,IAAI,CAAC,aAAa,GAAG,KAAK,CAAC;IAC7B,CAAC;IAID,gBAAgB;QACd,IAAI,CAAC,aAAa,GAAG,IAAI,CAAC;IAC5B,CAAC;IAID,iBAAiB;QACf,IAAI,CAAC,aAAa,GAAG,KAAK,CAAC;IAC7B,CAAC;IAID,KAAK,EAAE,EAAiB;IAKxB,QAAQ,CAAC,KAAkB;QACzB,IAAI,CAAC,KAAK,GAAG,KAAK,CAAC;IACrB,CAAC;CACF,CAAC"}

package/dst/index.d.ts CHANGED

@@ -1,23 +1,7 @@
-import z from 'zod';
-import { type EvalOptions, type GEvalInput } from './types';
+import { type TVercelOptions, type TGevalInput, type TRubricResult, type TGevalEvaluateResult } from './types';
 export * from './config';
 export { default } from './config';
 export * from './types';
-export declare const RubricResultSchema: z.ZodObject<{
-    reason: z.ZodString;
-    pass: z.ZodBoolean;
-    score: z.ZodNumber;
-}, z.core.$strip>;
-export type RubricResult = z.infer<typeof RubricResultSchema>;
-export declare const GevalStepsResultSchema: z.ZodObject<{
-    steps: z.ZodArray<z.ZodString>;
-}, z.core.$strip>;
-export type GevalStepsResult = z.infer<typeof GevalStepsResultSchema>;
-export declare const GevalEvaluateResultSchema: z.ZodObject<{
-    reason: z.ZodString;
-    score: z.ZodNumber;
-}, z.core.$strip>;
-export type GevalEvaluateResult = z.infer<typeof GevalEvaluateResultSchema>;
-export declare const llmRubric: (output: string, rubric: string, providerName: string, modelName: string, options?: EvalOptions) => Promise<RubricResult>;
-export declare const gEval: (input: GEvalInput, criteria: string, providerName: string, modelName: string, options?: EvalOptions) => Promise<GevalEvaluateResult>;
-export declare const bEval: (input: GEvalInput, criteria: string, providerName: string, modelName: string, options?: EvalOptions) => Promise<GevalEvaluateResult>;
+export declare const llmRubric: (output: string, rubric: string, providerName: string, modelName: string, options?: TVercelOptions) => Promise<TRubricResult>;
+export declare const gEval: (input: TGevalInput, criteria: string, providerName: string, modelName: string, options?: TVercelOptions) => Promise<TGevalEvaluateResult>;
+export declare const bEval: (input: TGevalInput, criteria: string, providerName: string, modelName: string, options?: TVercelOptions) => Promise<TGevalEvaluateResult>;

package/dst/index.js CHANGED

@@ -39,43 +39,31 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
     return (mod && mod.__esModule) ? mod : { "default": mod };
 };
 Object.defineProperty(exports, "__esModule", { value: true });
-exports.bEval = exports.gEval = exports.llmRubric = exports.GevalEvaluateResultSchema = exports.GevalStepsResultSchema = exports.RubricResultSchema = exports.default = void 0;
+exports.bEval = exports.gEval = exports.llmRubric = exports.default = void 0;
 const node_crypto_1 = __importDefault(require("node:crypto"));
-const ai_1 = require("ai");
 const Mustache = __importStar(require("mustache"));
-const zod_1 = __importDefault(require("zod"));
+const ai_1 = require("ai");
+const config_1 = __importDefault(require("./config"));
 const prompt_1 = require("./prompt");
 const registry_1 = require("./registry");
-const config_1 = __importDefault(require("./config"));
+const types_1 = require("./types");
 __exportStar(require("./config"), exports);
 var config_2 = require("./config");
 Object.defineProperty(exports, "default", { enumerable: true, get: function () { return __importDefault(config_2).default; } });
 __exportStar(require("./types"), exports);
-exports.RubricResultSchema = zod_1.default.object({
-    reason: zod_1.default.string().describe('Detailed explanation of the score based on the rubric'),
-    pass: zod_1.default.boolean().describe('Whether the output satisfies the minimum requirements'),
-    score: zod_1.default.number().min(0).max(1).describe('Numeric representation of quality'),
-});
-exports.GevalStepsResultSchema = zod_1.default.object({
-    steps: zod_1.default.array(zod_1.default.string()).describe('List of concise evaluation steps derived from the criteria'),
-});
-exports.GevalEvaluateResultSchema = zod_1.default.object({
-    reason: zod_1.default.string().describe('Detailed explanation of the score based on the rubric'),
-    score: zod_1.default.number().min(0).describe('Numeric representation of quality'),
-});
 const getHashId = () => node_crypto_1.default.randomBytes(16).toString('hex');
 const llmRubric = async (output, rubric, providerName, modelName, options = {}) => {
     const start = Date.now();
     try {
         const userPrompt = Mustache.render(prompt_1.LLM_RUBRIC_USER_PROMPT, { output, rubric });
         const { output: result } = await (0, ai_1.generateText)({
+            ...options,
             model: (0, registry_1.getModel)(providerName, modelName),
             system: Mustache.render(prompt_1.LLM_RUBRIC_SYSTEM_PROMPT, { hash_id: getHashId() }),
             prompt: userPrompt,
             output: ai_1.Output.object({
-                schema: exports.RubricResultSchema,
+                schema: types_1.RubricResultSchema,
             }),
-            ...options,
         });
         config_1.default.hooks.onSuccess?.({
             method: 'llmRubric',
@@ -107,12 +95,13 @@ const _gEval = async (input, criteria, providerName, modelName, maxScore, method
         if (!steps) {
             const stepsPrompt = Mustache.render(prompt_1.GEVAL_STEPS_PROMPT, { criteria });
             const { output: stepsResult } = await (0, ai_1.generateText)({
+                ...options,
+                system: undefined,
                 model,
                 prompt: stepsPrompt,
                 output: ai_1.Output.object({
-                    schema: exports.GevalStepsResultSchema,
+                    schema: types_1.GevalStepsResultSchema,
                 }),
-                ...options,
             });
             steps = stepsResult.steps;
             (0, registry_1.setSteps)(criteria, stepsResult.steps);
@@ -125,11 +114,12 @@ const _gEval = async (input, criteria, providerName, modelName, maxScore, method
             maxScore,
         });
         const { output: evalResult } = await (0, ai_1.generateText)({
+            ...options,
             model,
             system: Mustache.render(prompt_1.GEVAL_SYSTEM_PROMPT, { hash_id: getHashId() }),
             prompt: evaluationPrompt,
             output: ai_1.Output.object({
-                schema: exports.GevalEvaluateResultSchema,
+                schema: types_1.GevalEvaluateResultSchema,
             }),
             ...options,
         });
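
The hunks above move `...options` from the end of the `generateText` argument object to the start. In a JavaScript object literal, a later property overwrites an earlier one with the same key, so the position of the spread decides which side wins. The sketch below shows that rule with plain objects only; `managed`, `spreadFirst`, `spreadLast`, `spreadBoth`, and the sample values are hypothetical and are not the package's code.

```typescript
type Options = Record<string, unknown>;

// Values the judge wants to control (illustrative names and values only).
const managed = { model: "judge-model", system: "judge-system", prompt: "judge-prompt" };

// Spread first: the managed keys are written last, so they override the caller.
const spreadFirst = (options: Options): Options => ({ ...options, ...managed });

// Spread last: the caller's keys are written last, so they override the judge.
const spreadLast = (options: Options): Options => ({ ...managed, ...options });

// Spread on both sides: the trailing spread is written last and still wins.
const spreadBoth = (options: Options): Options => ({ ...options, ...managed, ...options });

const callerOptions: Options = { temperature: 0, system: "caller-system" };

console.log(spreadFirst(callerOptions).system); // "judge-system"
console.log(spreadLast(callerOptions).system);  // "caller-system"
console.log(spreadBoth(callerOptions).system);  // "caller-system"
```

Read against this rule, the `llmRubric` call and the steps-generation call now let the judge's `model`, `system`, and `prompt` take precedence, and the steps call additionally sets `system: undefined` after the spread, which clears any caller-supplied system prompt. In the final hunk the evaluation call gains a leading `...options` while the trailing `...options` remains as an unchanged context line, so caller-supplied values would apparently still win there.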

package/dst/index.js.map CHANGED

	@@ -1 +1 @@
1	- {"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":";;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;AAAA,8DAAiC;AACjC,~~2BAA0C;AAC1C,~~mDAAqC;AACrC,~~8CAAoB~~;~~AAEpB~~,qCAOkB;AAClB,~~yCAA0D~~;~~AAC1D~~,~~sDAA4B~~;~~AAO5B~~,2CAAyB;AACzB,mCAAmC;AAA1B,kHAAA,OAAO,OAAA;AAChB,0CAAwB;~~AAMX~~,~~QAAA,kBAAkB,GAAG,aAAC,CAAC,~~MAAM,CAAC;IAEzC,MAAM,EAAE,aAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,uDAAuD,CAAC;IAEpF,IAAI,EAAE,aAAC,CAAC,OAAO,EAAE,CAAC,QAAQ,CAAC,uDAAuD,CAAC;IAEnF,KAAK,EAAE,aAAC,CAAC,MAAM,EAAE,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,QAAQ,CAAC,mCAAmC,CAAC;CAC9E,CAAC,CAAC;AAUU,QAAA,sBAAsB,GAAG,aAAC,CAAC,MAAM,CAAC;IAE7C,KAAK,EAAE,aAAC,CAAC,KAAK,CAAC,aAAC,CAAC,MAAM,EAAE,CAAC,CAAC,QAAQ,CAAC,4DAA4D,CAAC;CAClG,CAAC,CAAC;AAWU,QAAA,yBAAyB,GAAG,aAAC,CAAC,MAAM,CAAC;IAEhD,MAAM,EAAE,aAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,uDAAuD,CAAC;IAEpF,KAAK,EAAE,aAAC,CAAC,MAAM,EAAE,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,QAAQ,CAAC,mCAAmC,CAAC;CACvE,CAAC,CAAC;AAMH,MAAM,SAAS,GAAG,GAAG,EAAE,CAAC,qBAAM,CAAC,WAAW,CAAC,EAAE,CAAC,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAC;AAYxD,MAAM,SAAS,GAAG,KAAK,EAC5B,MAAc,EACd,MAAc,EACd,YAAoB,EACpB,SAAiB,EACjB,~~UAAuB~~,EAAE,~~EACF~~,EAAE;~~IACzB~~,MAAM,KAAK,GAAG,IAAI,CAAC,GAAG,EAAE,CAAC;IACzB,IAAI,CAAC;QACH,MAAM,UAAU,GAAG,QAAQ,CAAC,MAAM,CAAC,+BAAsB,EAAE,EAAE,MAAM,EAAE,MAAM,EAAE,CAAC,CAAC;QAE/E,MAAM,EAAE,MAAM,EAAE,MAAM,EAAE,GAAG,MAAM,IAAA,iBAAY,EAAC;~~YAC5C~~,KAAK,EAAE,IAAA,mBAAQ,EAAC,YAAY,EAAE,SAAS,CAAC;YACxC,MAAM,EAAE,QAAQ,CAAC,MAAM,CAAC,iCAAwB,EAAE,EAAE,OAAO,EAAE,SAAS,EAAE,EAAE,CAAC;YAC3E,MAAM,EAAE,UAAU;YAClB,MAAM,EAAE,WAAM,CAAC,MAAM,CAAC;gBACpB,MAAM,EAAE,0BAAkB;aAC3B,CAAC;~~YACF~~,~~GAAG,OAAO;SACX,~~CAAC,CAAC;~~QAEH~~,gBAAI,CAAC,KAAK,CAAC,SAAS,EAAE,CAAC;YACrB,MAAM,EAAE,WAAW;YACnB,MAAM,EAAE,EAAE,MAAM,EAAE,MAAM,EAAE,YAAY,EAAE,SAAS,EAAE,OAAO,EAAE;YAC5D,MAAM;YACN,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,OAAO,MAAM,CAAC;IAChB,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QAEf,gBAAI,CAAC,KAAK,CAAC,OAAO,EAAE,CAAC;YACnB,MAAM,EAAE,WAAW;YAC
nB,KAAK;YACL,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,MAAM,KAAK,CAAC;IACd,CAAC;AACH,CAAC,CAAA;AAvCY,QAAA,SAAS,aAuCrB;AAED,MAAM,MAAM,GAAG,KAAK,EAClB,~~KAAiB~~,~~EACjB~~,QAAgB,EAChB,YAAoB,EACpB,SAAiB,EACjB,QAAgB,EAChB,~~UAAsB~~,~~EACtB~~,~~UAAuB~~,EAAE,~~EACK~~,EAAE;~~IAChC~~,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;QAC9B,KAAK,GAAG,EAAE,KAAK,EAAE,EAAE,EAAE,MAAM,EAAE,KAAK,EAAE,CAAC;IACvC,CAAC;IACD,MAAM,EAAE,KAAK,EAAE,MAAM,EAAE,GAAG,KAAK,CAAC;IAEhC,MAAM,KAAK,GAAG,IAAI,CAAC,GAAG,EAAE,CAAC;IAEzB,IAAI,CAAC;QACH,MAAM,KAAK,GAAG,IAAA,mBAAQ,EAAC,YAAY,EAAE,SAAS,CAAC,CAAC;QAChD,IAAI,KAAK,GAAG,MAAM,IAAA,mBAAQ,EAAC,QAAQ,CAAC,CAAC;QAErC,IAAI,CAAC,KAAK,EAAE,CAAC;YACX,MAAM,WAAW,GAAG,QAAQ,CAAC,MAAM,CAAC,2BAAkB,EAAE,EAAE,QAAQ,EAAE,CAAC,CAAC;YAEtE,MAAM,EAAE,MAAM,EAAE,WAAW,EAAE,GAAG,MAAM,IAAA,iBAAY,EAAC;gBACjD,KAAK;gBACL,MAAM,EAAE,WAAW;gBACnB,MAAM,EAAE,WAAM,CAAC,MAAM,CAAC;oBACpB,MAAM,EAAE,8BAAsB;iBAC/B,CAAC;~~gBACF~~,~~GAAG,OAAO;aACX,~~CAAC,CAAC;YAEH,KAAK,GAAG,WAAW,CAAC,KAAK,CAAC;YAE1B,IAAA,mBAAQ,EAAC,QAAQ,EAAE,WAAW,CAAC,KAAK,CAAC,CAAC;QACxC,CAAC;QAED,MAAM,gBAAgB,GAAG,QAAQ,CAAC,MAAM,CACtC,KAAK,CAAC,CAAC,CAAC,8BAAqB,CAAC,CAAC,CAAC,oCAA2B,EAC3D;YACE,QAAQ;YACR,KAAK,EAAE,KAAK,CAAC,IAAI,CAAC,MAAM,CAAC;YACzB,KAAK,EAAE,KAAK;YACZ,MAAM,EAAE,MAAM;YACd,QAAQ;SACT,CAAC,CAAC;QAEL,MAAM,EAAE,MAAM,EAAE,UAAU,EAAE,GAAG,MAAM,IAAA,iBAAY,EAAC;YAChD,KAAK;YACL,MAAM,EAAE,QAAQ,CAAC,MAAM,CAAC,4BAAmB,EAAE,EAAE,OAAO,EAAE,SAAS,EAAE,EAAE,CAAC;YACtE,MAAM,EAAE,gBAAgB;YACxB,MAAM,EAAE,WAAM,CAAC,MAAM,CAAC;gBACpB,MAAM,EAAE,iCAAyB;aAClC,CAAC;YACF,GAAG,OAAO;SACX,CAAC,CAAC;QAEH,MAAM,MAAM,GAAG;YACb,MAAM,EAAE,UAAU,CAAC,MAAM;YACzB,KAAK,EAAE,UAAU,CAAC,KAAK,GAAG,QAAQ;SACnC,CAAC;QAEF,gBAAI,CAAC,KAAK,CAAC,SAAS,EAAE,CAAC;YACrB,MAAM,EAAE,UAAU;YAClB,MAAM,EAAE,EAAE,KAAK,EAAE,MAAM,EAAE,QAAQ,EAAE,YAAY,EAAE,SAAS,EAAE,OAAO,EAAE;YACrE,MAAM;YACN,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,OAAO,MAAM,CAAC;IAChB,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QAEf,gBAAI,CAAC,KAAK,CAAC,OAAO,EAAE,CAAC;YACnB,MAAM,EAAE,UAAU;YAClB,KAAK;YA
CL,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,MAAM,KAAK,CAAC;IACd,CAAC;AACH,CAAC,CAAA;AAYM,MAAM,KAAK,GAAG,KAAK,EACxB,~~KAAiB~~,~~EACjB~~,QAAgB,EAChB,YAAoB,EACpB,SAAiB,EACjB,~~UAAuB~~,EAAE,~~EACK~~,EAAE,CAAC,MAAM,~~CACvC~~,KAAK,EACL,QAAQ,EACR,YAAY,EACZ,SAAS,EACT,gBAAI,CAAC,aAAa,EAClB,OAAO,EACP,OAAO,CACR,CAAC;AAdW,QAAA,KAAK,SAchB;AAYK,MAAM,KAAK,GAAG,KAAK,EACxB,~~KAAiB~~,~~EACjB~~,QAAgB,EAChB,YAAoB,EACpB,SAAiB,EACjB,~~UAAuB~~,EAAE,~~EACK~~,EAAE,CAAC,MAAM,~~CACvC~~,KAAK,EACL,QAAQ,EACR,YAAY,EACZ,SAAS,EACT,CAAC,EACD,OAAO,EACP,OAAO,CACR,CAAC;AAdW,QAAA,KAAK,SAchB"}
1	+ {"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":";;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;AAAA,8DAAiC;AACjC,mDAAqC;AACrC,2BAGY;AAEZ,sDAA4B;AAC5B,qCAOkB;AAClB,yCAIoB;AACpB,mCASiB;AAEjB,2CAAyB;AACzB,mCAAmC;AAA1B,kHAAA,OAAO,OAAA;AAChB,0CAAwB;AAExB,MAAM,SAAS,GAAG,GAAG,EAAE,CAAC,qBAAM,CAAC,WAAW,CAAC,EAAE,CAAC,CAAC,QAAQ,CAAC,KAAK,CAAC,CAAC;AAYxD,MAAM,SAAS,GAAG,KAAK,EAC5B,MAAc,EACd,MAAc,EACd,YAAoB,EACpB,SAAiB,EACjB,UAA0B,EAAE,EACJ,EAAE;IAC1B,MAAM,KAAK,GAAG,IAAI,CAAC,GAAG,EAAE,CAAC;IACzB,IAAI,CAAC;QACH,MAAM,UAAU,GAAG,QAAQ,CAAC,MAAM,CAAC,+BAAsB,EAAE,EAAE,MAAM,EAAE,MAAM,EAAE,CAAC,CAAC;QAE/E,MAAM,EAAE,MAAM,EAAE,MAAM,EAAE,GAAG,MAAM,IAAA,iBAAY,EAAC;YAC7C,GAAG,OAAO;YACT,KAAK,EAAE,IAAA,mBAAQ,EAAC,YAAY,EAAE,SAAS,CAAC;YACxC,MAAM,EAAE,QAAQ,CAAC,MAAM,CAAC,iCAAwB,EAAE,EAAE,OAAO,EAAE,SAAS,EAAE,EAAE,CAAC;YAC3E,MAAM,EAAE,UAAU;YAClB,MAAM,EAAE,WAAM,CAAC,MAAM,CAAC;gBACpB,MAAM,EAAE,0BAAkB;aAC3B,CAAC;SACJ,CAAC,CAAC;QAEF,gBAAI,CAAC,KAAK,CAAC,SAAS,EAAE,CAAC;YACrB,MAAM,EAAE,WAAW;YACnB,MAAM,EAAE,EAAE,MAAM,EAAE,MAAM,EAAE,YAAY,EAAE,SAAS,EAAE,OAAO,EAAE;YAC5D,MAAM;YACN,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,OAAO,MAAM,CAAC;IAChB,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QAEf,gBAAI,CAAC,KAAK,CAAC,OAAO,EAAE,CAAC;YACnB,MAAM,EAAE,WAAW;YACnB,KAAK;YACL,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,MAAM,KAAK,CAAC;IACd,CAAC;AACH,CAAC,CAAA;AAvCY,QAAA,SAAS,aAuCrB;AAED,MAAM,MAAM,GAAG,KAAK,EAClB,KAAkB,EAClB,QAAgB,EAChB,YAAoB,EACpB,SAAiB,EACjB,QAAgB,EAChB,UAAwB,EACxB,UAA0B,EAAE,EACG,EAAE;IACjC,IAAI,OAAO,KAAK,KAAK,QAAQ,EAAE,CAAC;QAC9B,KAAK,GAAG,EAAE,KAAK,EAAE,EAAE,EAAE,MAAM,EAAE,KAAK,EAAE,CAAC;IACvC,CAAC;IACD,MAAM,EAAE,KAAK,EAAE,MAAM,EAAE,GAAG,KAAK,CAAC;IAEhC,MAAM,KAAK,GAAG,IAAI,CAAC,GAAG,EAAE,CAAC;IAEzB,IAAI,CAAC;QACH,MAAM,KAAK,GAAG,IAAA,mBAAQ,EAAC,YAAY,EAAE,SAAS,CAAC,CAAC;QAChD,IAAI,KAAK,GAAG,MAAM,IAAA,mBAAQ,EAAC,QAAQ,CAAC,CAAC;QAErC,IAAI,CAAC,KAAK,EAAE,CAAC;YACX,MAAM,WAAW,GAAG,QAAQ,CAAC,MAAM,CAAC,2BAAkB,EAAE,EAAE,QAAQ,EAAE,CAAC,CAAC;
YAEtE,MAAM,EAAE,MAAM,EAAE,WAAW,EAAE,GAAG,MAAM,IAAA,iBAAY,EAAC;gBACjD,GAAG,OAAO;gBACV,MAAM,EAAE,SAAS;gBACjB,KAAK;gBACL,MAAM,EAAE,WAAW;gBACnB,MAAM,EAAE,WAAM,CAAC,MAAM,CAAC;oBACpB,MAAM,EAAE,8BAAsB;iBAC/B,CAAC;aACH,CAAC,CAAC;YAEH,KAAK,GAAG,WAAW,CAAC,KAAK,CAAC;YAE1B,IAAA,mBAAQ,EAAC,QAAQ,EAAE,WAAW,CAAC,KAAK,CAAC,CAAC;QACxC,CAAC;QAED,MAAM,gBAAgB,GAAG,QAAQ,CAAC,MAAM,CACtC,KAAK,CAAC,CAAC,CAAC,8BAAqB,CAAC,CAAC,CAAC,oCAA2B,EAC3D;YACE,QAAQ;YACR,KAAK,EAAE,KAAK,CAAC,IAAI,CAAC,MAAM,CAAC;YACzB,KAAK,EAAE,KAAK;YACZ,MAAM,EAAE,MAAM;YACd,QAAQ;SACT,CAAC,CAAC;QAEL,MAAM,EAAE,MAAM,EAAE,UAAU,EAAE,GAAG,MAAM,IAAA,iBAAY,EAAC;YAChD,GAAG,OAAO;YACV,KAAK;YACL,MAAM,EAAE,QAAQ,CAAC,MAAM,CAAC,4BAAmB,EAAE,EAAE,OAAO,EAAE,SAAS,EAAE,EAAE,CAAC;YACtE,MAAM,EAAE,gBAAgB;YACxB,MAAM,EAAE,WAAM,CAAC,MAAM,CAAC;gBACpB,MAAM,EAAE,iCAAyB;aAClC,CAAC;YACF,GAAG,OAAO;SACX,CAAC,CAAC;QAEH,MAAM,MAAM,GAAG;YACb,MAAM,EAAE,UAAU,CAAC,MAAM;YACzB,KAAK,EAAE,UAAU,CAAC,KAAK,GAAG,QAAQ;SACnC,CAAC;QAEF,gBAAI,CAAC,KAAK,CAAC,SAAS,EAAE,CAAC;YACrB,MAAM,EAAE,UAAU;YAClB,MAAM,EAAE,EAAE,KAAK,EAAE,MAAM,EAAE,QAAQ,EAAE,YAAY,EAAE,SAAS,EAAE,OAAO,EAAE;YACrE,MAAM;YACN,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,OAAO,MAAM,CAAC;IAChB,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QAEf,gBAAI,CAAC,KAAK,CAAC,OAAO,EAAE,CAAC;YACnB,MAAM,EAAE,UAAU;YAClB,KAAK;YACL,QAAQ,EAAE,IAAI,CAAC,GAAG,EAAE,GAAG,KAAK;SAC7B,CAAC,CAAC;QAEH,MAAM,KAAK,CAAC;IACd,CAAC;AACH,CAAC,CAAA;AAYM,MAAM,KAAK,GAAG,KAAK,EACxB,KAAkB,EAClB,QAAgB,EAChB,YAAoB,EACpB,SAAiB,EACjB,UAA0B,EAAE,EACG,EAAE,CAAC,MAAM,CACxC,KAAK,EACL,QAAQ,EACR,YAAY,EACZ,SAAS,EACT,gBAAI,CAAC,aAAa,EAClB,OAAO,EACP,OAAO,CACR,CAAC;AAdW,QAAA,KAAK,SAchB;AAYK,MAAM,KAAK,GAAG,KAAK,EACxB,KAAkB,EAClB,QAAgB,EAChB,YAAoB,EACpB,SAAiB,EACjB,UAA0B,EAAE,EACG,EAAE,CAAC,MAAM,CACxC,KAAK,EACL,QAAQ,EACR,YAAY,EACZ,SAAS,EACT,CAAC,EACD,OAAO,EACP,OAAO,CACR,CAAC;AAdW,QAAA,KAAK,SAchB"}

package/dst/types.d.ts CHANGED

@@ -1,5 +1,6 @@
-export type EvalMethod = 'bEval' | 'gEval' | 'llmRubric';
-export type GEvalInput = string | {
+import { z } from 'zod';
+export type TJudgeMethod = 'bEval' | 'gEval' | 'llmRubric';
+export type TGevalInput = string | {
     query: string;
     answer: string;
 };
@@ -7,20 +8,32 @@ export interface IStepsCache {
     set(key: string, value: string[]): Promise<void>;
     get(key: string): Promise<string[] | undefined>;
 }
-export interface EvalOptions {
-    temperature?: number;
-    providerOptions?: Record<string, any>;
-}
-export interface EvaHooks {
+export type TVercelOptions = Record<string, any>;
+export interface IJudgeHooks {
     onSuccess?: (data: {
-        method: EvalMethod;
+        method: TJudgeMethod;
         params: any;
         result: any;
         duration: number;
     }) => void;
     onError?: (data: {
-        method: EvalMethod;
+        method: TJudgeMethod;
         error: any;
         duration: number;
     }) => void;
 }
+export declare const RubricResultSchema: z.ZodObject<{
+    reason: z.ZodString;
+    pass: z.ZodBoolean;
+    score: z.ZodNumber;
+}, z.core.$strip>;
+export type TRubricResult = z.infer<typeof RubricResultSchema>;
+export declare const GevalStepsResultSchema: z.ZodObject<{
+    steps: z.ZodArray<z.ZodString>;
+}, z.core.$strip>;
+export type TGevalStepsResult = z.infer<typeof GevalStepsResultSchema>;
+export declare const GevalEvaluateResultSchema: z.ZodObject<{
+    reason: z.ZodString;
+    score: z.ZodNumber;
+}, z.core.$strip>;
+export type TGevalEvaluateResult = z.infer<typeof GevalEvaluateResultSchema>;
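
The renamed `IJudgeHooks` interface is a pair of optional callbacks that receive the method name and a duration. A minimal sketch of a hooks object that records timings follows, assuming only the shape shown in the diff above. The types are redeclared locally so the block stands alone; `createTimingHooks`, `TimingRecord`, and the `timings` array are hypothetical names.

```typescript
// Shapes copied from the types.d.ts diff; redeclared so the sketch is self-contained.
type TJudgeMethod = 'bEval' | 'gEval' | 'llmRubric';

interface IJudgeHooks {
  onSuccess?: (data: { method: TJudgeMethod; params: any; result: any; duration: number }) => void;
  onError?: (data: { method: TJudgeMethod; error: any; duration: number }) => void;
}

// Hypothetical record type for collected measurements.
interface TimingRecord {
  method: TJudgeMethod;
  ok: boolean;
  duration: number;
}

// Builds hooks that append one record per judge call to the given array.
const createTimingHooks = (sink: TimingRecord[]): IJudgeHooks => ({
  onSuccess: ({ method, duration }) => { sink.push({ method, ok: true, duration }); },
  onError: ({ method, duration }) => { sink.push({ method, ok: false, duration }); },
});

const timings: TimingRecord[] = [];
const hooks = createTimingHooks(timings);

// Simulate the library invoking the optional callbacks after a call finishes.
hooks.onSuccess?.({ method: 'gEval', params: {}, result: { score: 1 }, duration: 12 });
hooks.onError?.({ method: 'bEval', error: new Error('boom'), duration: 3 });
```

With the package installed, an object like this would presumably be handed to the `setHooks(hooks: IJudgeHooks)` method declared in `config.d.ts`.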

package/dst/types.js CHANGED

@@ -1,3 +1,17 @@
 "use strict";
 Object.defineProperty(exports, "__esModule", { value: true });
+exports.GevalEvaluateResultSchema = exports.GevalStepsResultSchema = exports.RubricResultSchema = void 0;
+const zod_1 = require("zod");
+exports.RubricResultSchema = zod_1.z.object({
+    reason: zod_1.z.string().describe('Detailed explanation of the score based on the rubric'),
+    pass: zod_1.z.boolean().describe('Whether the output satisfies the minimum requirements'),
+    score: zod_1.z.number().min(0).max(1).describe('Numeric representation of quality'),
+});
+exports.GevalStepsResultSchema = zod_1.z.object({
+    steps: zod_1.z.array(zod_1.z.string()).describe('List of concise evaluation steps derived from the criteria'),
+});
+exports.GevalEvaluateResultSchema = zod_1.z.object({
+    reason: zod_1.z.string().describe('Detailed explanation of the score based on the rubric'),
+    score: zod_1.z.number().min(0).describe('Numeric representation of quality'),
+});
 //# sourceMappingURL=types.js.map
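
The schemas relocated into `types.js` carry different score bounds: `RubricResultSchema` limits `score` to the range 0 to 1, while `GevalEvaluateResultSchema` sets only a lower bound of 0. The sketch below is a hand-written, dependency-free approximation of those constraints as type guards. It does not use zod and may differ from zod's exact behavior on edge cases; the guard names and local interfaces are hypothetical.

```typescript
// Local shapes matching the fields declared in the schemas above.
interface RubricResult { reason: string; pass: boolean; score: number; }
interface GevalEvaluateResult { reason: string; score: number; }

const isRecord = (value: unknown): value is Record<string, unknown> =>
  typeof value === 'object' && value !== null;

// Approximates RubricResultSchema: string reason, boolean pass, 0 <= score <= 1.
const isRubricResult = (value: unknown): value is RubricResult =>
  isRecord(value) &&
  typeof value.reason === 'string' &&
  typeof value.pass === 'boolean' &&
  typeof value.score === 'number' &&
  Number.isFinite(value.score) &&
  value.score >= 0 &&
  value.score <= 1;

// Approximates GevalEvaluateResultSchema: string reason, score >= 0 with no upper bound.
const isGevalEvaluateResult = (value: unknown): value is GevalEvaluateResult =>
  isRecord(value) &&
  typeof value.reason === 'string' &&
  typeof value.score === 'number' &&
  Number.isFinite(value.score) &&
  value.score >= 0;

console.log(isRubricResult({ reason: 'ok', pass: true, score: 0.8 })); // true
console.log(isRubricResult({ reason: 'ok', pass: true, score: 7 }));   // false
console.log(isGevalEvaluateResult({ reason: 'ok', score: 7 }));        // true
```

The asymmetry matters when reusing the exported schemas directly: a raw score of 7 is rejected by the rubric bounds but accepted by the G-Eval bounds.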

package/dst/types.js.map CHANGED

	@@ -1 +1 @@
1	- {"version":3,"file":"types.js","sourceRoot":"","sources":["../src/types.ts"],"names":[],"mappings":""}
1	+ {"version":3,"file":"types.js","sourceRoot":"","sources":["../src/types.ts"],"names":[],"mappings":";;;AAAA,6BAAwB;AA2DX,QAAA,kBAAkB,GAAG,OAAC,CAAC,MAAM,CAAC;IAEzC,MAAM,EAAE,OAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,uDAAuD,CAAC;IAEpF,IAAI,EAAE,OAAC,CAAC,OAAO,EAAE,CAAC,QAAQ,CAAC,uDAAuD,CAAC;IAEnF,KAAK,EAAE,OAAC,CAAC,MAAM,EAAE,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,QAAQ,CAAC,mCAAmC,CAAC;CAC9E,CAAC,CAAC;AAUU,QAAA,sBAAsB,GAAG,OAAC,CAAC,MAAM,CAAC;IAE7C,KAAK,EAAE,OAAC,CAAC,KAAK,CAAC,OAAC,CAAC,MAAM,EAAE,CAAC,CAAC,QAAQ,CAAC,4DAA4D,CAAC;CAClG,CAAC,CAAC;AAUU,QAAA,yBAAyB,GAAG,OAAC,CAAC,MAAM,CAAC;IAEhD,MAAM,EAAE,OAAC,CAAC,MAAM,EAAE,CAAC,QAAQ,CAAC,uDAAuD,CAAC;IAEpF,KAAK,EAAE,OAAC,CAAC,MAAM,EAAE,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,QAAQ,CAAC,mCAAmC,CAAC;CACvE,CAAC,CAAC"}

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@eva-llm/eva-judge",
-  "version": "1.0.4",
+  "version": "1.0.5",
   "description": "LLM-as-a-Judge abstraction layer using ai-sdk and plugins",
   "main": "dst/index.js",
   "types": "dst/index.d.ts",
@@ -48,6 +48,7 @@
   "scripts": {
     "build": "tsc",
     "example": "ts-node scripts/example.ts",
-    "test": "jest"
+    "test": "jest",
+    "test:coverage": "jest --coverage"
   }
 }