judgeval 0.2.0 → 0.2.2 (npm)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (62)

package/README.md +95 -68
package/dist/cjs/common/tracer.js +235 -143
package/dist/cjs/common/tracer.js.map +1 -1
package/dist/cjs/constants.js +8 -5
package/dist/cjs/constants.js.map +1 -1
package/dist/cjs/data/datasets/eval-dataset-client.js +349 -0
package/dist/cjs/data/datasets/eval-dataset-client.js.map +1 -0
package/dist/cjs/data/datasets/eval-dataset.js +405 -0
package/dist/cjs/data/datasets/eval-dataset.js.map +1 -0
package/dist/cjs/data/example.js +22 -1
package/dist/cjs/data/example.js.map +1 -1
package/dist/cjs/e2etests/eval-operations.test.js +282 -0
package/dist/cjs/e2etests/eval-operations.test.js.map +1 -0
package/dist/cjs/e2etests/judgee-traces.test.js +278 -0
package/dist/cjs/e2etests/judgee-traces.test.js.map +1 -0
package/dist/cjs/index.js +1 -3
package/dist/cjs/index.js.map +1 -1
package/dist/cjs/judgment-client.js +326 -645
package/dist/cjs/judgment-client.js.map +1 -1
package/dist/cjs/scorers/api-scorer.js +56 -48
package/dist/cjs/scorers/api-scorer.js.map +1 -1
package/dist/cjs/scorers/base-scorer.js +66 -11
package/dist/cjs/scorers/base-scorer.js.map +1 -1
package/dist/esm/common/tracer.js +236 -144
package/dist/esm/common/tracer.js.map +1 -1
package/dist/esm/constants.js +7 -4
package/dist/esm/constants.js.map +1 -1
package/dist/esm/data/datasets/eval-dataset-client.js +342 -0
package/dist/esm/data/datasets/eval-dataset-client.js.map +1 -0
package/dist/esm/data/datasets/eval-dataset.js +375 -0
package/dist/esm/data/datasets/eval-dataset.js.map +1 -0
package/dist/esm/data/example.js +22 -1
package/dist/esm/data/example.js.map +1 -1
package/dist/esm/e2etests/eval-operations.test.js +254 -0
package/dist/esm/e2etests/eval-operations.test.js.map +1 -0
package/dist/esm/e2etests/judgee-traces.test.js +253 -0
package/dist/esm/e2etests/judgee-traces.test.js.map +1 -0
package/dist/esm/index.js +0 -1
package/dist/esm/index.js.map +1 -1
package/dist/esm/judgment-client.js +328 -647
package/dist/esm/judgment-client.js.map +1 -1
package/dist/esm/scorers/api-scorer.js +56 -48
package/dist/esm/scorers/api-scorer.js.map +1 -1
package/dist/esm/scorers/base-scorer.js +66 -11
package/dist/esm/scorers/base-scorer.js.map +1 -1
package/dist/types/common/tracer.d.ts +27 -14
package/dist/types/constants.d.ts +4 -4
package/dist/types/data/datasets/eval-dataset-client.d.ts +39 -0
package/dist/types/data/datasets/eval-dataset.d.ts +45 -0
package/dist/types/data/example.d.ts +24 -12
package/dist/types/e2etests/eval-operations.test.d.ts +5 -0
package/dist/types/e2etests/judgee-traces.test.d.ts +5 -0
package/dist/types/index.d.ts +0 -1
package/dist/types/judgment-client.d.ts +3 -47
package/dist/types/scorers/api-scorer.d.ts +15 -15
package/dist/types/scorers/base-scorer.d.ts +53 -10
package/package.json +2 -1
package/dist/cjs/scorers/exact-match-scorer.js +0 -84
package/dist/cjs/scorers/exact-match-scorer.js.map +0 -1
package/dist/esm/scorers/exact-match-scorer.js +0 -80
package/dist/esm/scorers/exact-match-scorer.js.map +0 -1
package/dist/types/scorers/exact-match-scorer.d.ts +0 -10

package/README.md CHANGED

@@ -152,21 +152,12 @@ You can retrieve past evaluation results using several methods:
 // Initialize the JudgmentClient
 const client = JudgmentClient.getInstance();
-// Method 1: Using pullEval
+// Using pullEval
 const results = await client.pullEval('my-project', 'my-eval-run');
-// Method 2: Using getEvalRun (alias for pullEval)
-const results = await client.getEvalRun('my-project', 'my-eval-run');
-// List all evaluation runs for a project
-const evalRuns = await client.listEvalRuns('my-project', 100, 0); // limit=100, offset=0
-// Get statistics for an evaluation run
-const stats = await client.getEvalRunStats('my-project', 'my-eval-run');
-// Export evaluation results to JSON or CSV
-const jsonExport = await client.exportEvalResults('my-project', 'my-eval-run', 'json');
-const csvExport = await client.exportEvalResults('my-project', 'my-eval-run', 'csv');
+// Export evaluation results to different formats
+const jsonData = await client.exportEvalResults('my-project', 'my-eval-run', 'json');
+const csvData = await client.exportEvalResults('my-project', 'my-eval-run', 'csv');
 ```
 The returned results include the evaluation run ID and a list of scoring results:
@@ -188,34 +179,49 @@ For a complete example of retrieving evaluation results, see `src/examples/resul
 ## Custom Scorers
-You can create custom scorers by extending the `JudgevalScorer` class. Here's an example of a custom scorer that checks for exact string matches:
+You can create custom scorers by extending the `JudgevalScorer` class. This implementation aligns with the Python SDK approach, making it easy to port scorers between languages.
+### Creating a Custom Scorer
+To create a custom scorer:
+1. **Extend the JudgevalScorer class**:
 ```typescript
-import { Example } from './data/example';
-import { JudgevalScorer } from './scorers/base-scorer';
-import { ScorerData } from './data/result';
+import { Example } from 'judgeval/data/example';
+import { JudgevalScorer } from 'judgeval/scorers/base-scorer';
+import { ScorerData } from 'judgeval/data/result';
-/**
- * ExactMatchScorer - A custom scorer that checks if the actual output exactly matches the expected output
- */
 class ExactMatchScorer extends JudgevalScorer {
-  constructor(threshold: number, additionalMetadata?: Record<string, any>, verbose: boolean = false) {
-    super('exact_match', threshold, additionalMetadata, verbose);
+  constructor(
+    threshold: number = 1.0,
+    additional_metadata?: Record<string, any>,
+    include_reason: boolean = true,
+    async_mode: boolean = true,
+    strict_mode: boolean = false,
+    verbose_mode: boolean = true
+  ) {
+    super('exact_match', threshold, additional_metadata, include_reason, async_mode, strict_mode, verbose_mode);
   }
   async scoreExample(example: Example): Promise<ScorerData> {
     try {
       // Check if the example has expected output
       if (!example.expectedOutput) {
+        this.error = "Missing expected output";
+        this.score = 0;
+        this.success = false;
+        this.reason = "Expected output is required for exact match scoring";
         return {
           name: this.type,
           threshold: this.threshold,
           success: false,
           score: 0,
-          reason: "Expected output is required for exact match scoring",
-          strict_mode: null,
+          reason: this.reason,
+          strict_mode: this.strict_mode,
           evaluation_model: "exact-match",
-          error: "Missing expected output",
+          error: this.error,
           evaluation_cost: null,
           verbose_logs: null,
           additional_metadata: this.additional_metadata || {}
@@ -231,35 +237,48 @@ class ExactMatchScorer extends JudgevalScorer {
       this.score = isMatch ? 1 : 0;
       // Generate a reason for the score
-      const reason = isMatch
+      this.reason = isMatch
         ? "The actual output exactly matches the expected output."
         : `The actual output "${actualOutput}" does not match the expected output "${expectedOutput}".`;
+      // Set success based on the score and threshold
+      this.success = this._successCheck();
+      // Generate verbose logs if verbose mode is enabled
+      if (this.verbose_mode) {
+        this.verbose_logs = `Comparing: "${actualOutput}" with "${expectedOutput}"`;
+      }
       // Return the scorer data
       return {
         name: this.type,
         threshold: this.threshold,
-        success: this.successCheck(),
+        success: this.success,
         score: this.score,
-        reason: reason,
-        strict_mode: null,
+        reason: this.include_reason ? this.reason : null,
+        strict_mode: this.strict_mode,
         evaluation_model: "exact-match",
         error: null,
         evaluation_cost: null,
-        verbose_logs: this.verbose ? `Comparing: "${actualOutput}" with "${expectedOutput}"` : null,
+        verbose_logs: this.verbose_mode ? this.verbose_logs : null,
         additional_metadata: this.additional_metadata || {}
       };
     } catch (error) {
       // Handle any errors during scoring
       const errorMessage = error instanceof Error ? error.message : String(error);
+      this.error = errorMessage;
+      this.score = 0;
+      this.success = false;
+      this.reason = `Error during scoring: ${errorMessage}`;
       return {
         name: this.type,
         threshold: this.threshold,
         success: false,
         score: 0,
-        reason: `Error during scoring: ${errorMessage}`,
-        strict_mode: null,
+        reason: this.reason,
+        strict_mode: this.strict_mode,
         evaluation_model: "exact-match",
         error: errorMessage,
         evaluation_cost: null,
@@ -268,9 +287,30 @@ class ExactMatchScorer extends JudgevalScorer {
       };
     }
   }
+  /**
+   * Get the name of the scorer
+   * This is equivalent to Python's __name__ property
+   */
+  get name(): string {
+    return "Exact Match Scorer";
+  }
 }
 ```
+2. **Implement required methods**:
+- `scoreExample(example: Example)`: The core method that evaluates an example and returns a score
+- `name`: A getter property that returns the human-readable name of your scorer
+3. **Set internal state**:
+Your implementation should set these internal properties:
+- `this.score`: The numerical score (typically between 0 and 1)
+- `this.success`: Whether the example passed the evaluation
+- `this.reason`: A human-readable explanation of the score
+- `this.error`: Any error that occurred during scoring
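The state-setting pattern listed above can be sketched in isolation. The following is a minimal, self-contained illustration, not the judgeval implementation: `StatefulExactMatch` and `ScorerState` are hypothetical names, the strict string comparison is an assumption (the README's comparison step is not shown in this diff), and the `score >= threshold` rule is an assumed stand-in for `_successCheck()`.

```typescript
// Hypothetical snapshot of the internal properties named in the README.
interface ScorerState {
  score: number | null;
  success: boolean | null;
  reason: string | null;
  error: string | null;
}

// Hypothetical stand-in for a JudgevalScorer subclass; it does not import judgeval.
class StatefulExactMatch implements ScorerState {
  threshold: number;
  score: number | null = null;
  success: boolean | null = null;
  reason: string | null = null;
  error: string | null = null;

  constructor(threshold: number = 1.0) {
    this.threshold = threshold;
  }

  // Records the outcome on the instance, then returns a snapshot of that state.
  scoreExample(actualOutput: string, expectedOutput?: string): ScorerState {
    if (!expectedOutput) {
      // Failure path: every state field is set, including the error.
      this.error = "Missing expected output";
      this.score = 0;
      this.success = false;
      this.reason = "Expected output is required for exact match scoring";
    } else {
      // Assumption: exact match means strict string equality.
      const isMatch = actualOutput === expectedOutput;
      this.error = null;
      this.score = isMatch ? 1 : 0;
      // Assumption: success means the score reaches the threshold.
      this.success = this.score >= this.threshold;
      this.reason = isMatch
        ? "The actual output exactly matches the expected output."
        : `The actual output "${actualOutput}" does not match the expected output "${expectedOutput}".`;
    }
    return {
      score: this.score,
      success: this.success,
      reason: this.reason,
      error: this.error,
    };
  }
}

const scorer = new StatefulExactMatch();
console.log(scorer.scoreExample("Paris", "Paris"));
console.log(scorer.scoreExample("Paris"));
```

Keeping the outcome on the instance as well as in the return value means a caller can inspect `scorer.reason` or `scorer.error` after the call, which is the behavior the list above describes.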
 ### Using Custom Scorers
 You can use custom scorers with the JudgmentClient just like any other scorer:
@@ -282,53 +322,40 @@ const examples = [
     .input("What is the capital of France?")
     .actualOutput("Paris is the capital of France.")
     .expectedOutput("Paris is the capital of France.")
-    .exampleIndex(0)
     .build(),
   // Add more examples...
 ];
 // Create a custom scorer
-const exactMatchScorer = new ExactMatchScorer(1.0, { description: "Checks for exact string match" }, true);
-// Initialize the JudgmentClient
-const client = JudgmentClient.getInstance();
-// Run evaluation with the custom scorer
-const results = await client.runEvaluation(
-  examples,
-  [exactMatchScorer],
-  "gpt-3.5-turbo", // Specify a valid model name
-  "my-project",
-  {
-    evalRunName: "custom-scorer-test",
-    logResults: true
-  }
+const exactMatchScorer = new ExactMatchScorer(
+  1.0,
+  { description: "Checks for exact string match" },
+  true,  // include_reason
+  true,  // async_mode
+  false, // strict_mode
+  true   // verbose_mode
 );
-```
-### Viewing Results
-After running an evaluation with custom scorers, you can view the results in the Judgment platform:
-```
-https://app.judgmentlabs.ai/app/experiment?project_name=my-project&eval_run_name=custom-scorer-test
+// Run evaluation with the custom scorer
+const results = await client.runEvaluation({
+  examples: examples,
+  scorers: [exactMatchScorer],
+  projectName: "my-project",
+  evalRunName: "custom-scorer-test",
+  useJudgment: false // Run locally, don't use Judgment API
+});
 ```
-You can also access the results programmatically:
+### Custom Scorer Parameters
-```typescript
-// Print the results
-console.log(results);
-// Get success rate
-const successCount = results.filter(r => {
-  return r.scorersData?.every(s => s.success) ?? false;
-}).length;
-console.log(`Success rate: ${successCount}/${examples.length} (${(successCount/examples.length*100).toFixed(2)}%)`);
-```
+- `threshold`: The minimum score required for success (0-1 for most scorers)
+- `additional_metadata`: Extra information to include with results
+- `include_reason`: Whether to include a reason for the score
+- `async_mode`: Whether to run the scorer asynchronously
+- `strict_mode`: If true, sets threshold to 1.0 for strict evaluation
+- `verbose_mode`: Whether to include detailed logs
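As a rough illustration of how these parameters might shape a result, the sketch below models three of them. It is an assumption-laden example, not the library's code: `ScorerOptions`, `effectiveThreshold`, and `finalize` are hypothetical names, and `additional_metadata` and `async_mode` are left out because the diff does not show how they are applied.

```typescript
// Hypothetical option bag mirroring some of the constructor parameters above.
interface ScorerOptions {
  threshold: number;
  include_reason: boolean;
  strict_mode: boolean;
  verbose_mode: boolean;
}

// Hypothetical result shape covering only the fields these options affect.
interface Outcome {
  success: boolean;
  reason: string | null;
  verbose_logs: string | null;
}

// Assumption: strict_mode replaces the configured threshold with 1.0.
function effectiveThreshold(opts: ScorerOptions): number {
  return opts.strict_mode ? 1.0 : opts.threshold;
}

// Applies the options to a raw score, reason, and log string.
function finalize(
  opts: ScorerOptions,
  score: number,
  reason: string,
  logs: string
): Outcome {
  return {
    success: score >= effectiveThreshold(opts),
    reason: opts.include_reason ? reason : null,
    verbose_logs: opts.verbose_mode ? logs : null,
  };
}

const lenient: ScorerOptions = {
  threshold: 0.7,
  include_reason: true,
  strict_mode: false,
  verbose_mode: false,
};
const strict: ScorerOptions = { ...lenient, strict_mode: true, include_reason: false };

// A score of 0.8 passes the 0.7 threshold but fails once strict_mode raises it to 1.0.
console.log(finalize(lenient, 0.8, "close enough", "compared two strings"));
console.log(finalize(strict, 0.8, "close enough", "compared two strings"));
```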
-For a complete example of using custom scorers, see `src/examples/custom-scorer.ts`.
+For a complete example of creating and using custom scorers, see `src/examples/custom-scorer.ts`.
 ## Examples