npm - @pauly4010/evalai-sdk - Versions diffs - 1.4.0 → 1.4.1 - Mend

@pauly4010/evalai-sdk 1.4.0 → 1.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

package/CHANGELOG.md +12 -0
package/README.md +8 -3
package/dist/cli/check.d.ts +2 -2
package/dist/cli/check.js +14 -3
package/package.json +6 -3
package/.env.example +0 -0
package/ADDITIONAL_ISSUES_FOUND.md +0 -174
package/evalai-sdk-1.2.0.tgz +0 -0
package/postcss.config.mjs +0 -2

package/CHANGELOG.md CHANGED

@@ -5,6 +5,18 @@ All notable changes to the @pauly4010/evalai-sdk package will be documented in t
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [1.4.1] - 2026-02-18
+### ✨ Added
+- **evalai check `--baseline production`** — Compare against latest run tagged with `environment=prod`
+- **Baseline missing handling** — Clear failure when baseline not found and comparison requested
+### 🔧 Changed
+- **Package hardening** — `files`, `module`, `sideEffects: false` for leaner npm publish
+- **CLI** — Passes `baseline` param to quality API for deterministic CI gates
 ## [1.3.0] - 2025-10-21
 ### ✨ Added
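The changelog defines the new `production` baseline as the latest run tagged `environment=prod`. This diff only shows the CLI side; the actual selection happens behind the quality API. As a rough illustration of that rule only, here is a minimal sketch that assumes a hypothetical run record with a `tags` map and an ISO-8601 `createdAt` timestamp (neither shape appears in the diff). `EvalRun` and `latestProductionRun` are illustrative names, not part of the SDK.

```typescript
// Hypothetical run shape: the diff does not show how the server stores runs.
interface EvalRun {
  id: string;
  createdAt: string; // ISO-8601 timestamp
  tags: Record<string, string>;
}

// Returns the most recent run tagged environment=prod, or null when none exists
// (the situation the 1.4.1 CLI reports as a missing baseline).
function latestProductionRun(runs: readonly EvalRun[]): EvalRun | null {
  let latest: EvalRun | null = null;
  for (const run of runs) {
    if (run.tags["environment"] !== "prod") continue; // only prod-tagged runs qualify
    if (latest === null || Date.parse(run.createdAt) > Date.parse(latest.createdAt)) {
      latest = run;
    }
  }
  return latest;
}

const runs: EvalRun[] = [
  { id: "r1", createdAt: "2026-02-10T09:00:00Z", tags: { environment: "prod" } },
  { id: "r2", createdAt: "2026-02-12T09:00:00Z", tags: { environment: "staging" } },
  { id: "r3", createdAt: "2026-02-11T09:00:00Z", tags: { environment: "prod" } },
];
console.log(latestProductionRun(runs)?.id); // "r3": r2 is newer but not prod-tagged
```

Returning `null` rather than falling back to another run keeps the "baseline missing" case explicit, which is what lets a CI gate fail loudly instead of silently comparing against the wrong run.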

package/README.md CHANGED

@@ -501,7 +501,7 @@ console.log("Plan:", org.plan);
 console.log("Status:", org.status);
 ```
-## evalai CLI (v1.4.0)
+## evalai CLI (v1.4.1)
 The SDK includes a CLI for CI/CD evaluation gates. Install globally or use via `npx`:
@@ -527,14 +527,19 @@ Gate deployments on quality scores, regression, and compliance:
 | `--minN <n>` | Fail if total test cases &lt; n |
 | `--allowWeakEvidence` | Permit weak evidence level |
 | `--policy <name>` | Enforce HIPAA, SOC2, GDPR, PCI_DSS, FINRA_4511 |
-| `--baseline <mode>` | `published` or `previous` |
+| `--baseline <mode>` | `published`, `previous`, or `production` |
 | `--baseUrl <url>` | API base URL |
 **Exit codes:** 0=pass, 1=score below, 2=regression, 3=policy violation, 4=API error, 5=bad args, 6=low N, 7=weak evidence
 ## Changelog
-### v1.4.0 (Latest)
+### v1.4.1 (Latest)
+- **evalai check `--baseline production`** — Compare against latest prod-tagged run
+- **Package hardening** — Leaner npm publish with `files`, `sideEffects: false`
+### v1.4.0
 - **evalai CLI** — Command-line tool for CI/CD evaluation gates
   - `evalai check` — Gate deployments on quality scores, regression, and compliance
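A CI wrapper around `evalai check` usually needs to turn the numeric exit code into a readable reason. A minimal sketch based on the exit codes listed in the README; `EXIT_CODE_MEANINGS` and `describeExitCode` are hypothetical helper names (the shipped `check.js` exposes an `EXIT` object, but only `EXIT.BAD_ARGS` and `EXIT.API_ERROR` are visible in this diff).

```typescript
// Exit codes as listed in the README's "Exit codes" line.
const EXIT_CODE_MEANINGS: Readonly<Record<number, string>> = {
  0: "pass",
  1: "score below threshold",
  2: "regression",
  3: "policy violation",
  4: "API error",
  5: "bad arguments",
  6: "low sample size (N)",
  7: "weak evidence",
};

// Maps a process exit code to a human-readable reason for CI logs.
function describeExitCode(code: number): string {
  return EXIT_CODE_MEANINGS[code] ?? `unknown exit code ${code}`;
}

console.log(describeExitCode(2)); // "regression"
```

Note that in 1.4.1 a missing baseline also returns `EXIT.API_ERROR`, so code 4 on its own does not distinguish a network or server failure from a baseline that was never recorded; the CLI's stderr message is what tells them apart.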

package/dist/cli/check.d.ts CHANGED

@@ -14,7 +14,7 @@
  *   --minN <n>           Fail if total test cases < n (low sample size)
  *   --allowWeakEvidence  If false (default), fail when evidenceLevel is 'weak'
  *   --policy <name>      Enforce a compliance policy (e.g. HIPAA, SOC2, GDPR)
- *   --baseline <mode>    Baseline comparison mode: "published" (default) or "previous"
+ *   --baseline <mode>    Baseline comparison mode: "published" (default), "previous", or "production"
  *   --evaluationId <id>  Required. The evaluation to gate on.
  *   --baseUrl <url>      API base URL (default: EVALAI_BASE_URL or http://localhost:3000)
  *   --apiKey <key>       API key (default: EVALAI_API_KEY env var)
@@ -52,7 +52,7 @@ export interface CheckArgs {
     allowWeakEvidence: boolean;
     evaluationId: string;
     policy?: string;
-    baseline: 'published' | 'previous';
+    baseline: 'published' | 'previous' | 'production';
 }
 export declare function parseArgs(argv: string[]): CheckArgs;
 export declare function runCheck(args: CheckArgs): Promise<number>;
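Version 1.4.1 widens `CheckArgs.baseline` to a three-member union and extends the nested ternary in `check.js` to match. One way to keep the runtime list and the type from drifting apart is to derive the union from a `const` tuple. This is a sketch of that alternative, not the package's code; `BASELINE_MODES`, `isBaselineMode`, and `parseBaseline` are hypothetical names.

```typescript
// Single source of truth: the type is derived from the runtime tuple.
const BASELINE_MODES = ["published", "previous", "production"] as const;
type BaselineMode = (typeof BASELINE_MODES)[number]; // "published" | "previous" | "production"

// Type guard that narrows arbitrary input to a known baseline mode.
function isBaselineMode(value: unknown): value is BaselineMode {
  return typeof value === "string" && (BASELINE_MODES as readonly string[]).includes(value);
}

// Same fallback as the 1.4.1 parseArgs: unrecognized or absent values become "published".
function parseBaseline(raw: string | undefined): BaselineMode {
  return isBaselineMode(raw) ? raw : "published";
}
```

Falling back to `published` matches the shipped behavior, which means a typo such as `--baseline prod` silently gates against the published baseline. A stricter variant could reject unknown values with exit code 5 (bad args) instead.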

package/dist/cli/check.js CHANGED

@@ -15,7 +15,7 @@
  *   --minN <n>           Fail if total test cases < n (low sample size)
  *   --allowWeakEvidence  If false (default), fail when evidenceLevel is 'weak'
  *   --policy <name>      Enforce a compliance policy (e.g. HIPAA, SOC2, GDPR)
- *   --baseline <mode>    Baseline comparison mode: "published" (default) or "previous"
+ *   --baseline <mode>    Baseline comparison mode: "published" (default), "previous", or "production"
  *   --evaluationId <id>  Required. The evaluation to gate on.
  *   --baseUrl <url>      API base URL (default: EVALAI_BASE_URL or http://localhost:3000)
  *   --apiKey <key>       API key (default: EVALAI_API_KEY env var)
@@ -73,7 +73,11 @@ function parseArgs(argv) {
     const allowWeakEvidence = args.allowWeakEvidence === 'true' || args.allowWeakEvidence === '1';
     const evaluationId = args.evaluationId || '';
     const policy = args.policy || undefined;
-    const baseline = (args.baseline === 'previous' ? 'previous' : 'published');
+    const baseline = (args.baseline === 'previous'
+        ? 'previous'
+        : args.baseline === 'production'
+            ? 'production'
+            : 'published');
     if (!apiKey) {
         console.error('Error: --apiKey or EVALAI_API_KEY is required');
         process.exit(exports.EXIT.BAD_ARGS);
@@ -95,7 +99,7 @@ function parseArgs(argv) {
 async function runCheck(args) {
     const headers = { Authorization: `Bearer ${args.apiKey}` };
     // ── 1. Fetch latest quality score ──
-    const scoreUrl = `${args.baseUrl}/api/quality?evaluationId=${args.evaluationId}&action=latest`;
+    const scoreUrl = `${args.baseUrl}/api/quality?evaluationId=${args.evaluationId}&action=latest&baseline=${args.baseline}`;
     let scoreRes;
     try {
         scoreRes = await fetch(scoreUrl, { headers });
@@ -115,7 +119,14 @@ async function runCheck(args) {
     const evidenceLevel = data?.evidenceLevel ?? null;
     const baselineScore = data?.baselineScore ?? null;
     const regressionDelta = data?.regressionDelta ?? null;
+    const baselineMissing = data?.baselineMissing === true;
     const breakdown = data?.breakdown ?? {};
+    // ── Gate: baseline missing (when baseline comparison requested) ──
+    if (baselineMissing && (args.baseline !== 'published' || args.maxDrop !== undefined)) {
+        console.error(`\n✗ FAILED: baseline (${args.baseline}) not found. ` +
+            `Ensure a baseline run exists (e.g. published run, previous run, or prod-tagged run).`);
+        return exports.EXIT.API_ERROR;
+    }
     // ── Gate: minN (low sample size) ──
     if (args.minN !== undefined && total !== null && total < args.minN) {
         console.error(`\n✗ FAILED: total test cases (${total}) < minN (${args.minN})`);
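The 1.4.1 `runCheck` builds the score URL by string interpolation. `baseline` is restricted to three literal values by `parseArgs`, but `evaluationId` is inserted as typed without percent-encoding. The sketch below shows an encoded variant of the same query alongside a standalone version of the new missing-baseline condition. `GateArgs`, `buildScoreUrl`, and `failsOnMissingBaseline` are hypothetical names, and `GateArgs` is only the subset of `CheckArgs` this example needs.

```typescript
// Hypothetical subset of CheckArgs relevant to this sketch.
interface GateArgs {
  baseUrl: string;
  evaluationId: string;
  baseline: "published" | "previous" | "production";
  maxDrop?: number;
}

// Same path and parameters as check.js, but URLSearchParams percent-encodes the values.
function buildScoreUrl(args: GateArgs): string {
  const params = new URLSearchParams({
    evaluationId: args.evaluationId,
    action: "latest",
    baseline: args.baseline,
  });
  return `${args.baseUrl}/api/quality?${params.toString()}`;
}

// Mirrors the 1.4.1 gate: a missing baseline fails only when a non-default
// baseline was requested or a regression check (--maxDrop) depends on it.
function failsOnMissingBaseline(baselineMissing: boolean, args: GateArgs): boolean {
  return baselineMissing && (args.baseline !== "published" || args.maxDrop !== undefined);
}

const args: GateArgs = { baseUrl: "http://localhost:3000", evaluationId: "42", baseline: "production" };
console.log(buildScoreUrl(args));
// http://localhost:3000/api/quality?evaluationId=42&action=latest&baseline=production
```

Keeping the base URL as a plain string prefix (rather than resolving `/api/quality` with `new URL`) preserves any path prefix in `--baseUrl`, matching the original concatenation.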

package/package.json CHANGED

@@ -1,9 +1,12 @@
 {
   "name": "@pauly4010/evalai-sdk",
-  "version": "1.4.0",
+  "version": "1.4.1",
   "description": "AI Evaluation Platform SDK - Complete API Coverage with Performance Optimizations",
-  "main": "./dist/index.js",
-  "types": "./dist/index.d.ts",
+  "main": "dist/index.js",
+  "module": "dist/index.js",
+  "types": "dist/index.d.ts",
+  "sideEffects": false,
+  "files": ["dist", "README.md", "CHANGELOG.md"],
   "bin": {
     "evalai": "./dist/cli/index.js"
   },
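The new `files` whitelist is consistent with the four deletions further down this diff: `.env.example`, `ADDITIONAL_ISSUES_FOUND.md`, `evalai-sdk-1.2.0.tgz`, and `postcss.config.mjs` all fall outside it. The sketch below is a deliberately simplified model of that whitelist (exact names plus directory prefixes only); npm's real packing rules also handle glob patterns and always include certain files such as `package.json`. `FILES_WHITELIST` and `isPublished` are hypothetical names.

```typescript
// Simplified "files" model: an entry matches the exact path or anything beneath it.
const FILES_WHITELIST = ["dist", "README.md", "CHANGELOG.md"] as const;

function isPublished(path: string, whitelist: readonly string[] = FILES_WHITELIST): boolean {
  const normalized = path.replace(/^\.\//, ""); // treat "./README.md" like "README.md"
  return whitelist.some((entry) => normalized === entry || normalized.startsWith(`${entry}/`));
}

const candidates = [
  "dist/cli/check.js",
  "README.md",
  "ADDITIONAL_ISSUES_FOUND.md",
  "evalai-sdk-1.2.0.tgz",
  "postcss.config.mjs",
  ".env.example",
];
for (const file of candidates) {
  console.log(file, isPublished(file) ? "included" : "excluded");
}
```

Run as-is, the first two paths print `included` and the remaining four print `excluded`. Checking `${entry}/` rather than a bare prefix keeps a file like `distribution.txt` from matching the `dist` entry.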

package/.env.example DELETED

Binary file

package/ADDITIONAL_ISSUES_FOUND.md DELETED

@@ -1,174 +0,0 @@
-# Additional Issues Found in Second Review
-## 🔴 Issues Discovered
-### 1. **process.env Usage in Browser Context** ⚠️ HIGH PRIORITY
-**Files**: `client.ts` (lines 105, 116, 178)
-**Problem**: The SDK uses `process.env` directly, which is undefined in browsers:
-```typescript
-// Line 105
-this.apiKey = config.apiKey || process.env.EVALAI_API_KEY || ...
-// Line 116
-const orgIdFromEnv = process.env.EVALAI_ORGANIZATION_ID || ...
-// Line 178 (in static init method)
-baseUrl: process.env.EVALAI_BASE_URL,
-```
-**Impact**:
-- Will cause "Cannot read property of undefined" errors in browsers
-- Breaks zero-config initialization in browsers
-- `AIEvalClient.init()` won't work in browsers
-**Severity**: HIGH - Core functionality breaks in browsers
----
-### 2. **Type Name Collision** 🟡 MEDIUM PRIORITY
-**Files**: `types.ts` (line 209) and `testing.ts` (line 27)
-**Problem**: Two different `TestCase` interfaces with same name but different purposes:
-**types.ts** (Database Model):
-```typescript
-export interface TestCase {
-  id: number;
-  evaluationId: number;
-  input: string;
-  expectedOutput: string | null;
-  metadata: Record<string, any> | null;
-  createdAt: string;
-}
-```
-**testing.ts** (Test Suite Model):
-```typescript
-export interface TestCase {
-  id?: string;
-  input: string;
-  expected?: string;
-  metadata?: Record<string, any>;
-  assertions?: ((output: string) => AssertionResult)[];
-}
-```
-**Impact**:
-- Confusing for developers
-- IDE autocomplete shows wrong interface
-- Only `types.ts` version is exported from index.ts (line 117)
-- Could cause type errors if both are imported
-**Severity**: MEDIUM - Causes confusion but only types.ts version is publicly exported
----
-### 3. **Dynamic Import Pattern in export.ts** 🟢 LOW PRIORITY
-**Files**: `export.ts` (lines 296, 316)
-**Pattern**:
-```typescript
-const fs = await import('fs');
-fs.writeFileSync(filePath, ...);
-```
-**Issue**:
-- Dynamic import returns a module namespace object
-- Works but is unusual pattern (normally use static imports in Node.js-only files)
-- Could fail in some bundler configurations
-**Impact**:
-- Works but non-standard
-- Tree-shaking friendly but unnecessary for Node.js-only code
-- Some bundlers might have issues
-**Severity**: LOW - Works but not best practice
----
-### 4. **TypeScript Module Configuration** 🟢 INFO
-**File**: `tsconfig.json`
-**Current**:
-```json
-{
-  "module": "commonjs"
-}
-```
-**Observation**:
-- Using CommonJS but package.json has ES module exports
-- CLI uses `.js` extensions in imports (which is correct for ES modules)
-- Mismatch between TypeScript config and runtime expectations
-**Impact**:
-- May cause issues with module resolution
-- CLI imports might not work as expected
-- Bundlers might be confused
-**Severity**: LOW - Currently working but could cause subtle issues
----
-## 📊 Summary
-| Issue | Severity | Impact | Affected |
-|-------|----------|--------|----------|
-| process.env in browser | 🔴 HIGH | Breaks in browsers | Core client |
-| TestCase collision | 🟡 MEDIUM | Developer confusion | Types |
-| Dynamic imports | 🟢 LOW | Unusual pattern | export.ts |
-| Module config | 🟢 INFO | Potential confusion | Build system |
----
-## ✅ Recommended Fixes
-### Fix 1: Safe process.env Access
-Add helper function:
-```typescript
-// utils.ts or client.ts
-function getEnvVar(name: string): string | undefined {
-  if (typeof process !== 'undefined' && process.env) {
-    return process.env[name];
-  }
-  return undefined;
-}
-```
-Then use:
-```typescript
-this.apiKey = config.apiKey || getEnvVar('EVALAI_API_KEY') || ...
-```
-### Fix 2: Rename Test Suite TestCase
-Rename in `testing.ts`:
-```typescript
-export interface TestSuiteCase {  // Was: TestCase
-  id?: string;
-  input: string;
-  expected?: string;
-  // ...
-}
-```
-### Fix 3: Static Imports in export.ts
-Since already checked for Node.js environment:
-```typescript
-import * as fs from 'fs';  // Instead of: const fs = await import('fs')
-```
-### Fix 4: Consider ES Modules
-Either:
-- Change tsconfig to `"module": "es2020"`
-- Or change package.json exports to use `.cjs` extensions
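The removed review file proposed a `getEnvVar` helper so the client does not touch `process.env` in browsers. This diff does not show whether `client.ts` adopted it. The sketch below is one way to write such a helper so that it also type-checks without Node.js typings, by reading `process` off `globalThis`; `EnvHost`, `ClientOptions`, and `resolveApiKey` are hypothetical names used only for illustration.

```typescript
// Shape we expect on globalThis in Node.js; in browsers `process` is simply absent.
type EnvHost = { process?: { env?: Record<string, string | undefined> } };

// Returns the variable's value, or undefined when there is no process.env at all.
function getEnvVar(name: string): string | undefined {
  return (globalThis as unknown as EnvHost).process?.env?.[name];
}

// Hypothetical option bag mirroring the quoted `config.apiKey || process.env...` line.
interface ClientOptions {
  apiKey?: string;
}

// Explicit option first, then the environment; `||` (as in the quoted client.ts line)
// also treats an empty string as unset.
function resolveApiKey(options: ClientOptions): string | undefined {
  return options.apiKey || getEnvVar("EVALAI_API_KEY");
}
```

Optional chaining covers both failure modes the review describes: a missing `process` global and a `process` object without `env`.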

package/evalai-sdk-1.2.0.tgz DELETED

Binary file

package/postcss.config.mjs DELETED

@@ -1,2 +0,0 @@
-// Empty PostCSS config to prevent inheriting root config
-export default { plugins: {} };