@pauly4010/evalai-sdk 1.4.1 → 1.5.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +85 -0
- package/README.md +205 -543
- package/dist/assertions.d.ts +2 -2
- package/dist/assertions.js +104 -71
- package/dist/batch.js +12 -17
- package/dist/cache.js +7 -11
- package/dist/cli/api.d.ts +108 -0
- package/dist/cli/api.js +130 -0
- package/dist/cli/check.d.ts +28 -13
- package/dist/cli/check.js +249 -142
- package/dist/cli/ci-context.d.ts +6 -0
- package/dist/cli/ci-context.js +110 -0
- package/dist/cli/config.d.ts +30 -0
- package/dist/cli/config.js +207 -0
- package/dist/cli/constants.d.ts +15 -0
- package/dist/cli/constants.js +18 -0
- package/dist/cli/doctor.d.ts +11 -0
- package/dist/cli/doctor.js +82 -0
- package/dist/cli/formatters/github.d.ts +8 -0
- package/dist/cli/formatters/github.js +130 -0
- package/dist/cli/formatters/human.d.ts +6 -0
- package/dist/cli/formatters/human.js +107 -0
- package/dist/cli/formatters/json.d.ts +6 -0
- package/dist/cli/formatters/json.js +10 -0
- package/dist/cli/formatters/pr-comment.d.ts +12 -0
- package/dist/cli/formatters/pr-comment.js +101 -0
- package/dist/cli/formatters/types.d.ts +100 -0
- package/dist/cli/formatters/types.js +5 -0
- package/dist/cli/gate.d.ts +21 -0
- package/dist/cli/gate.js +175 -0
- package/dist/cli/index.d.ts +1 -0
- package/dist/cli/index.js +67 -23
- package/dist/cli/init.d.ts +7 -0
- package/dist/cli/init.js +69 -0
- package/dist/cli/policy-packs.d.ts +23 -0
- package/dist/cli/policy-packs.js +83 -0
- package/dist/cli/profiles.d.ts +28 -0
- package/dist/cli/profiles.js +30 -0
- package/dist/cli/reason-codes.d.ts +17 -0
- package/dist/cli/reason-codes.js +19 -0
- package/dist/cli/render/snippet.d.ts +5 -0
- package/dist/cli/render/snippet.js +15 -0
- package/dist/cli/render/sort.d.ts +10 -0
- package/dist/cli/render/sort.js +24 -0
- package/dist/cli/report/build-check-report.d.ts +19 -0
- package/dist/cli/report/build-check-report.js +124 -0
- package/dist/cli/share.d.ts +17 -0
- package/dist/cli/share.js +83 -0
- package/dist/client.d.ts +2 -2
- package/dist/client.js +144 -132
- package/dist/context.d.ts +1 -1
- package/dist/context.js +4 -6
- package/dist/errors.d.ts +2 -0
- package/dist/errors.js +116 -107
- package/dist/export.d.ts +6 -6
- package/dist/export.js +39 -33
- package/dist/index.d.ts +25 -24
- package/dist/index.js +62 -56
- package/dist/integrations/anthropic.d.ts +1 -1
- package/dist/integrations/anthropic.js +23 -19
- package/dist/integrations/openai-eval.d.ts +57 -0
- package/dist/integrations/openai-eval.js +230 -0
- package/dist/integrations/openai.d.ts +1 -1
- package/dist/integrations/openai.js +23 -19
- package/dist/local.d.ts +2 -2
- package/dist/local.js +25 -25
- package/dist/logger.d.ts +1 -1
- package/dist/logger.js +24 -28
- package/dist/matchers/index.d.ts +1 -0
- package/dist/matchers/index.js +6 -0
- package/dist/matchers/to-pass-gate.d.ts +29 -0
- package/dist/matchers/to-pass-gate.js +35 -0
- package/dist/pagination.d.ts +1 -1
- package/dist/pagination.js +6 -6
- package/dist/snapshot.js +24 -24
- package/dist/streaming.js +11 -11
- package/dist/testing.d.ts +6 -2
- package/dist/testing.js +30 -12
- package/dist/types.d.ts +22 -22
- package/dist/types.js +13 -13
- package/dist/utils/input-hash.d.ts +8 -0
- package/dist/utils/input-hash.js +38 -0
- package/dist/version.d.ts +7 -0
- package/dist/version.js +10 -0
- package/dist/workflows.d.ts +7 -7
- package/dist/workflows.js +44 -44
- package/package.json +102 -90
- package/dist/__tests__/assertions.test.d.ts +0 -1
- package/dist/__tests__/assertions.test.js +0 -288
- package/dist/__tests__/client.test.d.ts +0 -1
- package/dist/__tests__/client.test.js +0 -185
- package/dist/__tests__/testing.test.d.ts +0 -1
- package/dist/__tests__/testing.test.js +0 -230
- package/dist/__tests__/workflows.test.d.ts +0 -1
- package/dist/__tests__/workflows.test.js +0 -222
package/CHANGELOG.md
CHANGED
@@ -5,6 +5,91 @@ All notable changes to the @pauly4010/evalai-sdk package will be documented in t
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [1.5.5] - 2026-02-19
+
+### ✨ Added
+
+#### Gate semantics (PASS / WARN / FAIL)
+
+- **`--warnDrop <n>`** — Introduce a WARN band when score drops > `warnDrop` but < `maxDrop`
+- **Gate verdicts:** PASS, WARN, FAIL
+- **Profiles:** `strict` (warnDrop: 0), `balanced` (warnDrop: 1), `fast` (warnDrop: 2)
+- **`--fail-on-flake`** — Fail the gate if any case is flagged as flaky (partial pass rate across determinism runs)
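The WARN band described in the entries above can be illustrated with a minimal sketch. This is not the SDK's implementation: `gateVerdict` and its parameter names are hypothetical, and the handling of a drop exactly equal to `maxDrop` (treated as FAIL here) is an assumption, since the changelog only states that WARN applies when the drop is greater than `warnDrop` and less than `maxDrop`.

```typescript
type Verdict = "PASS" | "WARN" | "FAIL";

// Hypothetical helper for illustration only; not part of @pauly4010/evalai-sdk.
function gateVerdict(
  baselineScore: number,
  currentScore: number,
  warnDrop: number,
  maxDrop: number,
): Verdict {
  // A positive drop means the current score is below the baseline.
  const drop = baselineScore - currentScore;

  // Assumption: a drop at or beyond maxDrop fails the gate.
  if (drop >= maxDrop) return "FAIL";

  // From the changelog: WARN when the drop exceeds warnDrop but stays under maxDrop.
  if (drop > warnDrop) return "WARN";

  return "PASS";
}

// Example with the `balanced` profile value from the changelog (warnDrop: 1)
// and an assumed maxDrop of 5.
const verdict = gateVerdict(90, 87, 1, 5);
console.log(verdict);
```

With a baseline of 90 and a current score of 87, the drop of 3 falls between the two thresholds, so this sketch reports WARN.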
+
+#### Determinism & flake intelligence
+
+- **Adaptive variance thresholds** — Determinism audit passes if `absVariance ≤ 5` OR `relVariance ≤ 2%`
+- **Per-case variance reporting** — Reports per-case pass rate across N runs and flags `[FLAKY]` cases
+- **Golden dataset regression** — Added `evals/golden` with `pnpm eval:golden` to prevent semantic regressions
+- **Golden drift output** — Writes `evals/golden/golden-results.json` with `currentScore`, `baselineScore`, `delta`, `passed`, and timestamps
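The adaptive threshold and the flaky flag above can be sketched as follows. The changelog does not define `absVariance` or `relVariance`, so this example assumes `absVariance` is the spread (max minus min) of scores across runs and `relVariance` is that spread divided by the mean score. Both function names are hypothetical and the code is only an illustration of the "either threshold passes" rule.

```typescript
// Hypothetical helpers for illustration only; not part of @pauly4010/evalai-sdk.

// Passes if the absolute spread is small OR the relative spread is small.
function determinismPasses(
  scores: number[],
  absLimit = 5, // from the changelog: absVariance <= 5
  relLimit = 0.02, // from the changelog: relVariance <= 2%
): boolean {
  if (scores.length === 0) {
    throw new Error("at least one run is required");
  }
  const max = Math.max(...scores);
  const min = Math.min(...scores);
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;

  // Assumption: absVariance is the max-min spread across runs.
  const absVariance = max - min;
  // Assumption: relVariance is the spread relative to the mean score.
  const relVariance =
    mean === 0 ? (absVariance === 0 ? 0 : Infinity) : absVariance / Math.abs(mean);

  return absVariance <= absLimit || relVariance <= relLimit;
}

// A case is flaky when it passes in some runs but not all (partial pass rate).
function isFlaky(runResults: boolean[]): boolean {
  const passes = runResults.filter(Boolean).length;
  return passes > 0 && passes < runResults.length;
}
```

The OR condition means small absolute spreads pass even on low scores, while large-scale scores pass as long as the spread stays within 2% of the mean.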
+
+#### CI audits & workflows
+
+- **Nightly audits** — Added `audit-nightly.yml` for determinism + performance budgets (skips without `OPENAI_API_KEY`)
+- **SDK compatibility matrix** — Added `sdk-compat.yml` to validate older SDK versions against current API
+- **New audits:** `audit:retention`, `audit:migrations`, `audit:performance`, `audit:determinism`
+
+#### Platform safety & governance (docs + proofs)
+
+- **Audit trail docs** — Added `docs/audit-trail.md`
+- **Observability docs** — Added `docs/observability.md` (log schema + requestId)
+- **Retention docs** — Added `docs/data-retention.md`
+- **Migration safety docs** — Added `docs/migration-safety.md`
+- **Adoption benchmark** — Added `docs/adoption-benchmark.md`
+- **Examples** — Added real-world example suites (RAG regression + agent tool-use)
+
+### 🔧 Changed
+
+- **Exit codes updated** — 0=pass, **8=warn**, failures remain as documented for score/regression/policy/API/config
+- **GitHub + human formatters** — Render WARN state, top contributors, and flake indicators where available
+- **Rate limiting** — Adds `Retry-After` header on 429 responses
+- **RequestId propagation** — `EvalAIError` surfaces `requestId` from response body or `x-request-id` header
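The requestId lookup described in the last entry could look roughly like the sketch below. `extractRequestId` is a hypothetical function, not the SDK's API. The body field name `requestId` and the precedence of body over header are assumptions based only on the order in which the changelog lists the two sources.

```typescript
// Hypothetical helper for illustration only; not part of @pauly4010/evalai-sdk.
function extractRequestId(
  body: unknown,
  headers: Record<string, string | undefined>,
): string | undefined {
  // Assumption: the response body carries the id under a `requestId` key.
  if (typeof body === "object" && body !== null) {
    const candidate = (body as Record<string, unknown>).requestId;
    if (typeof candidate === "string" && candidate.length > 0) {
      return candidate;
    }
  }

  // Fall back to the `x-request-id` header; HTTP header names are case-insensitive.
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === "x-request-id" && value) {
      return value;
    }
  }

  return undefined;
}

// Example: the body has no id, so the header value is used.
const id = extractRequestId({ error: "rate_limited" }, { "X-Request-Id": "req_123" });
console.log(id);
```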
+
+### 🧪 Testing
+
+- Added tests for:
+  - access boundaries (no tenant info leak)
+  - rate-limit abuse patterns + `Retry-After`
+  - executor failure modes (timeouts / upstream 429 / malformed responses)
+  - error catalog stability + graceful handling of unknown codes
+  - exports contract (retention visibility, 410 semantics)
+
+--
+
+## [1.5.0] - 2026-02-18
+
+### ✨ Added
+
+#### evalai CLI — CI DevX
+
+- **`--format github`** — GitHub Actions annotations + step summary (`$GITHUB_STEP_SUMMARY`)
+- **`--format json`** — Machine-readable output only
+- **`--onFail import`** — On gate failure, import run metadata + failures to dashboard (idempotent per CI run)
+- **`--explain`** — Show score breakdown (contribPts) and thresholds
+- **`evalai doctor`** — Verify CI setup (config, API key, quality endpoint, baseline)
+- **Pinned CLI invocation** — Use `npx -y @pauly4010/evalai-sdk@^1` for stable CI (avoids surprise v2 breaks)
+
+#### Documentation
+
+- **README** — 3-section adoption flow: 60s local → optional CI gate → no lock-in
+- **Init output** — Shows path written, pinned snippet with `--format github --onFail import`
+- **openAIChatEval** — "Gate this in CI" hint uses pinned invocation
+
+### 🔧 Changed
+
+- **evalai init** — Output: "Wrote evalai.config.json at {path}", one next step, uninstall line
+- **Baseline missing** — Treated as config failure (BAD_ARGS), not API error
+- **parseArgs** — Returns `{ ok, args }` or `{ ok: false }` (no `process.exit` inside) for testability
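The `parseArgs` change above follows a common result-object pattern: the parser reports success or failure as data and leaves the decision to exit to its caller. The sketch below illustrates that pattern only. `ParseResult`, `SketchArgs`, and `parseArgsSketch` are hypothetical names, and the real `parseArgs` signature and supported flags are not shown in this diff; only `--format` and `--warnDrop` are taken from the changelog.

```typescript
// Hypothetical types and parser for illustration only; not the SDK's parseArgs.
type ParseResult<T> = { ok: true; args: T } | { ok: false };

interface SketchArgs {
  format?: string;
  warnDrop?: number;
}

function parseArgsSketch(argv: string[]): ParseResult<SketchArgs> {
  const args: SketchArgs = {};

  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    const value = argv[i + 1];

    if (flag === "--format" && value !== undefined) {
      args.format = value;
      i++; // consume the flag's value
    } else if (flag === "--warnDrop" && value !== undefined) {
      const n = Number(value);
      if (!Number.isFinite(n)) return { ok: false };
      args.warnDrop = n;
      i++; // consume the flag's value
    } else {
      // Unknown flag or missing value: report failure instead of exiting.
      return { ok: false };
    }
  }

  return { ok: true, args };
}

// The caller, not the parser, decides what to do on failure.
const result = parseArgsSketch(["--format", "json", "--warnDrop", "2"]);
if (result.ok) {
  console.log(result.args.format, result.args.warnDrop);
}
```

Because no `process.exit` call happens inside the parser, a unit test can assert on the returned object directly without stubbing the process.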
+
+### 📦 Internal
+
+- Refactored `check.ts` into modules: `api.ts`, `gate.ts`, `report/build-check-report.ts`, `formatters/`
+- Deterministic helpers: `truncateSnippet`, `sortFailedCases`
+- Formatter tests: `json.test.ts`, `github.test.ts`
+- Doctor tests: `doctor.test.ts`
+
+---
+
 ## [1.4.1] - 2026-02-18

 ### ✨ Added