@pauly4010/evalai-sdk 1.4.1 → 1.5.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (95)
  1. package/CHANGELOG.md +85 -0
  2. package/README.md +205 -543
  3. package/dist/assertions.d.ts +2 -2
  4. package/dist/assertions.js +104 -71
  5. package/dist/batch.js +12 -17
  6. package/dist/cache.js +7 -11
  7. package/dist/cli/api.d.ts +108 -0
  8. package/dist/cli/api.js +130 -0
  9. package/dist/cli/check.d.ts +28 -13
  10. package/dist/cli/check.js +249 -142
  11. package/dist/cli/ci-context.d.ts +6 -0
  12. package/dist/cli/ci-context.js +110 -0
  13. package/dist/cli/config.d.ts +30 -0
  14. package/dist/cli/config.js +207 -0
  15. package/dist/cli/constants.d.ts +15 -0
  16. package/dist/cli/constants.js +18 -0
  17. package/dist/cli/doctor.d.ts +11 -0
  18. package/dist/cli/doctor.js +82 -0
  19. package/dist/cli/formatters/github.d.ts +8 -0
  20. package/dist/cli/formatters/github.js +130 -0
  21. package/dist/cli/formatters/human.d.ts +6 -0
  22. package/dist/cli/formatters/human.js +107 -0
  23. package/dist/cli/formatters/json.d.ts +6 -0
  24. package/dist/cli/formatters/json.js +10 -0
  25. package/dist/cli/formatters/pr-comment.d.ts +12 -0
  26. package/dist/cli/formatters/pr-comment.js +101 -0
  27. package/dist/cli/formatters/types.d.ts +100 -0
  28. package/dist/cli/formatters/types.js +5 -0
  29. package/dist/cli/gate.d.ts +21 -0
  30. package/dist/cli/gate.js +175 -0
  31. package/dist/cli/index.d.ts +1 -0
  32. package/dist/cli/index.js +67 -23
  33. package/dist/cli/init.d.ts +7 -0
  34. package/dist/cli/init.js +69 -0
  35. package/dist/cli/policy-packs.d.ts +23 -0
  36. package/dist/cli/policy-packs.js +83 -0
  37. package/dist/cli/profiles.d.ts +28 -0
  38. package/dist/cli/profiles.js +30 -0
  39. package/dist/cli/reason-codes.d.ts +17 -0
  40. package/dist/cli/reason-codes.js +19 -0
  41. package/dist/cli/render/snippet.d.ts +5 -0
  42. package/dist/cli/render/snippet.js +15 -0
  43. package/dist/cli/render/sort.d.ts +10 -0
  44. package/dist/cli/render/sort.js +24 -0
  45. package/dist/cli/report/build-check-report.d.ts +19 -0
  46. package/dist/cli/report/build-check-report.js +124 -0
  47. package/dist/cli/share.d.ts +17 -0
  48. package/dist/cli/share.js +83 -0
  49. package/dist/client.d.ts +2 -2
  50. package/dist/client.js +144 -132
  51. package/dist/context.d.ts +1 -1
  52. package/dist/context.js +4 -6
  53. package/dist/errors.d.ts +2 -0
  54. package/dist/errors.js +116 -107
  55. package/dist/export.d.ts +6 -6
  56. package/dist/export.js +39 -33
  57. package/dist/index.d.ts +25 -24
  58. package/dist/index.js +62 -56
  59. package/dist/integrations/anthropic.d.ts +1 -1
  60. package/dist/integrations/anthropic.js +23 -19
  61. package/dist/integrations/openai-eval.d.ts +57 -0
  62. package/dist/integrations/openai-eval.js +230 -0
  63. package/dist/integrations/openai.d.ts +1 -1
  64. package/dist/integrations/openai.js +23 -19
  65. package/dist/local.d.ts +2 -2
  66. package/dist/local.js +25 -25
  67. package/dist/logger.d.ts +1 -1
  68. package/dist/logger.js +24 -28
  69. package/dist/matchers/index.d.ts +1 -0
  70. package/dist/matchers/index.js +6 -0
  71. package/dist/matchers/to-pass-gate.d.ts +29 -0
  72. package/dist/matchers/to-pass-gate.js +35 -0
  73. package/dist/pagination.d.ts +1 -1
  74. package/dist/pagination.js +6 -6
  75. package/dist/snapshot.js +24 -24
  76. package/dist/streaming.js +11 -11
  77. package/dist/testing.d.ts +6 -2
  78. package/dist/testing.js +30 -12
  79. package/dist/types.d.ts +22 -22
  80. package/dist/types.js +13 -13
  81. package/dist/utils/input-hash.d.ts +8 -0
  82. package/dist/utils/input-hash.js +38 -0
  83. package/dist/version.d.ts +7 -0
  84. package/dist/version.js +10 -0
  85. package/dist/workflows.d.ts +7 -7
  86. package/dist/workflows.js +44 -44
  87. package/package.json +102 -90
  88. package/dist/__tests__/assertions.test.d.ts +0 -1
  89. package/dist/__tests__/assertions.test.js +0 -288
  90. package/dist/__tests__/client.test.d.ts +0 -1
  91. package/dist/__tests__/client.test.js +0 -185
  92. package/dist/__tests__/testing.test.d.ts +0 -1
  93. package/dist/__tests__/testing.test.js +0 -230
  94. package/dist/__tests__/workflows.test.d.ts +0 -1
  95. package/dist/__tests__/workflows.test.js +0 -222
package/CHANGELOG.md CHANGED
@@ -5,6 +5,91 @@ All notable changes to the @pauly4010/evalai-sdk package will be documented in t
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [1.5.5] - 2026-02-19
+
+ ### ✨ Added
+
+ #### Gate semantics (PASS / WARN / FAIL)
+
+ - **`--warnDrop <n>`** — Introduce a WARN band when score drops > `warnDrop` but < `maxDrop`
+ - **Gate verdicts:** PASS, WARN, FAIL
+ - **Profiles:** `strict` (warnDrop: 0), `balanced` (warnDrop: 1), `fast` (warnDrop: 2)
+ - **`--fail-on-flake`** — Fail the gate if any case is flagged as flaky (partial pass rate across determinism runs)
+
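The WARN band in the gate-semantics entries can be read as a three-way decision on the score drop. The sketch below is illustrative only and is not taken from the package: `gateVerdict` and `GateThresholds` are hypothetical names, and treating a drop exactly equal to `maxDrop` as FAIL is an assumption, since the changelog only states that WARN applies strictly between `warnDrop` and `maxDrop`.

```typescript
type Verdict = "PASS" | "WARN" | "FAIL";

// Hypothetical shape; only the names warnDrop and maxDrop come from the changelog.
interface GateThresholds {
  warnDrop: number;
  maxDrop: number;
}

function gateVerdict(
  baselineScore: number,
  currentScore: number,
  t: GateThresholds
): Verdict {
  // A positive drop means the current score is below the baseline.
  const drop = baselineScore - currentScore;
  if (drop <= t.warnDrop) return "PASS";
  // WARN band: drop > warnDrop but < maxDrop, as the changelog describes.
  if (drop < t.maxDrop) return "WARN";
  // Assumption: anything at or beyond maxDrop fails the gate.
  return "FAIL";
}
```

With `warnDrop: 1` and a hypothetical `maxDrop: 5`, a baseline of 90 and a current score of 88.5 gives a drop of 1.5, which lands in the WARN band.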
19
+ #### Determinism & flake intelligence
20
+
21
+ - **Adaptive variance thresholds** — Determinism audit passes if `absVariance ≤ 5` OR `relVariance ≤ 2%`
22
+ - **Per-case variance reporting** — Reports per-case pass rate across N runs and flags `[FLAKY]` cases
23
+ - **Golden dataset regression** — Added `evals/golden` with `pnpm eval:golden` to prevent semantic regressions
24
+ - **Golden drift output** — Writes `evals/golden/golden-results.json` with `currentScore`, `baselineScore`, `delta`, `passed`, and timestamps
25
+
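The adaptive threshold and the per-case flake flag can be sketched as two small predicates. This is a minimal illustration under assumptions: `determinismPasses` and `caseFlakiness` are hypothetical names, the variance values are taken as already-computed inputs because the changelog does not say how they are derived, and "flaky" is read as a pass rate strictly between 0 and 1, following the "partial pass rate" wording.

```typescript
// Passes when EITHER bound holds: absVariance <= 5 OR relVariance <= 2%.
// relVariancePct is expressed in percent (2 means 2%).
function determinismPasses(absVariance: number, relVariancePct: number): boolean {
  return absVariance <= 5 || relVariancePct <= 2;
}

// Per-case pass rate across N runs; a case is flaky when it passes
// in some runs but not in all of them.
function caseFlakiness(results: boolean[]): { passRate: number; flaky: boolean } {
  if (results.length === 0) {
    throw new Error("at least one run is required");
  }
  const passes = results.filter((passed) => passed).length;
  const passRate = passes / results.length;
  return { passRate, flaky: passRate > 0 && passRate < 1 };
}
```

Because the two bounds are joined with OR, a large absolute variance on a high-scoring suite can still pass through the relative bound, and a small absolute variance passes regardless of the relative figure.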
26
+ #### CI audits & workflows
27
+
28
+ - **Nightly audits** — Added `audit-nightly.yml` for determinism + performance budgets (skips without `OPENAI_API_KEY`)
29
+ - **SDK compatibility matrix** — Added `sdk-compat.yml` to validate older SDK versions against current API
30
+ - **New audits:** `audit:retention`, `audit:migrations`, `audit:performance`, `audit:determinism`
31
+
32
+ #### Platform safety & governance (docs + proofs)
33
+
34
+ - **Audit trail docs** — Added `docs/audit-trail.md`
35
+ - **Observability docs** — Added `docs/observability.md` (log schema + requestId)
36
+ - **Retention docs** — Added `docs/data-retention.md`
37
+ - **Migration safety docs** — Added `docs/migration-safety.md`
38
+ - **Adoption benchmark** — Added `docs/adoption-benchmark.md`
39
+ - **Examples** — Added real-world example suites (RAG regression + agent tool-use)
40
+
41
+ ### 🔧 Changed
42
+
43
+ - **Exit codes updated** — 0=pass, **8=warn**, failures remain as documented for score/regression/policy/API/config
44
+ - **GitHub + human formatters** — Render WARN state, top contributors, and flake indicators where available
45
+ - **Rate limiting** — Adds `Retry-After` header on 429 responses
46
+ - **RequestId propagation** — `EvalAIError` surfaces `requestId` from response body or `x-request-id` header
47
+
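The requestId propagation entry describes a lookup with two possible sources. A rough sketch of such a lookup follows; it is not the SDK's `EvalAIError` code. `ErrorResponseLike` and `extractRequestId` are hypothetical names, and checking the body before the header is an assumption based only on the order in which the changelog lists them.

```typescript
// Hypothetical, simplified view of an error response.
interface ErrorResponseLike {
  body?: { requestId?: unknown } | null;
  headers: Record<string, string | undefined>;
}

function extractRequestId(res: ErrorResponseLike): string | undefined {
  // Assumption: a requestId in the response body takes precedence.
  const fromBody = res.body?.requestId;
  if (typeof fromBody === "string" && fromBody.length > 0) {
    return fromBody;
  }
  // HTTP header names are case-insensitive, so compare in lower case.
  for (const [name, value] of Object.entries(res.headers)) {
    if (name.toLowerCase() === "x-request-id" && value) {
      return value;
    }
  }
  return undefined;
}
```

Returning `undefined` rather than throwing keeps error construction safe when neither source carries an id.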
48
+ ### 🧪 Testing
49
+
50
+ - Added tests for:
51
+ - access boundaries (no tenant info leak)
52
+ - rate-limit abuse patterns + `Retry-After`
53
+ - executor failure modes (timeouts / upstream 429 / malformed responses)
54
+ - error catalog stability + graceful handling of unknown codes
55
+ - exports contract (retention visibility, 410 semantics)
56
+
57
+ --
58
+
59
+ ## [1.5.0] - 2026-02-18
60
+
61
+ ### ✨ Added
62
+
63
+ #### evalai CLI — CI DevX
64
+
65
+ - **`--format github`** — GitHub Actions annotations + step summary (`$GITHUB_STEP_SUMMARY`)
66
+ - **`--format json`** — Machine-readable output only
67
+ - **`--onFail import`** — On gate failure, import run metadata + failures to dashboard (idempotent per CI run)
68
+ - **`--explain`** — Show score breakdown (contribPts) and thresholds
69
+ - **`evalai doctor`** — Verify CI setup (config, API key, quality endpoint, baseline)
70
+ - **Pinned CLI invocation** — Use `npx -y @pauly4010/evalai-sdk@^1` for stable CI (avoids surprise v2 breaks)
71
+
72
+ #### Documentation
73
+
74
+ - **README** — 3-section adoption flow: 60s local → optional CI gate → no lock-in
75
+ - **Init output** — Shows path written, pinned snippet with `--format github --onFail import`
76
+ - **openAIChatEval** — "Gate this in CI" hint uses pinned invocation
77
+
78
+ ### 🔧 Changed
79
+
80
+ - **evalai init** — Output: "Wrote evalai.config.json at {path}", one next step, uninstall line
81
+ - **Baseline missing** — Treated as config failure (BAD_ARGS), not API error
82
+ - **parseArgs** — Returns `{ ok, args }` or `{ ok: false }` (no `process.exit` inside) for testability
83
+
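The `parseArgs` change follows a common pattern: return a result object instead of exiting, so that the caller decides what to do on bad input. The sketch below illustrates that pattern only and is not the package's parser. `parseArgsSketch`, `CheckArgs`, the `human` format value, and the default format are all hypothetical; only the `{ ok, args }` / `{ ok: false }` result shape and the `--format` and `--warnDrop` flag names come from the changelog.

```typescript
// Hypothetical argument shape for illustration.
interface CheckArgs {
  format: "human" | "json" | "github";
  warnDrop?: number;
}

// Discriminated union: callers must check `ok` before reading `args`.
type ParseResult = { ok: true; args: CheckArgs } | { ok: false };

function parseArgsSketch(argv: string[]): ParseResult {
  const args: CheckArgs = { format: "human" }; // assumed default
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag === "--format") {
      const value = argv[++i];
      if (value !== "human" && value !== "json" && value !== "github") {
        return { ok: false };
      }
      args.format = value;
    } else if (flag === "--warnDrop") {
      // A missing value becomes NaN and is rejected below.
      const value = Number(argv[++i]);
      if (!Number.isFinite(value)) {
        return { ok: false };
      }
      args.warnDrop = value;
    } else {
      return { ok: false };
    }
  }
  return { ok: true, args };
}
```

A test can then assert on the returned value directly, and only the CLI entry point needs to translate `{ ok: false }` into a process exit.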
+ ### 📦 Internal
+
+ - Refactored `check.ts` into modules: `api.ts`, `gate.ts`, `report/build-check-report.ts`, `formatters/`
+ - Deterministic helpers: `truncateSnippet`, `sortFailedCases`
+ - Formatter tests: `json.test.ts`, `github.test.ts`
+ - Doctor tests: `doctor.test.ts`
+
+ ---
+
  ## [1.4.1] - 2026-02-18
 
  ### ✨ Added