@verica-app/cli 0.1.3 → 0.1.4

package/CHANGELOG.md CHANGED
@@ -8,6 +8,24 @@ version and additive features/fixes bump the **patch** (see [Stability](./README
 
  ## [Unreleased]
 
+ ## [0.1.4] - 2026-06-19
+
+ ### Added
+
+ - `--reuse-if-unchanged` (with `--reuse-max-age <hrs>` and `--reuse-same-ref`) to
+   reuse a recent **completed** run instead of executing again when the config
+   (prompt version + model + sampling + dataset snapshot + graders) is unchanged.
+   Opt-in and freshness-bounded (default 24h, max 720) — re-running stays the
+   default, since reuse can't see provider drift. On a cache hit the `--json`
+   element gains `"reused": true` and a `reusedFrom` block (additive — a normal
+   run's payload is unchanged); the API answers `200` instead of `202`. Cannot be
+   combined with `--threshold` / `--baseline-*` (the reused verdict was frozen under
+   the prior gate).
+
+ - `--git-repo-url <url>` (auto-detected from `GITHUB_SERVER_URL`/`GITHUB_REPOSITORY`,
+   else GitLab's `CI_PROJECT_URL`) sent as `git.repoUrl`, so the run UI can link the
+   commit SHA to `<repoUrl>/commit/<sha>` and the branch to `<repoUrl>/tree/<ref>`.
+
  ## [0.1.3] - 2026-06-18
 
  ### Added
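
The changelog entry above says a cache hit adds `"reused": true` and a `reusedFrom` block to the `--json` element. A CI step that consumes that output might surface the reuse roughly as follows. This is only a sketch: the helper name `describeReuse` is hypothetical, the sample values are invented, and it assumes `--json` prints an array of per-eval objects, which the diff implies but does not spell out.

```javascript
// Hypothetical helper: summarize which entries of the `--json` output were reused.
// Assumes each element may carry `reused` and `reusedFrom.finishedAt` (an ISO timestamp).
function describeReuse(summaries, nowMs = Date.now()) {
  return summaries.map((s) => {
    if (!s.reused) return `${s.evalId}: fresh run ${s.runId}`;
    const finished = s.reusedFrom ? Date.parse(s.reusedFrom.finishedAt) : NaN;
    if (Number.isNaN(finished)) return `${s.evalId}: reused run ${s.runId}`;
    const ageHours = (nowMs - finished) / 3_600_000;
    return `${s.evalId}: reused run ${s.runId} (${ageHours.toFixed(1)}h old)`;
  });
}

// Example payload shaped like the fields named in the changelog (values invented).
const sample = [
  {
    evalId: "eval_8x2k9d",
    runId: "run_a",
    reused: true,
    reusedFrom: {
      runId: "run_a",
      finishedAt: "2026-06-19T00:00:00Z",
      status: "completed",
      gitSha: null,
      gitRef: null,
    },
  },
  { evalId: "eval_3p1m7q", runId: "run_b" },
];
console.log(describeReuse(sample, Date.parse("2026-06-19T06:00:00Z")));
```

Printing the age next to the verdict keeps the staleness visible in the CI log, which is the trade-off the changelog warns about.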
package/README.md CHANGED
@@ -47,12 +47,12 @@ evals:
    - id: eval_8x2k9d
      prompt: prompts/support-agent.txt
      systemPrompt: prompts/support-agent.system.txt
-     tools: prompts/support-agent.tools.json # a path to a JSON file…
+     tools: prompts/support-agent.tools.json # a path to a JSON file…
      sampling: { temperature: 0.2, maxTokens: 512 }
      model: gpt-4.1-mini
    - id: eval_3p1m7q
      prompt: prompts/triage.txt
-     tools: # …or an inline array
+     tools: # …or an inline array
        - name: get_order
          description: Look up an order by id
          parameters: { type: object, properties: { id: { type: string } }, required: [id] }
@@ -77,7 +77,8 @@ default — you don't configure a URL.
  - `--junit <file>` · `--junit-mode rows|gate` — JUnit report (default `rows`).
  - `--json` — machine-readable results on stdout.
  - `--threshold <0..1>` · `--baseline-ref <ref>` · `--baseline-run <id>` — override the gate per branch.
- - `--git-sha` / `--git-ref` — provenance (auto-detected from CI env otherwise).
+ - `--reuse-if-unchanged` · `--reuse-max-age <hrs>` · `--reuse-same-ref` — reuse a recent completed run instead of re-executing an unchanged config. See [Reuse](#reuse-skip-re-running-an-unchanged-config).
+ - `--git-sha` / `--git-ref` / `--git-repo-url` — provenance + the repo web base that links the SHA in the run UI (all auto-detected from CI env otherwise).
 
  > Local dev / self-hosting only: point the CLI at another instance with `--base-url`
  > (or the `VERICA_BASE_URL` env var). Clients never need this.
@@ -108,15 +109,54 @@ wrapper (auto-unwrapped), so you can paste your real schemas as-is:
 
  ```json
  [
-   { "name": "get_order", "description": "Look up an order by id",
-     "parameters": { "type": "object", "properties": { "id": { "type": "string" } }, "required": ["id"] } },
-   { "type": "function", "function": { "name": "cancel_order", "description": "…", "parameters": { "type": "object" } } }
+   {
+     "name": "get_order",
+     "description": "Look up an order by id",
+     "parameters": {
+       "type": "object",
+       "properties": { "id": { "type": "string" } },
+       "required": ["id"]
+     }
+   },
+   {
+     "type": "function",
+     "function": { "name": "cancel_order", "description": "…", "parameters": { "type": "object" } }
+   }
  ]
  ```
 
- Tools are never executed — the model's *decision* to call one (and with which
+ Tools are never executed — the model's _decision_ to call one (and with which
  arguments) is what the eval grades.
 
+ ## Reuse (skip re-running an unchanged config)
+
+ By default every `verica run` executes — re-running an unchanged eval is often the
+ **point** in CI (it catches silent model drift and run-to-run variance, since an
+ eval's output isn't a pure function of its inputs). When you'd rather save the
+ tokens, opt in with `--reuse-if-unchanged`:
+
+ ```bash
+ verica run --eval eval_8x2k9d --model gpt-4.1-mini --reuse-if-unchanged --wait
+ # if the same config ran & completed in the last 24h, returns that verdict — no new run
+ ```
+
+ "Unchanged" means the **prompt version + model + sampling + dataset snapshot +
+ graders** all match a prior run (the gate is _not_ part of it — it decides the
+ verdict, not the output). On a hit the CLI exits on the prior run's frozen verdict
+ and the `--json` element carries `"reused": true` plus a `reusedFrom` block (the API
+ also answers `200` instead of `202`).
+
+ - `--reuse-max-age <hrs>` — how stale a reusable run may be (default **24**, max
+   **720**). There is no "forever": reuse can't see provider-side drift behind a
+   stable model id, so it's always bounded — that bound is your staleness budget.
+ - `--reuse-same-ref` — only reuse a run on the **same git ref**. Off by default: an
+   identical config produces the same output distribution regardless of branch.
+ - Only **completed** runs are reused (never a partial/failed one).
+ - Incompatible with `--threshold` / `--baseline-ref` / `--baseline-run` — a reused
+   verdict was frozen under its own gate, so it can't honor a new one.
+
+ Omit `--reuse-if-unchanged` (the default) any time you want a guaranteed fresh run.
+
  ## Exit codes
 
  `0` passed · `1` gate failed · `2` validation/transport error.
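
The Reuse section added to the README above defines "unchanged" as a match on prompt version, model, sampling, dataset snapshot and graders, with the gate excluded, and limits reuse to completed runs inside an age bound. The server-side matching is not part of this diff, so the following is only a sketch of one way such a check could work. The names `stableStringify`, `reuseKey`, `isReusable` and the field names `datasetSnapshot` and `graders` are assumptions made for illustration.

```javascript
// Serialize a value with object keys sorted, so key order never changes the result.
function stableStringify(value) {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const body = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical reuse key: only the five inputs the README lists; the gate is left out.
function reuseKey(run) {
  const { promptVersion, model, sampling, datasetSnapshot, graders } = run;
  return stableStringify({ promptVersion, model, sampling, datasetSnapshot, graders });
}

// Hypothetical eligibility check: same key, completed, and within the age bound.
function isReusable(prior, current, maxAgeHours, nowMs) {
  const ageHours = (nowMs - Date.parse(prior.finishedAt)) / 3_600_000;
  return (
    prior.status === "completed" &&
    ageHours <= maxAgeHours &&
    reuseKey(prior) === reuseKey(current)
  );
}
```

Leaving the gate out of the key matches the README's reasoning: the gate decides the verdict, not the output, which is also why a reused verdict cannot be combined with a new gate.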
package/dist/cli.js CHANGED
@@ -4086,7 +4086,13 @@ var runRequestSchema = external_exports.object({
    /** Commit provenance, stamped on the run. */
    git: external_exports.object({
      sha: external_exports.string().optional(),
-     ref: external_exports.string().optional()
+     ref: external_exports.string().optional(),
+     /**
+      * The repository's web base URL (e.g. `https://github.com/acme/widgets`) so
+      * the run UI can link the SHA → `<repoUrl>/commit/<sha>` and the branch →
+      * `<repoUrl>/tree/<ref>`. The CLI auto-detects it from CI env.
+      */
+     repoUrl: external_exports.string().optional()
    }).optional(),
    /** CLI gate overrides (precedence over the eval's pass_condition). */
    gate: external_exports.object({
@@ -4096,6 +4102,23 @@ var runRequestSchema = external_exports.object({
      baselineRef: external_exports.string().optional(),
      /** Pin a specific baseline run (wins over baselineRef). */
      baselineRunId: external_exports.string().optional()
+   }).optional(),
+   /**
+    * Opt-in cost control: when the merged config (prompt version + model +
+    * sampling + dataset snapshot + graders) matches a recent COMPLETED run, the
+    * server returns that run's frozen verdict instead of executing again. NOT a
+    * default — an eval's output isn't a pure function of its config (generation +
+    * judge are non-deterministic, the model endpoint drifts), so reuse is always
+    * the caller's explicit choice and is bounded by `maxAgeHours`. Incompatible
+    * with `gate` (the cached verdict was frozen under the old gate).
+    */
+   reuse: external_exports.object({
+     /** Turn reuse on. The trigger — everything else is just tuning. */
+     ifUnchanged: external_exports.boolean().optional(),
+     /** Max age (hours) of a reusable run; server default 24, cap 720 (30d). No "infinite reuse". */
+     maxAgeHours: external_exports.number().positive().max(720).optional(),
+     /** Also require the prior run's git ref to match (per-branch isolation); default false. */
+     sameRef: external_exports.boolean().optional()
    }).optional()
  });
  var runAcceptedSchema = external_exports.object({
@@ -4104,7 +4127,23 @@ var runAcceptedSchema = external_exports.object({
    promptVersion: external_exports.number().int(),
    /** Whether a NEW prompt version was created (vs. the current one reused). */
    created: external_exports.boolean(),
-   resultUrl: external_exports.string()
+   resultUrl: external_exports.string(),
+   /**
+    * Whether this response reuses a prior run instead of executing a new one (a
+    * cache hit on `reuse.ifUnchanged`). The HTTP status reflects it too: 200 when
+    * reused, 202 when a fresh run was enqueued. Optional so an older API that
+    * predates reuse (omitting it) reads as `false`.
+    */
+   reused: external_exports.boolean().optional(),
+   /** Provenance of the reused run — present iff `reused` is true. */
+   reusedFrom: external_exports.object({
+     runId: external_exports.string(),
+     /** ISO timestamp the reused run finished — shows how stale the verdict is. */
+     finishedAt: external_exports.string(),
+     status: external_exports.literal("completed"),
+     gitSha: external_exports.string().nullable(),
+     gitRef: external_exports.string().nullable()
+   }).optional()
  });
  var runStatusSchema = external_exports.enum([
    "queued",
@@ -4409,16 +4448,21 @@ async function runCommand(opts) {
    const entries = await resolveEntries(opts);
    const git = resolveGit(opts);
    const gate = resolveGate(opts);
+   const reuse = resolveReuse(opts);
    const suites = [];
    const summaries = [];
    let worst = EXIT.pass;
    for (const entry of entries) {
      try {
-       const body = await buildRequest(entry, { samplingFile: opts.samplingFile, git, gate });
+       const body = await buildRequest(entry, { samplingFile: opts.samplingFile, git, gate, reuse });
        const accepted = await client.triggerRun(entry.id, body);
-       err(
-         `\u25B6 ${entry.id}: run ${accepted.runId} queued (prompt v${accepted.promptVersion}${accepted.created ? ", new version" : ", reused"})`
-       );
+       const promptNote = `prompt v${accepted.promptVersion}${accepted.created ? ", new version" : ""}`;
+       if (accepted.reused) {
+         err(`\u267B ${entry.id}: run ${accepted.runId} reused (${promptNote})`);
+         if (accepted.reusedFrom) err(` \u21B3 a completed run from ${accepted.reusedFrom.finishedAt}`);
+       } else {
+         err(`\u25B6 ${entry.id}: run ${accepted.runId} queued (${promptNote})`);
+       }
        err(` ${accepted.resultUrl}`);
        if (!opts.wait) {
          summaries.push(buildSummary(entry.id, { status: "queued", accepted }));
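
The `runAcceptedSchema` comments above describe two signals for a cache hit: the optional `reused` flag and the HTTP status (200 when reused, 202 when a fresh run was enqueued). A client other than this CLI might reconcile the two along these lines. This is a sketch under assumptions: `classifyAccepted` and its return shape are hypothetical and are not part of the package.

```javascript
// Hypothetical helper: decide whether a trigger response was a cache hit.
// `reused` is optional in the schema, so a missing field is read as false,
// which is how a response from an older API without reuse support would look.
function classifyAccepted(httpStatus, body) {
  const reused = body.reused === true;
  return {
    kind: reused ? "reused" : "queued",
    // The schema comment pairs 200 with reuse and 202 with a fresh run.
    statusMatches: httpStatus === (reused ? 200 : 202),
    // `reusedFrom` is tolerated as absent, mirroring the CLI's own guard.
    finishedAt: body.reusedFrom ? body.reusedFrom.finishedAt : null,
  };
}

console.log(classifyAccepted(202, { runId: "run_b" }));
```

Treating the body flag as the source of truth and the status code as a cross-check keeps the helper usable against an API that predates reuse.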
@@ -4435,7 +4479,7 @@ async function runCommand(opts) {
            opts.junitMode === "rows" ? rowsSuite(entry.id, await client.getResults(accepted.runId)) : gateSuite(entry.id, run)
          );
        }
-       summaries.push(buildSummary(entry.id, { status: "waited", runId: accepted.runId, run }));
+       summaries.push(buildSummary(entry.id, { status: "waited", accepted, run }));
      } catch (e) {
        worst = EXIT.error;
        const message = e instanceof Error ? e.message : String(e);
@@ -4487,19 +4531,34 @@ async function buildRequest(entry, ctx) {
      ...prompt ? { prompt } : {},
      ...sampling ? { samplingParams: sampling } : {},
      ...ctx.git ? { git: ctx.git } : {},
-     ...ctx.gate ? { gate: ctx.gate } : {}
+     ...ctx.gate ? { gate: ctx.gate } : {},
+     ...ctx.reuse ? { reuse: ctx.reuse } : {}
    };
  }
  function buildSummary(evalId, outcome) {
    switch (outcome.status) {
      case "queued":
-       return { evalId, runId: outcome.accepted.runId, resultUrl: outcome.accepted.resultUrl };
+       return {
+         evalId,
+         runId: outcome.accepted.runId,
+         resultUrl: outcome.accepted.resultUrl,
+         ...reuseFields(outcome.accepted)
+       };
      case "waited":
-       return { evalId, runId: outcome.runId, ...outcome.run };
+       return {
+         evalId,
+         runId: outcome.accepted.runId,
+         ...outcome.run,
+         ...reuseFields(outcome.accepted)
+       };
      case "error":
        return { evalId, error: outcome.message };
    }
  }
+ function reuseFields(accepted) {
+   if (!accepted.reused) return {};
+   return { reused: true, ...accepted.reusedFrom ? { reusedFrom: accepted.reusedFrom } : {} };
+ }
  async function resolveTools(tools) {
    if (tools === void 0) return void 0;
    const raw = typeof tools === "string" ? JSON.parse(await readFile2(tools, "utf8")) : tools;
@@ -4508,8 +4567,21 @@ async function resolveTools(tools) {
  function resolveGit(opts) {
    const sha = opts.gitSha ?? process.env.GITHUB_SHA ?? process.env.CI_COMMIT_SHA;
    const ref = opts.gitRef ?? process.env.GITHUB_REF ?? process.env.CI_COMMIT_REF_NAME;
-   if (!sha && !ref) return void 0;
-   return { ...sha ? { sha } : {}, ...ref ? { ref } : {} };
+   const repoUrl = resolveRepoUrl(opts);
+   if (!sha && !ref && !repoUrl) return void 0;
+   return {
+     ...sha ? { sha } : {},
+     ...ref ? { ref } : {},
+     ...repoUrl ? { repoUrl } : {}
+   };
+ }
+ function resolveRepoUrl(opts) {
+   if (opts.gitRepoUrl) return opts.gitRepoUrl.replace(/\/+$/, "");
+   const server = process.env.GITHUB_SERVER_URL;
+   const repo = process.env.GITHUB_REPOSITORY;
+   if (server && repo) return `${server.replace(/\/+$/, "")}/${repo}`;
+   if (process.env.CI_PROJECT_URL) return process.env.CI_PROJECT_URL.replace(/\/+$/, "");
+   return void 0;
  }
  function resolveGate(opts) {
    const gate = {};
@@ -4518,6 +4590,14 @@ function resolveGate(opts) {
    if (opts.baselineRun !== void 0) gate.baselineRunId = opts.baselineRun;
    return Object.keys(gate).length > 0 ? gate : void 0;
  }
+ function resolveReuse(opts) {
+   if (!opts.reuseIfUnchanged) return void 0;
+   return {
+     ifUnchanged: true,
+     ...opts.reuseMaxAgeHours !== void 0 ? { maxAgeHours: opts.reuseMaxAgeHours } : {},
+     ...opts.reuseSameRef ? { sameRef: true } : {}
+   };
+ }
  function pct(n) {
    return n == null ? "?" : `${(n * 100).toFixed(1)}%`;
  }
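
The schema comment and `resolveRepoUrl` above say `git.repoUrl` exists so the run UI can link the SHA to `<repoUrl>/commit/<sha>` and the branch to `<repoUrl>/tree/<ref>`. The UI code is not in this diff, so the following is only a sketch of that link construction, with `repoLinks` as a hypothetical name.

```javascript
// Hypothetical UI-side helper: build the two links described for `git.repoUrl`.
function repoLinks({ repoUrl, sha, ref }) {
  if (!repoUrl) return {};
  // Same trailing-slash trim the CLI applies in resolveRepoUrl.
  const base = repoUrl.replace(/\/+$/, "");
  return {
    ...(sha ? { commit: `${base}/commit/${sha}` } : {}),
    ...(ref ? { tree: `${base}/tree/${ref}` } : {}),
  };
}

console.log(repoLinks({ repoUrl: "https://github.com/acme/widgets/", sha: "abc123", ref: "main" }));
```

One caveat: on GitHub Actions `GITHUB_REF` is a full ref such as `refs/heads/main`, so a real UI may need to normalize it before building the tree link. The diff does not show how the server handles that.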
@@ -4562,8 +4642,16 @@ Options:
    --threshold <0..1>      Override the gate's minimum pass rate.
    --baseline-ref <ref>    No-regression baseline = last run on this git ref.
    --baseline-run <id>     No-regression baseline = this specific run.
+   --reuse-if-unchanged    Reuse a recent completed run instead of executing again
+                           when the config (prompt + model + sampling + dataset +
+                           graders) is unchanged. Off by default. Incompatible with
+                           --threshold / --baseline-*.
+   --reuse-max-age <hrs>   Max age of a reusable run (default 24, max 720).
+   --reuse-same-ref        Only reuse a run on the same git ref (default: any ref).
    --git-sha <sha>         Commit SHA (else auto-detected from CI env).
    --git-ref <ref>         Git ref (else auto-detected from CI env).
+   --git-repo-url <url>    Repo web base for the SHA link in the run UI (e.g.
+                           https://github.com/acme/widgets). Auto-detected from CI env.
    --base-url <url>        Override the API base URL (dev/self-host only).
    --poll-interval <sec>   Initial poll interval (default 3).
    --timeout <sec>         Max wait (default 1800).
@@ -4598,8 +4686,12 @@ async function main() {
        threshold: { type: "string" },
        "baseline-ref": { type: "string" },
        "baseline-run": { type: "string" },
+       "reuse-if-unchanged": { type: "boolean", default: false },
+       "reuse-max-age": { type: "string" },
+       "reuse-same-ref": { type: "boolean", default: false },
        "git-sha": { type: "string" },
        "git-ref": { type: "string" },
+       "git-repo-url": { type: "string" },
        "base-url": { type: "string" },
        "poll-interval": { type: "string" },
        timeout: { type: "string" },
@@ -4618,6 +4710,17 @@ async function main() {
    if (values.threshold !== void 0 && threshold === void 0) {
      throw new Error(`--threshold must be a number between 0 and 1 (got "${values.threshold}").`);
    }
+   const reuseMaxAge = finiteNumber(values["reuse-max-age"]);
+   if (values["reuse-max-age"] !== void 0 && (reuseMaxAge === void 0 || reuseMaxAge <= 0 || reuseMaxAge > 720)) {
+     throw new Error(
+       `--reuse-max-age must be a number of hours in (0, 720] (got "${values["reuse-max-age"]}").`
+     );
+   }
+   if (values["reuse-if-unchanged"] && (threshold !== void 0 || values["baseline-ref"] !== void 0 || values["baseline-run"] !== void 0)) {
+     throw new Error(
+       "--reuse-if-unchanged cannot be combined with --threshold / --baseline-ref / --baseline-run (a reused verdict was frozen under the prior gate)."
+     );
+   }
    const opts = {
      baseUrl,
      token,
@@ -4635,8 +4738,12 @@ async function main() {
      threshold,
      baselineRef: values["baseline-ref"],
      baselineRun: values["baseline-run"],
+     reuseIfUnchanged: values["reuse-if-unchanged"] ?? false,
+     reuseMaxAgeHours: reuseMaxAge,
+     reuseSameRef: values["reuse-same-ref"] ?? false,
      gitSha: values["git-sha"],
      gitRef: values["git-ref"],
+     gitRepoUrl: values["git-repo-url"],
      pollIntervalMs: (finiteNumber(values["poll-interval"]) ?? 3) * 1e3,
      timeoutMs: (finiteNumber(values.timeout) ?? 1800) * 1e3
    };
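
The `--reuse-max-age` check added to `main()` above accepts only a number of hours in the half-open range (0, 720]. The same rule can be isolated as a small function for testing. This is a sketch: the bundle's `finiteNumber` helper is not shown in this diff, so the version below is a hypothetical stand-in, and `parseReuseMaxAge` is an illustrative name.

```javascript
// Hypothetical stand-in for the bundle's `finiteNumber`, whose body is not in this diff.
function finiteNumber(raw) {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) ? n : undefined;
}

// Mirrors the check in main(): a provided value must be a number in (0, 720].
// An omitted flag returns undefined, leaving the server default (24h) in effect.
function parseReuseMaxAge(raw) {
  const hours = finiteNumber(raw);
  if (raw !== undefined && (hours === undefined || hours <= 0 || hours > 720)) {
    throw new Error(`--reuse-max-age must be a number of hours in (0, 720] (got "${raw}").`);
  }
  return hours;
}

console.log(parseReuseMaxAge("48"));
```

Rejecting out-of-range values in the CLI gives a clear error before any request is sent. The request schema enforces the same bound with `.positive().max(720)`.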
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@verica-app/cli",
-   "version": "0.1.3",
+   "version": "0.1.4",
    "private": false,
    "description": "Run a Verica eval from CI and block the merge on the result.",
    "license": "MIT",