@verica-app/cli 0.1.3 → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/CHANGELOG.md CHANGED

@@ -8,6 +8,24 @@ version and additive features/fixes bump the **patch** (see [Stability](./README
 ## [Unreleased]
+## [0.1.4] - 2026-06-19
+### Added
+- `--reuse-if-unchanged` (with `--reuse-max-age <hrs>` and `--reuse-same-ref`) to
+  reuse a recent **completed** run instead of executing again when the config
+  (prompt version + model + sampling + dataset snapshot + graders) is unchanged.
+  Opt-in and freshness-bounded (default 24h, max 720) — re-running stays the
+  default, since reuse can't see provider drift. On a cache hit the `--json`
+  element gains `"reused": true` and a `reusedFrom` block (additive — a normal
+  run's payload is unchanged); the API answers `200` instead of `202`. Cannot be
+  combined with `--threshold` / `--baseline-*` (the reused verdict was frozen under
+  the prior gate).
+- `--git-repo-url <url>` (auto-detected from `GITHUB_SERVER_URL`/`GITHUB_REPOSITORY`,
+  else GitLab's `CI_PROJECT_URL`) sent as `git.repoUrl`, so the run UI can link the
+  commit SHA to `<repoUrl>/commit/<sha>` and the branch to `<repoUrl>/tree/<ref>`.
 ## [0.1.3] - 2026-06-18
 ### Added
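
The `--git-repo-url` entry above quotes the two link patterns the run UI builds from `git.repoUrl`. A minimal sketch of that construction, assuming a hypothetical helper named `buildGitLinks` (the name and return shape are illustrative and not part of the package):

```javascript
// Hypothetical helper: mirrors the link patterns quoted in the changelog entry,
// <repoUrl>/commit/<sha> and <repoUrl>/tree/<ref>. Not part of @verica-app/cli.
function buildGitLinks({ repoUrl, sha, ref }) {
  if (!repoUrl) return {}; // without a repo base there is nothing to link to
  const base = repoUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    ...(sha ? { commit: `${base}/commit/${sha}` } : {}),
    ...(ref ? { tree: `${base}/tree/${ref}` } : {}),
  };
}

console.log(
  buildGitLinks({ repoUrl: "https://github.com/acme/widgets/", sha: "abc123", ref: "main" })
);
```

How the server actually renders these links is not visible in this diff; the sketch only restates the documented patterns.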

package/README.md CHANGED

@@ -47,12 +47,12 @@ evals:
   - id: eval_8x2k9d
     prompt: prompts/support-agent.txt
     systemPrompt: prompts/support-agent.system.txt
-    tools: prompts/support-agent.tools.json   # a path to a JSON file…
+    tools: prompts/support-agent.tools.json # a path to a JSON file…
     sampling: { temperature: 0.2, maxTokens: 512 }
     model: gpt-4.1-mini
   - id: eval_3p1m7q
     prompt: prompts/triage.txt
-    tools:                                      # …or an inline array
+    tools: # …or an inline array
       - name: get_order
         description: Look up an order by id
         parameters: { type: object, properties: { id: { type: string } }, required: [id] }
@@ -77,7 +77,8 @@ default — you don't configure a URL.
 - `--junit <file>` · `--junit-mode rows|gate` — JUnit report (default `rows`).
 - `--json` — machine-readable results on stdout.
 - `--threshold <0..1>` · `--baseline-ref <ref>` · `--baseline-run <id>` — override the gate per branch.
-- `--git-sha` / `--git-ref` — provenance (auto-detected from CI env otherwise).
+- `--reuse-if-unchanged` · `--reuse-max-age <hrs>` · `--reuse-same-ref` — reuse a recent completed run instead of re-executing an unchanged config. See [Reuse](#reuse-skip-re-running-an-unchanged-config).
+- `--git-sha` / `--git-ref` / `--git-repo-url` — provenance + the repo web base that links the SHA in the run UI (all auto-detected from CI env otherwise).
 > Local dev / self-hosting only: point the CLI at another instance with `--base-url`
 > (or the `VERICA_BASE_URL` env var). Clients never need this.
@@ -108,15 +109,58 @@ wrapper (auto-unwrapped), so you can paste your real schemas as-is:
 ```json
 [
-  { "name": "get_order", "description": "Look up an order by id",
-    "parameters": { "type": "object", "properties": { "id": { "type": "string" } }, "required": ["id"] } },
-  { "type": "function", "function": { "name": "cancel_order", "description": "…", "parameters": { "type": "object" } } }
+  {
+    "name": "get_order",
+    "description": "Look up an order by id",
+    "parameters": {
+      "type": "object",
+      "properties": { "id": { "type": "string" } },
+      "required": ["id"]
+    }
+  },
+  {
+    "type": "function",
+    "function": { "name": "cancel_order", "description": "…", "parameters": { "type": "object" } }
+  }
 ]
 ```
-Tools are never executed — the model's *decision* to call one (and with which
+Tools are never executed — the model's _decision_ to call one (and with which
 arguments) is what the eval grades.
+## Reuse (skip re-running an unchanged config)
+By default every `verica run` executes — re-running an unchanged eval is often the
+**point** in CI (it catches silent model drift and run-to-run variance, since an
+eval's output isn't a pure function of its inputs). When you'd rather save the
+tokens, opt in with `--reuse-if-unchanged`:
+```bash
+verica run --eval eval_8x2k9d --model gpt-4.1-mini --reuse-if-unchanged --wait
+# if the same config ran & completed in the last 24h, returns that verdict — no new run
+```
+"Unchanged" means the **prompt version + model + sampling + dataset snapshot +
+graders** all match a prior run (the gate is _not_ part of it — it decides the
+verdict, not the output). On a hit the CLI exits on the prior run's frozen verdict
+and the `--json` element carries `"reused": true` plus a `reusedFrom` block (the API
+also answers `200` instead of `202`).
+- `--reuse-max-age <hrs>` — how stale a reusable run may be (default **24**, max
+  **720**). There is no "forever": reuse can't see provider-side drift behind a
+  stable model id, so it's always bounded — that bound is your staleness budget.
+- `--reuse-same-ref` — only reuse a run on the **same git ref**. Off by default: an
+  identical config produces the same output distribution regardless of branch.
+- Only **completed** runs are reused (never a partial/failed one).
+- Incompatible with `--threshold` / `--baseline-ref` / `--baseline-run`. Reuse hands
+  back a _prior_ run's verdict, frozen under the gate that applied when it ran, so a
+  new `--threshold` can't be recomputed against it. `--baseline-ref` is worse than
+  stale: no-regression compares against the _last run on the ref_ — a moving target —
+  so a cached verdict can never be a fresh no-regression check. Gate on either → run
+  fresh (omit reuse).
+Omit `--reuse-if-unchanged` (the default) any time you want a guaranteed fresh run.
 ## Exit codes
 `0` passed · `1` gate failed · `2` validation/transport error.
@@ -136,8 +180,8 @@ During `0.x` the **minor** version is the breaking lever, so pin accordingly:
 We bump the **minor** for any breaking change (flags, output shapes, push behavior) and
 the **patch** for additive features and fixes. **1.0** will freeze the commands, flags,
-exit codes, and output shapes under standard semver. See
-[CHANGELOG.md](./CHANGELOG.md) for what changed in each release.
+exit codes, and output shapes under standard semver. See the bundled `CHANGELOG.md`
+for what changed in each release.
 MIT licensed. There's no IP in the client — the engine, graders, gate, and crypto all
 run server-side behind the token API.
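
The Reuse section added above says a cache hit marks the `--json` element with `"reused": true` and a `reusedFrom` block. A consumer might use that to report how old a reused verdict is. The sketch below assumes the element carries `reusedFrom.finishedAt` as an ISO timestamp (the field appears in the `reusedFrom` schema in `dist/cli.js` in this same diff); `reusedAgeHours` and the sample values are hypothetical.

```javascript
// Hypothetical helper, not exported by the CLI: returns the age in hours of a
// reused verdict, or null when the element describes a fresh run.
function reusedAgeHours(element, now = new Date()) {
  if (!element.reused || !element.reusedFrom) return null;
  const finished = Date.parse(element.reusedFrom.finishedAt);
  if (Number.isNaN(finished)) return null; // unparseable timestamp
  return (now.getTime() - finished) / 3_600_000; // milliseconds per hour
}

// Sample element with made-up ids, shaped like the documented additive fields.
const sample = {
  evalId: "eval_8x2k9d",
  runId: "run_example",
  reused: true,
  reusedFrom: {
    runId: "run_example",
    finishedAt: "2026-06-19T00:00:00Z",
    status: "completed",
    gitSha: null,
    gitRef: null,
  },
};

console.log(reusedAgeHours(sample, new Date("2026-06-19T06:00:00Z"))); // 6
```

Because the fields are additive, a payload from a normal run simply yields `null` here.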

package/dist/cli.js CHANGED

@@ -4086,7 +4086,13 @@ var runRequestSchema = external_exports.object({
   /** Commit provenance, stamped on the run. */
   git: external_exports.object({
     sha: external_exports.string().optional(),
-    ref: external_exports.string().optional()
+    ref: external_exports.string().optional(),
+    /**
+     * The repository's web base URL (e.g. `https://github.com/acme/widgets`) so
+     * the run UI can link the SHA → `<repoUrl>/commit/<sha>` and the branch →
+     * `<repoUrl>/tree/<ref>`. The CLI auto-detects it from CI env.
+     */
+    repoUrl: external_exports.string().optional()
   }).optional(),
   /** CLI gate overrides (precedence over the eval's pass_condition). */
   gate: external_exports.object({
@@ -4096,6 +4102,25 @@ var runRequestSchema = external_exports.object({
     baselineRef: external_exports.string().optional(),
     /** Pin a specific baseline run (wins over baselineRef). */
     baselineRunId: external_exports.string().optional()
+  }).optional(),
+  /**
+   * Opt-in cost control: when the merged config (prompt version + model +
+   * sampling + dataset snapshot + graders) matches a recent COMPLETED run, the
+   * server returns that run's frozen verdict instead of executing again. NOT a
+   * default — an eval's output isn't a pure function of its config (generation +
+   * judge are non-deterministic, the model endpoint drifts), so reuse is always
+   * the caller's explicit choice and is bounded by `maxAgeHours`. Incompatible
+   * with `gate`: a reused verdict is frozen under its own gate (a new threshold
+   * can't be recomputed), and no-regression compares against a moving baseline
+   * (the last run on the ref), so a cache can never be a fresh gated check.
+   */
+  reuse: external_exports.object({
+    /** Turn reuse on. The trigger — everything else is just tuning. */
+    ifUnchanged: external_exports.boolean().optional(),
+    /** Max age (hours) of a reusable run; server default 24, cap 720 (30d). No "infinite reuse". */
+    maxAgeHours: external_exports.number().positive().max(720).optional(),
+    /** Also require the prior run's git ref to match (per-branch isolation); default false. */
+    sameRef: external_exports.boolean().optional()
   }).optional()
 });
 var runAcceptedSchema = external_exports.object({
@@ -4104,7 +4129,23 @@ var runAcceptedSchema = external_exports.object({
   promptVersion: external_exports.number().int(),
   /** Whether a NEW prompt version was created (vs. the current one reused). */
   created: external_exports.boolean(),
-  resultUrl: external_exports.string()
+  resultUrl: external_exports.string(),
+  /**
+   * Whether this response reuses a prior run instead of executing a new one (a
+   * cache hit on `reuse.ifUnchanged`). The HTTP status reflects it too: 200 when
+   * reused, 202 when a fresh run was enqueued. Optional so an older API that
+   * predates reuse (omitting it) reads as `false`.
+   */
+  reused: external_exports.boolean().optional(),
+  /** Provenance of the reused run — present iff `reused` is true. */
+  reusedFrom: external_exports.object({
+    runId: external_exports.string(),
+    /** ISO timestamp the reused run finished — shows how stale the verdict is. */
+    finishedAt: external_exports.string(),
+    status: external_exports.literal("completed"),
+    gitSha: external_exports.string().nullable(),
+    gitRef: external_exports.string().nullable()
+  }).optional()
 });
 var runStatusSchema = external_exports.enum([
   "queued",
@@ -4409,16 +4450,21 @@ async function runCommand(opts) {
   const entries = await resolveEntries(opts);
   const git = resolveGit(opts);
   const gate = resolveGate(opts);
+  const reuse = resolveReuse(opts);
   const suites = [];
   const summaries = [];
   let worst = EXIT.pass;
   for (const entry of entries) {
     try {
-      const body = await buildRequest(entry, { samplingFile: opts.samplingFile, git, gate });
+      const body = await buildRequest(entry, { samplingFile: opts.samplingFile, git, gate, reuse });
       const accepted = await client.triggerRun(entry.id, body);
-      err(
-        `\u25B6 ${entry.id}: run ${accepted.runId} queued (prompt v${accepted.promptVersion}${accepted.created ? ", new version" : ", reused"})`
-      );
+      const promptNote = `prompt v${accepted.promptVersion}${accepted.created ? ", new version" : ""}`;
+      if (accepted.reused) {
+        err(`\u267B ${entry.id}: run ${accepted.runId} reused (${promptNote})`);
+        if (accepted.reusedFrom) err(`  \u21B3 a completed run from ${accepted.reusedFrom.finishedAt}`);
+      } else {
+        err(`\u25B6 ${entry.id}: run ${accepted.runId} queued (${promptNote})`);
+      }
       err(`  ${accepted.resultUrl}`);
       if (!opts.wait) {
         summaries.push(buildSummary(entry.id, { status: "queued", accepted }));
@@ -4435,7 +4481,7 @@ async function runCommand(opts) {
           opts.junitMode === "rows" ? rowsSuite(entry.id, await client.getResults(accepted.runId)) : gateSuite(entry.id, run)
         );
       }
-      summaries.push(buildSummary(entry.id, { status: "waited", runId: accepted.runId, run }));
+      summaries.push(buildSummary(entry.id, { status: "waited", accepted, run }));
     } catch (e) {
       worst = EXIT.error;
       const message = e instanceof Error ? e.message : String(e);
@@ -4487,19 +4533,34 @@ async function buildRequest(entry, ctx) {
     ...prompt ? { prompt } : {},
     ...sampling ? { samplingParams: sampling } : {},
     ...ctx.git ? { git: ctx.git } : {},
-    ...ctx.gate ? { gate: ctx.gate } : {}
+    ...ctx.gate ? { gate: ctx.gate } : {},
+    ...ctx.reuse ? { reuse: ctx.reuse } : {}
   };
 }
 function buildSummary(evalId, outcome) {
   switch (outcome.status) {
     case "queued":
-      return { evalId, runId: outcome.accepted.runId, resultUrl: outcome.accepted.resultUrl };
+      return {
+        evalId,
+        runId: outcome.accepted.runId,
+        resultUrl: outcome.accepted.resultUrl,
+        ...reuseFields(outcome.accepted)
+      };
     case "waited":
-      return { evalId, runId: outcome.runId, ...outcome.run };
+      return {
+        evalId,
+        runId: outcome.accepted.runId,
+        ...outcome.run,
+        ...reuseFields(outcome.accepted)
+      };
     case "error":
       return { evalId, error: outcome.message };
   }
 }
+function reuseFields(accepted) {
+  if (!accepted.reused) return {};
+  return { reused: true, ...accepted.reusedFrom ? { reusedFrom: accepted.reusedFrom } : {} };
+}
 async function resolveTools(tools) {
   if (tools === void 0) return void 0;
   const raw = typeof tools === "string" ? JSON.parse(await readFile2(tools, "utf8")) : tools;
@@ -4508,8 +4569,21 @@ async function resolveTools(tools) {
 function resolveGit(opts) {
   const sha = opts.gitSha ?? process.env.GITHUB_SHA ?? process.env.CI_COMMIT_SHA;
   const ref = opts.gitRef ?? process.env.GITHUB_REF ?? process.env.CI_COMMIT_REF_NAME;
-  if (!sha && !ref) return void 0;
-  return { ...sha ? { sha } : {}, ...ref ? { ref } : {} };
+  const repoUrl = resolveRepoUrl(opts);
+  if (!sha && !ref && !repoUrl) return void 0;
+  return {
+    ...sha ? { sha } : {},
+    ...ref ? { ref } : {},
+    ...repoUrl ? { repoUrl } : {}
+  };
+}
+function resolveRepoUrl(opts) {
+  if (opts.gitRepoUrl) return opts.gitRepoUrl.replace(/\/+$/, "");
+  const server = process.env.GITHUB_SERVER_URL;
+  const repo = process.env.GITHUB_REPOSITORY;
+  if (server && repo) return `${server.replace(/\/+$/, "")}/${repo}`;
+  if (process.env.CI_PROJECT_URL) return process.env.CI_PROJECT_URL.replace(/\/+$/, "");
+  return void 0;
 }
 function resolveGate(opts) {
   const gate = {};
@@ -4518,6 +4592,14 @@ function resolveGate(opts) {
   if (opts.baselineRun !== void 0) gate.baselineRunId = opts.baselineRun;
   return Object.keys(gate).length > 0 ? gate : void 0;
 }
+function resolveReuse(opts) {
+  if (!opts.reuseIfUnchanged) return void 0;
+  return {
+    ifUnchanged: true,
+    ...opts.reuseMaxAgeHours !== void 0 ? { maxAgeHours: opts.reuseMaxAgeHours } : {},
+    ...opts.reuseSameRef ? { sameRef: true } : {}
+  };
+}
 function pct(n) {
   return n == null ? "?" : `${(n * 100).toFixed(1)}%`;
 }
@@ -4562,8 +4644,16 @@ Options:
   --threshold <0..1>     Override the gate's minimum pass rate.
   --baseline-ref <ref>   No-regression baseline = last run on this git ref.
   --baseline-run <id>    No-regression baseline = this specific run.
+  --reuse-if-unchanged   Reuse a recent completed run instead of executing again
+                         when the config (prompt + model + sampling + dataset +
+                         graders) is unchanged. Off by default. Incompatible with
+                         --threshold / --baseline-*.
+  --reuse-max-age <hrs>  Max age of a reusable run (default 24, max 720).
+  --reuse-same-ref       Only reuse a run on the same git ref (default: any ref).
   --git-sha <sha>        Commit SHA (else auto-detected from CI env).
   --git-ref <ref>        Git ref (else auto-detected from CI env).
+  --git-repo-url <url>   Repo web base for the SHA link in the run UI (e.g.
+                         https://github.com/acme/widgets). Auto-detected from CI env.
   --base-url <url>       Override the API base URL (dev/self-host only).
   --poll-interval <sec>  Initial poll interval (default 3).
   --timeout <sec>        Max wait (default 1800).
@@ -4598,8 +4688,12 @@ async function main() {
       threshold: { type: "string" },
       "baseline-ref": { type: "string" },
       "baseline-run": { type: "string" },
+      "reuse-if-unchanged": { type: "boolean", default: false },
+      "reuse-max-age": { type: "string" },
+      "reuse-same-ref": { type: "boolean", default: false },
       "git-sha": { type: "string" },
       "git-ref": { type: "string" },
+      "git-repo-url": { type: "string" },
       "base-url": { type: "string" },
       "poll-interval": { type: "string" },
       timeout: { type: "string" },
@@ -4618,6 +4712,17 @@ async function main() {
   if (values.threshold !== void 0 && threshold === void 0) {
     throw new Error(`--threshold must be a number between 0 and 1 (got "${values.threshold}").`);
   }
+  const reuseMaxAge = finiteNumber(values["reuse-max-age"]);
+  if (values["reuse-max-age"] !== void 0 && (reuseMaxAge === void 0 || reuseMaxAge <= 0 || reuseMaxAge > 720)) {
+    throw new Error(
+      `--reuse-max-age must be a number of hours in (0, 720] (got "${values["reuse-max-age"]}").`
+    );
+  }
+  if (values["reuse-if-unchanged"] && (threshold !== void 0 || values["baseline-ref"] !== void 0 || values["baseline-run"] !== void 0)) {
+    throw new Error(
+      "--reuse-if-unchanged cannot be combined with --threshold / --baseline-ref / --baseline-run: a reused verdict is frozen under its own gate, and no-regression compares against a moving baseline \u2014 neither can be recomputed. Gate on these? Run fresh (drop --reuse-if-unchanged)."
+    );
+  }
   const opts = {
     baseUrl,
     token,
@@ -4635,8 +4740,12 @@ async function main() {
     threshold,
     baselineRef: values["baseline-ref"],
     baselineRun: values["baseline-run"],
+    reuseIfUnchanged: values["reuse-if-unchanged"] ?? false,
+    reuseMaxAgeHours: reuseMaxAge,
+    reuseSameRef: values["reuse-same-ref"] ?? false,
     gitSha: values["git-sha"],
     gitRef: values["git-ref"],
+    gitRepoUrl: values["git-repo-url"],
     pollIntervalMs: (finiteNumber(values["poll-interval"]) ?? 3) * 1e3,
     timeoutMs: (finiteNumber(values.timeout) ?? 1800) * 1e3
   };
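
The `resolveRepoUrl` hunk above picks the repo base in a fixed order: the explicit flag, then the GitHub Actions variables, then GitLab's `CI_PROJECT_URL`. The sketch below restates that precedence with the environment passed in as a plain object so it can be exercised outside CI; `detectRepoUrl` and its parameters are illustrative names, not the bundled function.

```javascript
// Illustrative restatement of the precedence shown in the diff. The real
// function reads process.env directly; here `env` is injected for testing.
function detectRepoUrl(flagValue, env) {
  const trim = (s) => s.replace(/\/+$/, ""); // strip trailing slashes
  if (flagValue) return trim(flagValue); // 1. --git-repo-url wins
  if (env.GITHUB_SERVER_URL && env.GITHUB_REPOSITORY) {
    // 2. GitHub Actions: server URL plus "owner/repo"
    return `${trim(env.GITHUB_SERVER_URL)}/${env.GITHUB_REPOSITORY}`;
  }
  if (env.CI_PROJECT_URL) return trim(env.CI_PROJECT_URL); // 3. GitLab CI
  return undefined; // nothing detected: git.repoUrl is omitted
}

console.log(
  detectRepoUrl(undefined, {
    GITHUB_SERVER_URL: "https://github.com/",
    GITHUB_REPOSITORY: "acme/widgets",
  })
); // https://github.com/acme/widgets
```

Injecting `env` is a design choice of the sketch only; it keeps the example deterministic regardless of the machine it runs on.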

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@verica-app/cli",
-  "version": "0.1.3",
+  "version": "0.1.5",
   "private": false,
   "description": "Run a Verica eval from CI and block the merge on the result.",
   "license": "MIT",