npm - @verica-app/cli - Versions diffs - 0.1.2 → 0.1.4 - Mend

@verica-app/cli 0.1.2 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/CHANGELOG.md ADDED

@@ -0,0 +1,84 @@
+# Changelog
+All notable changes to `@verica-app/cli` are documented here.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
+This package is **pre-1.0**: while on `0.x`, a breaking change bumps the **minor**
+version and additive features/fixes bump the **patch** (see [Stability](./README.md#stability)).
+## [Unreleased]
+## [0.1.4] - 2026-06-19
+### Added
+- `--reuse-if-unchanged` (with `--reuse-max-age <hrs>` and `--reuse-same-ref`) to
+  reuse a recent **completed** run instead of executing again when the config
+  (prompt version + model + sampling + dataset snapshot + graders) is unchanged.
+  Opt-in and freshness-bounded (default 24h, max 720) — re-running stays the
+  default, since reuse can't see provider drift. On a cache hit the `--json`
+  element gains `"reused": true` and a `reusedFrom` block (additive — a normal
+  run's payload is unchanged); the API answers `200` instead of `202`. Cannot be
+  combined with `--threshold` / `--baseline-*` (the reused verdict was frozen under
+  the prior gate).
+- `--git-repo-url <url>` (auto-detected from `GITHUB_SERVER_URL`/`GITHUB_REPOSITORY`,
+  else GitLab's `CI_PROJECT_URL`) sent as `git.repoUrl`, so the run UI can link the
+  commit SHA to `<repoUrl>/commit/<sha>` and the branch to `<repoUrl>/tree/<ref>`.
+## [0.1.3] - 2026-06-18
+### Added
+- `--tools <file>` flag and `tools:` in `.verica.yml` to push a prompt's tool
+  definitions from the repo. The manifest accepts either a **path to a JSON file**
+  or an **inline array**. Each tool may use Verica's flat shape
+  (`{ name, description, parameters }`) **or** the OpenAI wrapper
+  (`{ type: "function", function: { … } }`), which is auto-unwrapped — paste your
+  real schemas as-is.
+### Changed
+- **Prompt push is now field-level.** `--prompt` (user template), `--system-prompt`,
+  and `--tools` are independent: push only the fields you changed and every omitted
+  field is **inherited from the eval's current prompt version**. A new version is
+  created only if the merged content differs.
+  - Previously, a push that included `--prompt` without `--system-prompt` produced a
+    version with **no system prompt** (it was dropped, not inherited). If your CI
+    relied on that, pass all the fields you intend to set.
+  - `--system-prompt` (or `--tools`) can now be pushed **on its own** — earlier the
+    CLI sent a prompt block only when `--prompt` was present.
+- Prompt templates reference dataset columns by their **bare name** (`{{ pais }}`),
+  and grader/judge prompts reference the model output via `{{ output.text }}` /
+  `{{ output.tool_calls }}`. The legacy `{{ item.* }}` / `{{ sample.* }}` forms still
+  resolve, so existing prompt files keep working.
+## [0.1.2] - 2026-06-18
+### Changed
+- Default to the hosted API (`https://verica.app`). `VERICA_BASE_URL` and `--base-url`
+  are now overrides for local dev / self-hosting only — clients no longer configure a URL.
+## [0.1.1] - 2026-06-18
+### Changed
+- Published under the `@verica-app` scope, with a tag-triggered release workflow
+  (`cli-v*` → `npm publish --provenance`). No behavior change.
+## [0.1.0] - 2026-06-18
+### Added
+- Initial release. The `run` command:
+  - `--eval <id>` (single) or `--manifest <file>` (`.verica.yml`, multi-prompt).
+  - Pushes prompt content (`--prompt` / `--system-prompt`), versioned by content equality.
+  - `--model` · `--sampling <file.json>` for execution config.
+  - `--wait` polls to a terminal status; the exit code reflects the gate.
+  - `--junit <file>` · `--junit-mode rows|gate` for a JUnit report; `--json` for
+    machine-readable results.
+  - `--threshold` · `--baseline-ref` · `--baseline-run` to override the gate per branch.
+  - Git provenance (`--git-sha` / `--git-ref`) auto-detected from common CI env vars.
+  - Exit codes: `0` passed · `1` gate failed · `2` validation/transport error.
+  - `VERICA_TOKEN` is the only required secret; provider keys stay in Verica (BYOK).
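
The 0.1.3 "Changed" entry above describes prompt push as field-level: omitted fields are inherited from the current version, and a new version is created only if the merged content differs. The merge itself happens server-side and is not part of this package, so the following is only a minimal sketch of that rule. The function name `mergePromptFields`, the shape of `current`, and the use of `JSON.stringify` for content equality are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the field-level merge described in the 0.1.3 entry.
// The real merge runs server-side; names and shapes here are assumptions.

// Keep only the three prompt fields, in a fixed key order, so two prompts
// can be compared by their serialized form.
function pickPromptFields(p) {
  return { template: p.template, systemPrompt: p.systemPrompt, tools: p.tools };
}

function mergePromptFields(current, pushed = {}) {
  const base = pickPromptFields(current);
  // An omitted (undefined) field inherits the current version's value.
  const merged = {
    template: pushed.template !== undefined ? pushed.template : base.template,
    systemPrompt: pushed.systemPrompt !== undefined ? pushed.systemPrompt : base.systemPrompt,
    tools: pushed.tools !== undefined ? pushed.tools : base.tools,
  };
  // A new version is created only if the merged content differs.
  // (Assumption: serialized equality; nested key order is not normalized.)
  const created = JSON.stringify(merged) !== JSON.stringify(base);
  return { merged, created };
}

// Usage: push only a system prompt; the template and tools are inherited.
const currentVersion = {
  template: "What is the capital of {{ pais }}?",
  systemPrompt: "Be brief.",
  tools: [],
};
const result = mergePromptFields(currentVersion, { systemPrompt: "Be terse." });
console.log(result.created, result.merged.template);
```

Under this sketch, the pre-0.1.3 behavior called out in the changelog (a `--prompt`-only push dropping the system prompt) corresponds to treating an omitted `systemPrompt` as "unset" rather than "inherit".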

package/README.md CHANGED

@@ -27,6 +27,8 @@ npm i -D @verica-app/cli
 verica run \
   --eval eval_8x2k9d \
   --prompt prompts/support-agent.txt \
+  --system-prompt prompts/support-agent.system.txt \  # optional
+  --tools prompts/support-agent.tools.json \          # optional
   --model gpt-4.1-mini \
   --wait \
   --junit verica-results.xml \
@@ -44,10 +46,16 @@ verica run --manifest .verica.yml --wait --junit report.xml
 evals:
   - id: eval_8x2k9d
     prompt: prompts/support-agent.txt
+    systemPrompt: prompts/support-agent.system.txt
+    tools: prompts/support-agent.tools.json # a path to a JSON file…
     sampling: { temperature: 0.2, maxTokens: 512 }
     model: gpt-4.1-mini
   - id: eval_3p1m7q
     prompt: prompts/triage.txt
+    tools: # …or an inline array
+      - name: get_order
+        description: Look up an order by id
+        parameters: { type: object, properties: { id: { type: string } }, required: [id] }
     model: claude-sonnet-4-6
 ```
@@ -63,20 +71,113 @@ default — you don't configure a URL.
 ## Key flags
 - `--eval <id>` / `--manifest <file>` — what to run.
-- `--prompt <file>` / `--system-prompt <file>` — prompt content to push (versioned by content).
+- `--prompt <file>` / `--system-prompt <file>` / `--tools <file>` — prompt content to push (versioned by content). See [Prompt content](#prompt-content-what-you-push).
 - `--model <model>` · `--sampling <file.json>` — execution config.
 - `--wait` — poll to completion; the exit code reflects the gate.
 - `--junit <file>` · `--junit-mode rows|gate` — JUnit report (default `rows`).
 - `--json` — machine-readable results on stdout.
 - `--threshold <0..1>` · `--baseline-ref <ref>` · `--baseline-run <id>` — override the gate per branch.
-- `--git-sha` / `--git-ref` — provenance (auto-detected from CI env otherwise).
+- `--reuse-if-unchanged` · `--reuse-max-age <hrs>` · `--reuse-same-ref` — reuse a recent completed run instead of re-executing an unchanged config. See [Reuse](#reuse-skip-re-running-an-unchanged-config).
+- `--git-sha` / `--git-ref` / `--git-repo-url` — provenance + the repo web base that links the SHA in the run UI (all auto-detected from CI env otherwise).
 > Local dev / self-hosting only: point the CLI at another instance with `--base-url`
 > (or the `VERICA_BASE_URL` env var). Clients never need this.
+## Prompt content (what you push)
+The repo owns the **prompt**: the user template (`--prompt`), the system prompt
+(`--system-prompt`), and the tool definitions (`--tools`). The **dataset, graders,
+gate, and any few-shot/simulated turns stay in Verica** — they're the test scenario,
+managed by whoever owns the eval.
+Each of the three prompt fields is **independent and optional**: push the ones you
+changed and every omitted field is **inherited from the current version**. A push
+creates a new prompt version only if the merged content actually differs.
+```bash
+verica run --eval eval_8x2k9d --system-prompt prompts/agent.system.txt --model gpt-4.1-mini --wait
+# user template + tools inherited; only the system prompt re-versions
+```
+Templates reference dataset columns by name — e.g. `What is the capital of {{ pais }}?`
+(the column is `pais`). Grader/judge prompts can also reference the model output via
+`{{ output.text }}` / `{{ output.tool_calls }}`.
+**Tools** are pushed as JSON — a `--tools <file>`, or a path / inline array under
+`tools:` in the manifest. Each entry may be Verica's flat shape **or** the OpenAI
+wrapper (auto-unwrapped), so you can paste your real schemas as-is:
+```json
+[
+  {
+    "name": "get_order",
+    "description": "Look up an order by id",
+    "parameters": {
+      "type": "object",
+      "properties": { "id": { "type": "string" } },
+      "required": ["id"]
+    }
+  },
+  {
+    "type": "function",
+    "function": { "name": "cancel_order", "description": "…", "parameters": { "type": "object" } }
+  }
+]
+```
+Tools are never executed — the model's _decision_ to call one (and with which
+arguments) is what the eval grades.
+## Reuse (skip re-running an unchanged config)
+By default every `verica run` executes — re-running an unchanged eval is often the
+**point** in CI (it catches silent model drift and run-to-run variance, since an
+eval's output isn't a pure function of its inputs). When you'd rather save the
+tokens, opt in with `--reuse-if-unchanged`:
+```bash
+verica run --eval eval_8x2k9d --model gpt-4.1-mini --reuse-if-unchanged --wait
+# if the same config ran & completed in the last 24h, returns that verdict — no new run
+```
+"Unchanged" means the **prompt version + model + sampling + dataset snapshot +
+graders** all match a prior run (the gate is _not_ part of it — it decides the
+verdict, not the output). On a hit the CLI exits on the prior run's frozen verdict
+and the `--json` element carries `"reused": true` plus a `reusedFrom` block (the API
+also answers `200` instead of `202`).
+- `--reuse-max-age <hrs>` — how stale a reusable run may be (default **24**, max
+  **720**). There is no "forever": reuse can't see provider-side drift behind a
+  stable model id, so it's always bounded — that bound is your staleness budget.
+- `--reuse-same-ref` — only reuse a run on the **same git ref**. Off by default: an
+  identical config produces the same output distribution regardless of branch.
+- Only **completed** runs are reused (never a partial/failed one).
+- Incompatible with `--threshold` / `--baseline-ref` / `--baseline-run` — a reused
+  verdict was frozen under its own gate, so it can't honor a new one.
+Omit `--reuse-if-unchanged` (the default) any time you want a guaranteed fresh run.
 ## Exit codes
 `0` passed · `1` gate failed · `2` validation/transport error.
+## Stability
+This CLI is **pre-1.0 (`0.x`)**. The command surface, the `--json` payload, the JUnit
+output, and the prompt-push behavior are still settling and may change. Exit codes
+(`0`/`1`/`2`) are stable.
+During `0.x` the **minor** version is the breaking lever, so pin accordingly:
+```jsonc
+// package.json
+"@verica-app/cli": "~0.1"   // >=0.1.0 <0.2.0 — gets patches, not breaking minors
+```
+We bump the **minor** for any breaking change (flags, output shapes, push behavior) and
+the **patch** for additive features and fixes. **1.0** will freeze the commands, flags,
+exit codes, and output shapes under standard semver. See
+[CHANGELOG.md](./CHANGELOG.md) for what changed in each release.
 MIT licensed. There's no IP in the client — the engine, graders, gate, and crypto all
 run server-side behind the token API.
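
The README's Reuse section lists the conditions for a cache hit: the config matches, the prior run is completed, it is no older than `--reuse-max-age` hours (default 24, max 720), and, with `--reuse-same-ref`, it ran on the same git ref. The decision is made server-side and its implementation is not in this diff, so the sketch below only restates those rules as code. `isReusable`, `configKey` (standing in for the prompt version + model + sampling + dataset snapshot + graders fingerprint), and the object shapes are hypothetical.

```javascript
// Hypothetical sketch of the reuse rules stated in the README.
// The real check runs server-side; `configKey` is an assumed fingerprint.
const HOUR_MS = 60 * 60 * 1000;

function isReusable(priorRun, request, now = new Date()) {
  const { maxAgeHours = 24, sameRef = false, gitRef } = request;
  // Mirrors the documented bound: hours in (0, 720].
  if (!(maxAgeHours > 0 && maxAgeHours <= 720)) {
    throw new RangeError("maxAgeHours must be in (0, 720]");
  }
  // Only completed runs are reused, never partial or failed ones.
  if (priorRun.status !== "completed") return false;
  // "Unchanged" means the config fingerprint matches (the gate is excluded).
  if (priorRun.configKey !== request.configKey) return false;
  // Freshness bound: an unparsable timestamp yields NaN and is rejected.
  const ageMs = now.getTime() - new Date(priorRun.finishedAt).getTime();
  if (!(ageMs >= 0 && ageMs <= maxAgeHours * HOUR_MS)) return false;
  // Optional per-branch isolation.
  if (sameRef && priorRun.gitRef !== gitRef) return false;
  return true;
}

// Usage: a run that finished 10 hours ago is reusable under the 24h default.
const prior = {
  status: "completed",
  configKey: "cfg-1",
  finishedAt: "2026-06-19T02:00:00Z",
  gitRef: "refs/heads/main",
};
const at = new Date("2026-06-19T12:00:00Z");
console.log(isReusable(prior, { configKey: "cfg-1" }, at));
```

Keeping the age check as a hard upper bound, rather than an optional one, reflects the README's point that reuse cannot see provider-side drift behind a stable model id.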

package/dist/cli.js CHANGED

@@ -4067,9 +4067,16 @@ var samplingParamsSchema = external_exports.object({
   reasoning: external_exports.boolean().optional()
 });
 var runRequestSchema = external_exports.object({
-  /** Omit to run the eval's current (UI-managed) prompt unchanged. */
+  /**
+   * Prompt content to push. Omit the whole block to run the eval's current
+   * (UI-managed) prompt unchanged. Each field is independent: supply only the
+   * ones you're changing — every omitted field (template / systemPrompt / tools)
+   * is inherited from the current version, and a new version is created only if
+   * the merged result differs (e.g. push just `systemPrompt` to re-version the
+   * system prompt while keeping the user template).
+   */
   prompt: external_exports.object({
-    template: external_exports.string(),
+    template: external_exports.string().optional(),
     systemPrompt: external_exports.string().optional(),
     tools: external_exports.array(toolDefinitionSchema).optional()
   }).optional(),
@@ -4079,7 +4086,13 @@ var runRequestSchema = external_exports.object({
   /** Commit provenance, stamped on the run. */
   git: external_exports.object({
     sha: external_exports.string().optional(),
-    ref: external_exports.string().optional()
+    ref: external_exports.string().optional(),
+    /**
+     * The repository's web base URL (e.g. `https://github.com/acme/widgets`) so
+     * the run UI can link the SHA → `<repoUrl>/commit/<sha>` and the branch →
+     * `<repoUrl>/tree/<ref>`. The CLI auto-detects it from CI env.
+     */
+    repoUrl: external_exports.string().optional()
   }).optional(),
   /** CLI gate overrides (precedence over the eval's pass_condition). */
   gate: external_exports.object({
@@ -4089,6 +4102,23 @@ var runRequestSchema = external_exports.object({
     baselineRef: external_exports.string().optional(),
     /** Pin a specific baseline run (wins over baselineRef). */
     baselineRunId: external_exports.string().optional()
+  }).optional(),
+  /**
+   * Opt-in cost control: when the merged config (prompt version + model +
+   * sampling + dataset snapshot + graders) matches a recent COMPLETED run, the
+   * server returns that run's frozen verdict instead of executing again. NOT a
+   * default — an eval's output isn't a pure function of its config (generation +
+   * judge are non-deterministic, the model endpoint drifts), so reuse is always
+   * the caller's explicit choice and is bounded by `maxAgeHours`. Incompatible
+   * with `gate` (the cached verdict was frozen under the old gate).
+   */
+  reuse: external_exports.object({
+    /** Turn reuse on. The trigger — everything else is just tuning. */
+    ifUnchanged: external_exports.boolean().optional(),
+    /** Max age (hours) of a reusable run; server default 24, cap 720 (30d). No "infinite reuse". */
+    maxAgeHours: external_exports.number().positive().max(720).optional(),
+    /** Also require the prior run's git ref to match (per-branch isolation); default false. */
+    sameRef: external_exports.boolean().optional()
   }).optional()
 });
 var runAcceptedSchema = external_exports.object({
@@ -4097,7 +4127,23 @@ var runAcceptedSchema = external_exports.object({
   promptVersion: external_exports.number().int(),
   /** Whether a NEW prompt version was created (vs. the current one reused). */
   created: external_exports.boolean(),
-  resultUrl: external_exports.string()
+  resultUrl: external_exports.string(),
+  /**
+   * Whether this response reuses a prior run instead of executing a new one (a
+   * cache hit on `reuse.ifUnchanged`). The HTTP status reflects it too: 200 when
+   * reused, 202 when a fresh run was enqueued. Optional so an older API that
+   * predates reuse (omitting it) reads as `false`.
+   */
+  reused: external_exports.boolean().optional(),
+  /** Provenance of the reused run — present iff `reused` is true. */
+  reusedFrom: external_exports.object({
+    runId: external_exports.string(),
+    /** ISO timestamp the reused run finished — shows how stale the verdict is. */
+    finishedAt: external_exports.string(),
+    status: external_exports.literal("completed"),
+    gitSha: external_exports.string().nullable(),
+    gitRef: external_exports.string().nullable()
+  }).optional()
 });
 var runStatusSchema = external_exports.enum([
   "queued",
@@ -4289,6 +4335,7 @@ function parseManifest(raw, source = ".verica.yml") {
       id: e.id,
       prompt: typeof e.prompt === "string" ? e.prompt : void 0,
       systemPrompt: typeof e.systemPrompt === "string" ? e.systemPrompt : void 0,
+      tools: typeof e.tools === "string" || Array.isArray(e.tools) ? e.tools : void 0,
       model: typeof e.model === "string" ? e.model : void 0,
       sampling: e.sampling ?? void 0
     };
@@ -4298,6 +4345,27 @@ async function loadManifest(path) {
   return parseManifest(await readFile(path, "utf8"), path);
 }
+// src/tools.ts
+function unwrap(entry) {
+  if (entry !== null && typeof entry === "object" && entry.type === "function" && typeof entry.function === "object" && entry.function !== null) {
+    return entry.function;
+  }
+  return entry;
+}
+function normalizeToolDefinitions(raw) {
+  if (!Array.isArray(raw)) {
+    throw new Error("tools must be a JSON array of tool definitions.");
+  }
+  return raw.map((entry, i) => {
+    const parsed = toolDefinitionSchema.safeParse(unwrap(entry));
+    if (!parsed.success) {
+      const why = parsed.error.issues.map((issue) => issue.message).join("; ");
+      throw new Error(`tools[${i}] is not a valid tool definition: ${why}`);
+    }
+    return parsed.data;
+  });
+}
 // src/junit.ts
 function esc(s) {
   return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;").replace(/'/g, "&apos;");
@@ -4380,19 +4448,24 @@ async function runCommand(opts) {
   const entries = await resolveEntries(opts);
   const git = resolveGit(opts);
   const gate = resolveGate(opts);
+  const reuse = resolveReuse(opts);
   const suites = [];
   const summaries = [];
   let worst = EXIT.pass;
   for (const entry of entries) {
     try {
-      const body = await buildRequest(entry, { samplingFile: opts.samplingFile, git, gate });
+      const body = await buildRequest(entry, { samplingFile: opts.samplingFile, git, gate, reuse });
       const accepted = await client.triggerRun(entry.id, body);
-      err(
-        `\u25B6 ${entry.id}: run ${accepted.runId} queued (prompt v${accepted.promptVersion}${accepted.created ? ", new version" : ", reused"})`
-      );
+      const promptNote = `prompt v${accepted.promptVersion}${accepted.created ? ", new version" : ""}`;
+      if (accepted.reused) {
+        err(`\u267B ${entry.id}: run ${accepted.runId} reused (${promptNote})`);
+        if (accepted.reusedFrom) err(`  \u21B3 a completed run from ${accepted.reusedFrom.finishedAt}`);
+      } else {
+        err(`\u25B6 ${entry.id}: run ${accepted.runId} queued (${promptNote})`);
+      }
       err(`  ${accepted.resultUrl}`);
       if (!opts.wait) {
-        summaries.push({ evalId: entry.id, runId: accepted.runId, resultUrl: accepted.resultUrl });
+        summaries.push(buildSummary(entry.id, { status: "queued", accepted }));
         continue;
       }
       const run = await pollUntilTerminal(client, accepted.runId, {
@@ -4406,12 +4479,12 @@ async function runCommand(opts) {
           opts.junitMode === "rows" ? rowsSuite(entry.id, await client.getResults(accepted.runId)) : gateSuite(entry.id, run)
         );
       }
-      summaries.push({ evalId: entry.id, runId: accepted.runId, ...run });
+      summaries.push(buildSummary(entry.id, { status: "waited", accepted, run }));
     } catch (e) {
       worst = EXIT.error;
       const message = e instanceof Error ? e.message : String(e);
       err(`\u2717 ${entry.id}: ${message}`);
-      summaries.push({ evalId: entry.id, error: message });
+      summaries.push(buildSummary(entry.id, { status: "error", message }));
     }
   }
   if (opts.junitFile && suites.length > 0) {
@@ -4432,6 +4505,7 @@ async function resolveEntries(opts) {
       id: opts.evalId,
       prompt: opts.promptFile,
       systemPrompt: opts.systemPromptFile,
+      tools: opts.toolsFile,
       model: opts.model
     }
   ];
@@ -4442,23 +4516,72 @@ async function buildRequest(entry, ctx) {
   }
   const template = entry.prompt ? await readFile2(entry.prompt, "utf8") : void 0;
   const systemPrompt = entry.systemPrompt ? await readFile2(entry.systemPrompt, "utf8") : void 0;
+  const tools = await resolveTools(entry.tools);
   let sampling = entry.sampling;
   if (!sampling && ctx.samplingFile) {
     sampling = JSON.parse(await readFile2(ctx.samplingFile, "utf8"));
   }
+  const prompt = template !== void 0 || systemPrompt !== void 0 || tools !== void 0 ? {
+    ...template !== void 0 ? { template } : {},
+    ...systemPrompt !== void 0 ? { systemPrompt } : {},
+    ...tools !== void 0 ? { tools } : {}
+  } : void 0;
   return {
     model: entry.model,
-    ...template !== void 0 ? { prompt: { template, ...systemPrompt !== void 0 ? { systemPrompt } : {} } } : {},
+    ...prompt ? { prompt } : {},
     ...sampling ? { samplingParams: sampling } : {},
     ...ctx.git ? { git: ctx.git } : {},
-    ...ctx.gate ? { gate: ctx.gate } : {}
+    ...ctx.gate ? { gate: ctx.gate } : {},
+    ...ctx.reuse ? { reuse: ctx.reuse } : {}
   };
 }
+function buildSummary(evalId, outcome) {
+  switch (outcome.status) {
+    case "queued":
+      return {
+        evalId,
+        runId: outcome.accepted.runId,
+        resultUrl: outcome.accepted.resultUrl,
+        ...reuseFields(outcome.accepted)
+      };
+    case "waited":
+      return {
+        evalId,
+        runId: outcome.accepted.runId,
+        ...outcome.run,
+        ...reuseFields(outcome.accepted)
+      };
+    case "error":
+      return { evalId, error: outcome.message };
+  }
+}
+function reuseFields(accepted) {
+  if (!accepted.reused) return {};
+  return { reused: true, ...accepted.reusedFrom ? { reusedFrom: accepted.reusedFrom } : {} };
+}
+async function resolveTools(tools) {
+  if (tools === void 0) return void 0;
+  const raw = typeof tools === "string" ? JSON.parse(await readFile2(tools, "utf8")) : tools;
+  return normalizeToolDefinitions(raw);
+}
 function resolveGit(opts) {
   const sha = opts.gitSha ?? process.env.GITHUB_SHA ?? process.env.CI_COMMIT_SHA;
   const ref = opts.gitRef ?? process.env.GITHUB_REF ?? process.env.CI_COMMIT_REF_NAME;
-  if (!sha && !ref) return void 0;
-  return { ...sha ? { sha } : {}, ...ref ? { ref } : {} };
+  const repoUrl = resolveRepoUrl(opts);
+  if (!sha && !ref && !repoUrl) return void 0;
+  return {
+    ...sha ? { sha } : {},
+    ...ref ? { ref } : {},
+    ...repoUrl ? { repoUrl } : {}
+  };
+}
+function resolveRepoUrl(opts) {
+  if (opts.gitRepoUrl) return opts.gitRepoUrl.replace(/\/+$/, "");
+  const server = process.env.GITHUB_SERVER_URL;
+  const repo = process.env.GITHUB_REPOSITORY;
+  if (server && repo) return `${server.replace(/\/+$/, "")}/${repo}`;
+  if (process.env.CI_PROJECT_URL) return process.env.CI_PROJECT_URL.replace(/\/+$/, "");
+  return void 0;
 }
 function resolveGate(opts) {
   const gate = {};
@@ -4467,6 +4590,14 @@ function resolveGate(opts) {
   if (opts.baselineRun !== void 0) gate.baselineRunId = opts.baselineRun;
   return Object.keys(gate).length > 0 ? gate : void 0;
 }
+function resolveReuse(opts) {
+  if (!opts.reuseIfUnchanged) return void 0;
+  return {
+    ifUnchanged: true,
+    ...opts.reuseMaxAgeHours !== void 0 ? { maxAgeHours: opts.reuseMaxAgeHours } : {},
+    ...opts.reuseSameRef ? { sameRef: true } : {}
+  };
+}
 function pct(n) {
   return n == null ? "?" : `${(n * 100).toFixed(1)}%`;
 }
@@ -4494,8 +4625,13 @@ Usage:
 Options:
   --eval <id>            Eval to run (or use --manifest for many).
-  --prompt <file>        Prompt template file to push (versioned by content).
-  --system-prompt <file> System-prompt file (optional).
+  --prompt <file>        User prompt (template) file to push (versioned by content).
+  --system-prompt <file> System-prompt file. Either prompt file is optional and
+                         independent: push one and the other is inherited from
+                         the current version (omit both to run it unchanged).
+  --tools <file>         Tool definitions to push: a JSON file (Verica-flat or
+                         OpenAI {type:function,\u2026} entries). Omit to inherit the
+                         current version's tools. Inline arrays: .verica.yml only.
   --model <model>        Model to sample under (overrides the manifest).
   --sampling <file>      JSON sampling params (temperature, maxTokens, \u2026).
   --manifest <file>      .verica.yml mapping prompts \u2192 eval IDs (multi-prompt).
@@ -4506,8 +4642,16 @@ Options:
   --threshold <0..1>     Override the gate's minimum pass rate.
   --baseline-ref <ref>   No-regression baseline = last run on this git ref.
   --baseline-run <id>    No-regression baseline = this specific run.
+  --reuse-if-unchanged   Reuse a recent completed run instead of executing again
+                         when the config (prompt + model + sampling + dataset +
+                         graders) is unchanged. Off by default. Incompatible with
+                         --threshold / --baseline-*.
+  --reuse-max-age <hrs>  Max age of a reusable run (default 24, max 720).
+  --reuse-same-ref       Only reuse a run on the same git ref (default: any ref).
   --git-sha <sha>        Commit SHA (else auto-detected from CI env).
   --git-ref <ref>        Git ref (else auto-detected from CI env).
+  --git-repo-url <url>   Repo web base for the SHA link in the run UI (e.g.
+                         https://github.com/acme/widgets). Auto-detected from CI env.
   --base-url <url>       Override the API base URL (dev/self-host only).
   --poll-interval <sec>  Initial poll interval (default 3).
   --timeout <sec>        Max wait (default 1800).
@@ -4531,6 +4675,7 @@ async function main() {
       eval: { type: "string" },
       prompt: { type: "string" },
       "system-prompt": { type: "string" },
+      tools: { type: "string" },
       model: { type: "string" },
       sampling: { type: "string" },
       manifest: { type: "string" },
@@ -4541,8 +4686,12 @@ async function main() {
       threshold: { type: "string" },
       "baseline-ref": { type: "string" },
       "baseline-run": { type: "string" },
+      "reuse-if-unchanged": { type: "boolean", default: false },
+      "reuse-max-age": { type: "string" },
+      "reuse-same-ref": { type: "boolean", default: false },
       "git-sha": { type: "string" },
       "git-ref": { type: "string" },
+      "git-repo-url": { type: "string" },
       "base-url": { type: "string" },
       "poll-interval": { type: "string" },
       timeout: { type: "string" },
@@ -4561,12 +4710,24 @@ async function main() {
   if (values.threshold !== void 0 && threshold === void 0) {
     throw new Error(`--threshold must be a number between 0 and 1 (got "${values.threshold}").`);
   }
+  const reuseMaxAge = finiteNumber(values["reuse-max-age"]);
+  if (values["reuse-max-age"] !== void 0 && (reuseMaxAge === void 0 || reuseMaxAge <= 0 || reuseMaxAge > 720)) {
+    throw new Error(
+      `--reuse-max-age must be a number of hours in (0, 720] (got "${values["reuse-max-age"]}").`
+    );
+  }
+  if (values["reuse-if-unchanged"] && (threshold !== void 0 || values["baseline-ref"] !== void 0 || values["baseline-run"] !== void 0)) {
+    throw new Error(
+      "--reuse-if-unchanged cannot be combined with --threshold / --baseline-ref / --baseline-run (a reused verdict was frozen under the prior gate)."
+    );
+  }
   const opts = {
     baseUrl,
     token,
     evalId: values.eval,
     promptFile: values.prompt,
     systemPromptFile: values["system-prompt"],
+    toolsFile: values.tools,
     samplingFile: values.sampling,
     model: values.model,
     manifestFile: values.manifest,
@@ -4577,8 +4738,12 @@ async function main() {
     threshold,
     baselineRef: values["baseline-ref"],
     baselineRun: values["baseline-run"],
+    reuseIfUnchanged: values["reuse-if-unchanged"] ?? false,
+    reuseMaxAgeHours: reuseMaxAge,
+    reuseSameRef: values["reuse-same-ref"] ?? false,
     gitSha: values["git-sha"],
     gitRef: values["git-ref"],
+    gitRepoUrl: values["git-repo-url"],
     pollIntervalMs: (finiteNumber(values["poll-interval"]) ?? 3) * 1e3,
     timeoutMs: (finiteNumber(values.timeout) ?? 1800) * 1e3
   };
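
The `resolveRepoUrl` addition in the diff above resolves the repo web base in a fixed order: the `--git-repo-url` flag, then GitHub's `GITHUB_SERVER_URL` + `GITHUB_REPOSITORY`, then GitLab's `CI_PROJECT_URL`, stripping trailing slashes each time. The sketch below restates that precedence as a standalone function that takes the environment as a parameter so it can be exercised without CI. The `env` parameter and the `commitLink` / `branchLink` helpers are assumptions added for illustration; the link shapes follow the schema comment, but how the run UI actually builds them is not shown in this diff.

```javascript
// Standalone restatement of the precedence in the diff's resolveRepoUrl.
// Taking `env` as a parameter is an assumption made for testability.
const stripTrailingSlashes = (url) => url.replace(/\/+$/, "");

function resolveRepoUrl(flagValue, env = process.env) {
  // 1. An explicit --git-repo-url wins.
  if (flagValue) return stripTrailingSlashes(flagValue);
  // 2. GitHub Actions: server URL + "owner/repo".
  if (env.GITHUB_SERVER_URL && env.GITHUB_REPOSITORY) {
    return `${stripTrailingSlashes(env.GITHUB_SERVER_URL)}/${env.GITHUB_REPOSITORY}`;
  }
  // 3. GitLab CI: the project URL is already the web base.
  if (env.CI_PROJECT_URL) return stripTrailingSlashes(env.CI_PROJECT_URL);
  return undefined;
}

// Hypothetical helpers showing the link shapes named in the schema comment.
const commitLink = (repoUrl, sha) => `${repoUrl}/commit/${sha}`;
const branchLink = (repoUrl, ref) => `${repoUrl}/tree/${ref}`;

// Usage with a simulated GitHub Actions environment.
const repoUrl = resolveRepoUrl(undefined, {
  GITHUB_SERVER_URL: "https://github.com/",
  GITHUB_REPOSITORY: "acme/widgets",
});
console.log(commitLink(repoUrl, "abc123"));
```

Stripping trailing slashes before joining avoids a double slash in the generated links when the server or project URL ends with `/`.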

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@verica-app/cli",
-  "version": "0.1.2",
+  "version": "0.1.4",
   "private": false,
   "description": "Run a Verica eval from CI and block the merge on the result.",
   "license": "MIT",
@@ -12,18 +12,15 @@
     "prompt",
     "testing"
   ],
-  "repository": {
-    "type": "git",
-    "url": "git+https://github.com/mtn-labs/evals.git",
-    "directory": "packages/cli"
-  },
-  "homepage": "https://github.com/mtn-labs/evals/tree/main/packages/cli#readme",
+  "homepage": "https://verica.app",
   "type": "module",
   "bin": {
     "verica": "./dist/cli.js"
   },
   "files": [
-    "dist"
+    "dist",
+    "README.md",
+    "CHANGELOG.md"
   ],
   "publishConfig": {
     "access": "public"