@gianfrancopiana/openclaw-autoresearch (npm) 1.0.2 → 1.0.4


Files changed (20)

package/README.md CHANGED

@@ -2,7 +2,7 @@
 Autonomous experiment loop for any optimization target.
-Faithful OpenClaw port of [`davebcn87/pi-autoresearch`](https://github.com/davebcn87/pi-autoresearch).
+Faithful OpenClaw port of [`davebcn87/pi-autoresearch`](https://github.com/davebcn87/pi-autoresearch), including upstream statistical confidence scoring.
 ## How it works
@@ -13,19 +13,20 @@ Three tools drive the loop:
 | Tool | What it does |
 |---|---|
 | `init_experiment` | Configures the session: name, primary metric, unit, direction (lower/higher). Re-calling starts a new segment. |
-| `run_experiment` | Executes a shell command, times it, captures stdout/stderr, returns pass/fail via exit code. |
-| `log_experiment` | Records the result. `keep` auto-commits to git. `discard`/`crash` log without committing. Tracks secondary metrics alongside the primary. |
+| `run_experiment` | Executes a shell command, times it, captures stdout/stderr, parses `METRIC name=number` lines, and opens a pending experiment window that must be logged before another run can start. |
+| `log_experiment` | Records the pending run. `keep` auto-commits to git. `discard`/`crash` log without committing. If the prior `run_experiment` captured the primary metric, `log_experiment` can infer `commit` and `metric` automatically. After 3+ runs in a segment, it also reports a confidence score for the best improvement versus noise. |
 Each tool also accepts an optional `cwd` so callers can target a nested repo explicitly instead of relying on the current session working directory.
-All state lives in four repo-root files:
+All state lives in five repo-root files:
 | File | Purpose |
 |---|---|
-| `autoresearch.md` | Session doc: objective, metrics, files in scope, constraints, what's been tried. A fresh agent reads this to resume. |
+| `autoresearch.md` | Session doc. The plugin keeps the Metrics, How to Run, What's Been Tried, and Plugin Checkpoint sections synchronized so resumes are less agent-dependent. |
 | `autoresearch.sh` | Benchmark script. Outputs `METRIC name=number` lines. |
 | `autoresearch.jsonl` | Structured log: config headers + experiment entries (metric, status, timestamp, segment, commit hash). |
 | `autoresearch.ideas.md` | Backlog of promising ideas not yet tried. Optional. |
+| `autoresearch.checkpoint.json` | Plugin-managed checkpoint: latest logged state, recent runs, and any pending unlogged run. |
 The design is file-first: any agent can pick up the repo-root files and continue the loop without prior context.
@@ -72,6 +73,15 @@ Verify:
 Prefer the explicit `/autoresearch` command surface in OpenClaw. The auto-generated native skill alias `/autoresearch_create` may not trigger reliably on some hosts, so use `/skill autoresearch-create` if you need to invoke the skill directly.
+## Workflow Guarantees
+- `run_experiment` refuses to start a second run until the previous one is logged.
+- `run_experiment` parses `METRIC name=number` lines and stores a pending run so `log_experiment` can default from the actual benchmark output.
+- During active autoresearch mode, raw benchmark execution through OpenClaw `exec`/`bash` is blocked. Use `run_experiment` instead.
+- `autoresearch_status` warns when a pending run is unlogged or git history has moved ahead of the last logged experiment.
+- After 3+ positive-metric runs in a segment, `log_experiment`, `autoresearch_status`, and the synced session doc report a MAD-based confidence score so the agent can distinguish likely wins from noise.
+- The plugin updates `autoresearch.checkpoint.json` and refreshes plugin-managed sections in `autoresearch.md` after init, run, and log transitions.
 ## Use
 In the repo you want to optimize:
@@ -80,7 +90,7 @@ In the repo you want to optimize:
 2. Run `/autoresearch` or `/autoresearch setup <goal>`.
 3. Send a normal message with the goal, command, metric (+ direction), files in scope, and constraints.
 4. If you need the raw skill invocation, use `/skill autoresearch-create`.
-5. The agent writes `autoresearch.md` and `autoresearch.sh`, runs a baseline, then starts looping.
+5. The agent writes `autoresearch.md` and `autoresearch.sh`, runs a baseline with `run_experiment`, then records it with `log_experiment`.
 6. Use `/autoresearch` or `/autoresearch status` to re-prime context on a later turn.
 To resume an existing session, a new agent reads the repo-root files and continues from where the last one stopped.
@@ -99,6 +109,7 @@ This port preserves upstream semantics, names, and file contracts while adapting
 - upstream repo: `https://github.com/davebcn87/pi-autoresearch`
 - pinned upstream commit: `2227029fa5712944a36938b5fe59f709cb30ed22` (`2227029f`)
+- later upstream parity cherry-pick: confidence scoring from `cf1bbf03debca8f3fb2cca2c3e799b9e23320f87` (`cf1bbf0`, March 19, 2026)
 ## Validation

package/docs/non-parity.md CHANGED

@@ -6,6 +6,7 @@ Pinned upstream reference:
 - Repo: `https://github.com/davebcn87/pi-autoresearch`
 - Commit: `2227029fa5712944a36938b5fe59f709cb30ed22` (`2227029f`)
+- Later upstream semantics also ported: confidence scoring from `cf1bbf03debca8f3fb2cca2c3e799b9e23320f87` (`cf1bbf0`)
 ## Principle

package/extensions/openclaw-autoresearch/src/checkpoint.ts ADDED

@@ -0,0 +1,82 @@
+import * as fs from "node:fs";
+import { AUTORESEARCH_ROOT_FILES, getAutoresearchRootFilePath } from "./files.js";
+import type {
+  AutoresearchRunSnapshot,
+  AutoresearchStateSnapshot,
+} from "./state.js";
+import type { PendingExperimentRun } from "./runtime-state.js";
+export type AutoresearchCheckpoint = {
+  readonly version: 1;
+  readonly updatedAt: number;
+  readonly sessionStartCommit: string | null;
+  readonly session: {
+    readonly name: string | null;
+    readonly metricName: string;
+    readonly metricUnit: string;
+    readonly bestDirection: "lower" | "higher";
+    readonly currentSegment: number;
+    readonly currentRunCount: number;
+    readonly totalRunCount: number;
+    readonly currentBaselineMetric: number | null;
+    readonly currentBestMetric: number | null;
+    readonly confidence: number | null;
+  };
+  readonly lastLoggedRun: AutoresearchRunSnapshot | null;
+  readonly recentLoggedRuns: readonly AutoresearchRunSnapshot[];
+  readonly pendingRun: PendingExperimentRun | null;
+};
+export function readAutoresearchCheckpoint(cwd: string): AutoresearchCheckpoint | null {
+  const checkpointPath = getAutoresearchRootFilePath(cwd, "checkpoint");
+  if (!fs.existsSync(checkpointPath)) {
+    return null;
+  }
+  try {
+    const parsed = JSON.parse(fs.readFileSync(checkpointPath, "utf8")) as AutoresearchCheckpoint;
+    return parsed?.version === 1 ? parsed : null;
+  } catch {
+    return null;
+  }
+}
+export function writeAutoresearchCheckpoint(options: {
+  cwd: string;
+  state: AutoresearchStateSnapshot;
+  sessionStartCommit: string | null;
+  recentLoggedRuns: readonly AutoresearchRunSnapshot[];
+  pendingRun: PendingExperimentRun | null;
+}): AutoresearchCheckpoint {
+  const checkpoint: AutoresearchCheckpoint = {
+    version: 1,
+    updatedAt: Date.now(),
+    sessionStartCommit: options.sessionStartCommit,
+    session: {
+      name: options.state.name,
+      metricName: options.state.metricName,
+      metricUnit: options.state.metricUnit,
+      bestDirection: options.state.bestDirection,
+      currentSegment: options.state.currentSegment,
+      currentRunCount: options.state.currentRunCount,
+      totalRunCount: options.state.totalRunCount,
+      currentBaselineMetric: options.state.currentBaselineMetric,
+      currentBestMetric: options.state.currentBestMetric,
+      confidence: options.state.confidence,
+    },
+    lastLoggedRun: options.state.lastRun,
+    recentLoggedRuns: options.recentLoggedRuns,
+    pendingRun: options.pendingRun,
+  };
+  const checkpointPath = getAutoresearchRootFilePath(options.cwd, "checkpoint");
+  fs.writeFileSync(checkpointPath, `${JSON.stringify(checkpoint, null, 2)}\n`);
+  return checkpoint;
+}
+export function deleteAutoresearchCheckpoint(cwd: string): void {
+  const checkpointPath = getAutoresearchRootFilePath(cwd, "checkpoint");
+  if (fs.existsSync(checkpointPath)) {
+    fs.unlinkSync(checkpointPath);
+  }
+}
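The read path above returns `null` for a missing file, a parse failure, or a version other than `1`. The following is a minimal sketch of that behavior, assuming a reduced, hypothetical `MiniCheckpoint` shape and a hypothetical `readMini` helper; it is not the package's API and it writes only to a temporary directory.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical, reduced checkpoint shape; the real type carries many more fields.
type MiniCheckpoint = { version: 1; updatedAt: number; pendingRun: string | null };

// Same guard order as the diff: existence check, JSON parse, version check.
function readMini(file: string): MiniCheckpoint | null {
  if (!fs.existsSync(file)) {
    return null;
  }
  try {
    const parsed = JSON.parse(fs.readFileSync(file, "utf8")) as MiniCheckpoint;
    return parsed?.version === 1 ? parsed : null;
  } catch {
    return null;
  }
}

const sketchDir = fs.mkdtempSync(path.join(os.tmpdir(), "autoresearch-sketch-"));
const sketchFile = path.join(sketchDir, "autoresearch.checkpoint.json");

const missing = readMini(sketchFile); // file does not exist yet

const good = { version: 1, updatedAt: Date.now(), pendingRun: null };
fs.writeFileSync(sketchFile, `${JSON.stringify(good, null, 2)}\n`);
const valid = readMini(sketchFile); // version 1 round-trips

fs.writeFileSync(sketchFile, JSON.stringify({ version: 2 }));
const wrongVersion = readMini(sketchFile); // unknown version is rejected

fs.writeFileSync(sketchFile, "{ not json");
const corrupt = readMini(sketchFile); // parse failure is swallowed

fs.rmSync(sketchDir, { recursive: true, force: true });
```

Note that only `version` is validated after parsing; the remaining fields are trusted via the type assertion, so a caller that needs stronger guarantees would have to add its own checks.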

package/extensions/openclaw-autoresearch/src/confidence.ts ADDED

@@ -0,0 +1,82 @@
+export type ConfidenceRun = {
+  readonly metric: number;
+  readonly status: string;
+};
+export function computeConfidence(
+  runs: readonly ConfidenceRun[],
+  direction: "lower" | "higher",
+): number | null {
+  const usableRuns = runs.filter((run) => Number.isFinite(run.metric) && run.metric > 0);
+  if (usableRuns.length < 3) {
+    return null;
+  }
+  const baseline = runs.find((run) => Number.isFinite(run.metric));
+  if (!baseline) {
+    return null;
+  }
+  const values = usableRuns.map((run) => run.metric);
+  const median = sortedMedian(values);
+  const deviations = values.map((value) => Math.abs(value - median));
+  const mad = sortedMedian(deviations);
+  if (mad === 0) {
+    return null;
+  }
+  let bestKept: number | null = null;
+  for (const run of usableRuns) {
+    if (run.status !== "keep") {
+      continue;
+    }
+    if (bestKept === null || isBetter(run.metric, bestKept, direction)) {
+      bestKept = run.metric;
+    }
+  }
+  if (bestKept === null || bestKept === baseline.metric) {
+    return null;
+  }
+  return Math.abs(bestKept - baseline.metric) / mad;
+}
+export function formatConfidenceLine(
+  confidence: number | null,
+  label = "Confidence",
+): string {
+  return confidence === null ? `${label}: n/a` : `${label}: ${describeConfidence(confidence)}`;
+}
+export function describeConfidence(confidence: number): string {
+  const rendered = confidence.toFixed(1);
+  if (confidence >= 2.0) {
+    return `${rendered}x noise floor - improvement is likely real`;
+  }
+  if (confidence >= 1.0) {
+    return `${rendered}x noise floor - improvement is above noise but marginal`;
+  }
+  return `${rendered}x noise floor - improvement is within noise. Consider re-running to confirm before keeping`;
+}
+function sortedMedian(values: readonly number[]): number {
+  if (values.length === 0) {
+    return 0;
+  }
+  const sorted = [...values].sort((left, right) => left - right);
+  const mid = Math.floor(sorted.length / 2);
+  return sorted.length % 2 === 0
+    ? (sorted[mid - 1] + sorted[mid]) / 2
+    : sorted[mid];
+}
+function isBetter(
+  current: number,
+  best: number,
+  direction: "lower" | "higher",
+): boolean {
+  return direction === "lower" ? current < best : current > best;
+}
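To make the score concrete, here is a worked example of the same arithmetic: the median absolute deviation (MAD) of all usable metrics, then the distance between the best kept run and the baseline divided by that MAD. The five runs are made-up illustration data, and `Run` and `median` are local stand-ins rather than the package's exports.

```typescript
type Run = { metric: number; status: "keep" | "discard" | "crash" };

// Median of a list, averaging the two middle values for even lengths.
function median(values: readonly number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Illustrative data only: "lower is better", first run is the baseline.
const runs: Run[] = [
  { metric: 100, status: "keep" },
  { metric: 98, status: "discard" },
  { metric: 103, status: "discard" },
  { metric: 90, status: "keep" },
  { metric: 101, status: "discard" },
];

const metricValues = runs.map((run) => run.metric);
const center = median(metricValues); // 100
const mad = median(metricValues.map((value) => Math.abs(value - center))); // 2

// Best kept run for a "lower" direction is the smallest kept metric.
const bestKept = Math.min(
  ...runs.filter((run) => run.status === "keep").map((run) => run.metric),
); // 90

const confidence = Math.abs(bestKept - runs[0].metric) / mad; // 10 / 2 = 5
```

With these numbers the score is 5.0, which `describeConfidence` would place in the "likely real" band (2.0 and above). Scores from 1.0 up to 2.0 are reported as marginal, and anything below 1.0 as within noise.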

package/extensions/openclaw-autoresearch/src/files.ts CHANGED

@@ -5,6 +5,7 @@ export const AUTORESEARCH_ROOT_FILES = {
   runnerScript: "autoresearch.sh",
   resultsLog: "autoresearch.jsonl",
   ideasBacklog: "autoresearch.ideas.md",
+  checkpoint: "autoresearch.checkpoint.json",
 } as const;
 export type AutoresearchRootFileKey = keyof typeof AUTORESEARCH_ROOT_FILES;

package/extensions/openclaw-autoresearch/src/git.ts CHANGED

@@ -19,6 +19,11 @@ export type GitKeepResult = {
   readonly command: GitCommandResult;
 };
+export type GitRuntimeOptions = {
+  runCommandWithTimeout: RunCommandWithTimeout;
+  cwd: string;
+};
 async function runGitCommand(
   runCommandWithTimeout: RunCommandWithTimeout,
   cwd: string,
@@ -40,6 +45,31 @@ async function runGitCommand(
   };
 }
+export async function readShortHeadCommit(options: GitRuntimeOptions): Promise<string | null> {
+  const result = await runGitCommand(options.runCommandWithTimeout, options.cwd, [
+    "rev-parse",
+    "--short=7",
+    "HEAD",
+  ]);
+  return result.code === 0 && result.stdout.trim().length > 0 ? result.stdout.trim() : null;
+}
+export async function countCommitsSince(
+  options: GitRuntimeOptions & { sinceCommit: string },
+): Promise<number | null> {
+  const result = await runGitCommand(options.runCommandWithTimeout, options.cwd, [
+    "rev-list",
+    "--count",
+    `${options.sinceCommit}..HEAD`,
+  ]);
+  if (result.code !== 0) {
+    return null;
+  }
+  const count = Number.parseInt(result.stdout.trim(), 10);
+  return Number.isFinite(count) ? count : null;
+}
 export async function commitKeptExperiment(options: {
   runCommandWithTimeout: RunCommandWithTimeout;
   cwd: string;

package/extensions/openclaw-autoresearch/src/hooks.ts CHANGED

@@ -1,6 +1,7 @@
 import type { OpenClawPluginApi } from "openclaw/plugin-sdk";
 import { AUTORESEARCH_ROOT_FILES } from "./files.js";
 import { reconstructStateFromJsonl } from "./state.js";
+import { formatConfidenceLine } from "./confidence.js";
 import {
   clearAutoresearchRuntimeState,
   consumeAutoresearchContinuationReminder,
@@ -64,6 +65,30 @@ export function registerAutoresearchHooks(api: OpenClawPluginApi): void {
       queueAutoresearchSteer(cwd, messageText);
     });
+    hookApi.on("before_tool_call", (event, ctx) => {
+      const cwd = resolveHookCwd(api, ctx);
+      if (cwd === null || !shouldEnforceAutoresearchMode(cwd)) {
+        return;
+      }
+      const record = event as Record<string, unknown>;
+      const toolName = typeof record.toolName === "string" ? record.toolName : "";
+      if (toolName !== "exec" && toolName !== "bash") {
+        return;
+      }
+      const command = extractToolCommand(record.params);
+      if (!command || !looksLikeExperimentCommand(command)) {
+        return;
+      }
+      return {
+        block: true,
+        blockReason:
+          "Autoresearch mode blocks raw benchmark execution through exec/bash. Use run_experiment so the result is captured and log_experiment can enforce the experiment lifecycle.",
+      };
+    });
     hookApi.on("agent_end", (_event, ctx) => {
       const cwd = resolveHookCwd(api, ctx);
       if (cwd === null) {
@@ -112,11 +137,7 @@ export function registerAutoresearchHooks(api: OpenClawPluginApi): void {
 export function buildBeforePromptBuildContext(cwd: string): string | null {
   const state = reconstructStateFromJsonl(cwd);
   const runtimeState = getAutoresearchRuntimeState(cwd);
-  const modeEnabled =
-    runtimeState.mode === "on" ||
-    (runtimeState.mode !== "off" && (state.mode === "active" || state.hasSessionDoc));
-  if (!modeEnabled) {
+  if (!shouldEnforceAutoresearchMode(cwd, state, runtimeState)) {
     return null;
   }
@@ -144,7 +165,14 @@ export function buildBeforePromptBuildContext(cwd: string): string | null {
       `Read ${AUTORESEARCH_ROOT_FILES.sessionDoc} before resuming or changing the experiment loop, and re-read it after compaction.`,
       "Resume the autonomous upstream loop: edit, run_experiment, log_experiment, keep/discard/crash, repeat.",
       "Use init_experiment, run_experiment, and log_experiment for experiment state changes. Never stop unless the user explicitly interrupts the loop.",
+      "Never run benchmark or test commands through raw exec/bash during autoresearch mode. Use run_experiment so the plugin can capture metrics, enforce logging, and preserve resumable state.",
+      "After every run_experiment, call log_experiment before starting another run. If METRIC lines were captured, log_experiment can infer commit and metric from the pending run.",
     );
+    if (state.confidence !== null) {
+      lines.push(
+        `${formatConfidenceLine(state.confidence, "Current confidence")}. Treat low-confidence wins as provisional and re-run before keeping when the score is below 1.0x.`,
+      );
+    }
     if (pendingCommand?.args) {
       lines.push(`Additional resume instruction from /autoresearch: ${pendingCommand.args}`);
     }
@@ -232,3 +260,49 @@ function firstString(...values: unknown[]): string | null {
 function isCommandLikeMessage(text: string): boolean {
   return /^[\/!]/.test(text.trim());
 }
+function shouldEnforceAutoresearchMode(
+  cwd: string,
+  state = reconstructStateFromJsonl(cwd),
+  runtimeState = getAutoresearchRuntimeState(cwd),
+): boolean {
+  return (
+    runtimeState.mode === "on" ||
+    runtimeState.runInFlight ||
+    runtimeState.pendingRun !== null ||
+    (runtimeState.mode !== "off" && (state.mode === "active" || state.hasSessionDoc))
+  );
+}
+function extractToolCommand(params: unknown): string | null {
+  if (!params || typeof params !== "object") {
+    return null;
+  }
+  const record = params as Record<string, unknown>;
+  for (const key of ["command", "cmd", "args"]) {
+    const value = record[key];
+    if (typeof value === "string" && value.trim().length > 0) {
+      return value.trim();
+    }
+  }
+  return null;
+}
+function looksLikeExperimentCommand(command: string): boolean {
+  const normalized = command.trim();
+  if (!normalized) {
+    return false;
+  }
+  const readOnlyPatterns = [
+    /^(pwd|ls|find|rg|grep|sed|cat|head|tail|wc|stat)\b/,
+    /^git\s+(status|diff|show|log|rev-parse|branch|remote)\b/,
+  ];
+  if (readOnlyPatterns.some((pattern) => pattern.test(normalized))) {
+    return false;
+  }
+  return true;
+}
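Despite its name, `looksLikeExperimentCommand` works as an allow-list: any non-empty command that does not start with one of the read-only prefixes is treated as an experiment command. The sketch below copies the two patterns from the diff and applies them to a few made-up sample commands to show which ones the `before_tool_call` hook would block when autoresearch mode is enforced and the tool is `exec` or `bash`.

```typescript
// Patterns copied from the diff above.
const readOnlyPatterns = [
  /^(pwd|ls|find|rg|grep|sed|cat|head|tail|wc|stat)\b/,
  /^git\s+(status|diff|show|log|rev-parse|branch|remote)\b/,
];

// Condensed restatement of the function in the diff.
function looksLikeExperimentCommand(command: string): boolean {
  const normalized = command.trim();
  if (!normalized) {
    return false;
  }
  return !readOnlyPatterns.some((pattern) => pattern.test(normalized));
}

// Sample commands are illustrative assumptions, not taken from the package.
const samples = [
  "git status",
  "ls -la",
  "./autoresearch.sh",
  "npm install",
  "git commit -m wip",
  "cat results.txt | sh run.sh",
];

const blocked = samples.filter(looksLikeExperimentCommand);
// blocked: ["./autoresearch.sh", "npm install", "git commit -m wip"]
```

Two consequences follow from matching only the command prefix: unrelated commands such as `npm install` are blocked, while a pipeline that begins with an allowed command (the `cat ... | sh run.sh` sample) passes the check.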

package/extensions/openclaw-autoresearch/src/logging.ts CHANGED

@@ -47,6 +47,7 @@ export type AutoresearchResultEntry = {
   readonly description: string;
   readonly timestamp: number;
   readonly segment: number;
+  readonly confidence: number | null;
 };
 export function appendResultEntry(cwd: string, entry: AutoresearchResultEntry): void {

package/extensions/openclaw-autoresearch/src/metrics.ts ADDED

@@ -0,0 +1,24 @@
+const METRIC_LINE_RE =
+  /^METRIC\s+([A-Za-z0-9_.\-µ]+)\s*=\s*(-?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)\s*$/;
+export function parseMetricLines(output: string): Record<string, number> {
+  const metrics = new Map<string, number>();
+  for (const rawLine of output.split(/\r?\n/)) {
+    const line = rawLine.trim();
+    const match = METRIC_LINE_RE.exec(line);
+    if (!match) {
+      continue;
+    }
+    const [, name, valueText] = match;
+    const value = Number(valueText);
+    if (!name || !Number.isFinite(value)) {
+      continue;
+    }
+    metrics.set(name, value);
+  }
+  return Object.fromEntries(metrics.entries());
+}
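As a usage illustration, the sketch below applies the same regular expression to a made-up benchmark output. `parseMetricLinesSketch` is a condensed stand-in for `parseMetricLines` that uses a plain object instead of a `Map`; the sample lines are assumptions chosen to exercise whitespace handling, exponent notation, duplicates, and rejected lines.

```typescript
// Pattern copied from the diff above.
const METRIC_LINE_RE =
  /^METRIC\s+([A-Za-z0-9_.\-µ]+)\s*=\s*(-?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)\s*$/;

// Condensed stand-in for parseMetricLines: later lines overwrite earlier ones.
function parseMetricLinesSketch(output: string): Record<string, number> {
  const metrics: Record<string, number> = {};
  for (const rawLine of output.split(/\r?\n/)) {
    const match = METRIC_LINE_RE.exec(rawLine.trim());
    if (!match) {
      continue;
    }
    const value = Number(match[2]);
    if (Number.isFinite(value)) {
      metrics[match[1]] = value;
    }
  }
  return metrics;
}

// Illustrative output only.
const sampleOutput = [
  "compiling...",
  "METRIC latency_ms=12.5",
  "METRIC throughput = 3e2", // spaces around "=" and exponent form are accepted
  "METRIC bad=abc", // non-numeric value is skipped
  "  METRIC latency_ms=11.9  ", // trimmed, and replaces the earlier latency_ms
  "METRIC: size=4", // colon after METRIC does not match
].join("\n");

const parsedMetrics = parseMetricLinesSketch(sampleOutput);
// parsedMetrics: { latency_ms: 11.9, throughput: 300 }
```

Because the last occurrence of a name wins, a benchmark script that prints the primary metric more than once will have only its final value recorded.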

package/extensions/openclaw-autoresearch/src/runtime-state.ts CHANGED

@@ -7,12 +7,26 @@ export type PendingAutoresearchCommand =
     }
   | null;
+export type PendingExperimentRun = {
+  readonly command: string;
+  readonly commit: string | null;
+  readonly primaryMetric: number | null;
+  readonly metrics: Record<string, number>;
+  readonly durationSeconds: number;
+  readonly exitCode: number | null;
+  readonly passed: boolean;
+  readonly timedOut: boolean;
+  readonly tailOutput: string;
+  readonly capturedAt: number;
+};
 export type AutoresearchRuntimeSnapshot = {
   readonly mode: AutoresearchRuntimeMode;
   readonly runInFlight: boolean;
   readonly queuedSteers: readonly string[];
   readonly needsContinuationReminder: boolean;
   readonly pendingCommand: PendingAutoresearchCommand;
+  readonly pendingRun: PendingExperimentRun | null;
 };
 type MutableAutoresearchRuntimeState = {
@@ -21,6 +35,7 @@ type MutableAutoresearchRuntimeState = {
   queuedSteers: string[];
   needsContinuationReminder: boolean;
   pendingCommand: PendingAutoresearchCommand;
+  pendingRun: PendingExperimentRun | null;
 };
 const MAX_QUEUED_STEERS = 20;
@@ -33,6 +48,7 @@ function createDefaultRuntimeState(): MutableAutoresearchRuntimeState {
     queuedSteers: [],
     needsContinuationReminder: false,
     pendingCommand: null,
+    pendingRun: null,
   };
 }
@@ -53,6 +69,7 @@ export function getAutoresearchRuntimeState(cwd: string): AutoresearchRuntimeSna
     queuedSteers: [...state.queuedSteers],
     needsContinuationReminder: state.needsContinuationReminder,
     pendingCommand: state.pendingCommand,
+    pendingRun: state.pendingRun,
   };
 }
@@ -138,6 +155,26 @@ export function consumeAutoresearchContinuationReminder(cwd: string): boolean {
   return needsReminder;
 }
+export function setAutoresearchPendingRun(
+  cwd: string,
+  pendingRun: PendingExperimentRun | null,
+): AutoresearchRuntimeSnapshot {
+  const state = getMutableRuntimeState(cwd);
+  state.pendingRun = pendingRun;
+  return getAutoresearchRuntimeState(cwd);
+}
+export function getAutoresearchPendingRun(cwd: string): PendingExperimentRun | null {
+  return getMutableRuntimeState(cwd).pendingRun;
+}
+export function consumeAutoresearchPendingRun(cwd: string): PendingExperimentRun | null {
+  const state = getMutableRuntimeState(cwd);
+  const pendingRun = state.pendingRun;
+  state.pendingRun = null;
+  return pendingRun;
+}
 export function clearAutoresearchRuntimeState(cwd: string): void {
   runtimeStates.delete(cwd);
 }
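The `pendingRun` slot is what lets the README's guarantee hold: a second run is refused until the first is logged. The tool code that enforces this is not part of this excerpt, so the following is only a minimal sketch of how a single-slot gate with read-and-clear semantics (as in `consumeAutoresearchPendingRun`) could behave. `ExperimentGate`, `PendingRunSketch`, and the error text are hypothetical names for illustration.

```typescript
// Hypothetical, reduced stand-in for PendingExperimentRun.
type PendingRunSketch = { command: string; primaryMetric: number | null };

class ExperimentGate {
  private pendingRun: PendingRunSketch | null = null;

  // Refuses to open a new run window while one is still unlogged.
  startRun(command: string, primaryMetric: number | null): PendingRunSketch {
    if (this.pendingRun !== null) {
      throw new Error(`Pending run "${this.pendingRun.command}" must be logged first`);
    }
    this.pendingRun = { command, primaryMetric };
    return this.pendingRun;
  }

  // Read-and-clear: logging takes the pending run and frees the slot.
  consume(): PendingRunSketch | null {
    const run = this.pendingRun;
    this.pendingRun = null;
    return run;
  }
}

const gate = new ExperimentGate();
gate.startRun("./autoresearch.sh", 12.5);

let secondRunError: string | null = null;
try {
  gate.startRun("./autoresearch.sh", 12.1); // rejected: previous run not logged
} catch (error) {
  secondRunError = (error as Error).message;
}

const loggedRun = gate.consume(); // "log_experiment" step takes the pending run
const nextRun = gate.startRun("./autoresearch.sh", 12.1); // now allowed
```

Keeping the captured metric on the pending run is what allows the logging step to default its `metric` from the actual benchmark output rather than from what the agent reports.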

package/extensions/openclaw-autoresearch/src/session-doc.ts ADDED

@@ -0,0 +1,104 @@
+import * as fs from "node:fs";
+import { AUTORESEARCH_ROOT_FILES, getAutoresearchRootFilePath } from "./files.js";
+import type { AutoresearchCheckpoint } from "./checkpoint.js";
+import { formatConfidenceLine } from "./confidence.js";
+export function syncAutoresearchSessionDoc(
+  cwd: string,
+  checkpoint: AutoresearchCheckpoint,
+): void {
+  const sessionDocPath = getAutoresearchRootFilePath(cwd, "sessionDoc");
+  const existing = fs.existsSync(sessionDocPath) ? fs.readFileSync(sessionDocPath, "utf8") : "";
+  let doc = ensureTitle(existing, checkpoint.session.name);
+  doc = upsertSection(
+    doc,
+    "## Metrics",
+    [
+      `- **Primary**: ${checkpoint.session.metricName} (${checkpoint.session.metricUnit || "unitless"}, ${checkpoint.session.bestDirection} is better)`,
+    ].join("\n"),
+  );
+  doc = upsertSection(
+    doc,
+    "## How to Run",
+    `\`${AUTORESEARCH_ROOT_FILES.runnerScript}\` — should emit \`METRIC name=number\` lines for ${checkpoint.session.metricName}.`,
+  );
+  doc = upsertSection(doc, "## What's Been Tried", buildTriedSection(checkpoint));
+  doc = upsertSection(doc, "## Plugin Checkpoint", buildCheckpointSection(checkpoint));
+  fs.writeFileSync(sessionDocPath, `${doc.trimEnd()}\n`);
+}
+function ensureTitle(doc: string, sessionName: string | null): string {
+  const trimmed = doc.trim();
+  if (!trimmed) {
+    return `# Autoresearch: ${sessionName ?? "Session"}\n`;
+  }
+  if (/^#\s+/m.test(trimmed)) {
+    return trimmed;
+  }
+  return `# Autoresearch: ${sessionName ?? "Session"}\n\n${trimmed}`;
+}
+function upsertSection(doc: string, heading: string, body: string): string {
+  const escapedHeading = heading.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+  const sectionRe = new RegExp(`(^${escapedHeading}\\n)([\\s\\S]*?)(?=^##\\s|\\Z)`, "m");
+  const rendered = `${heading}\n${body.trim()}\n\n`;
+  if (sectionRe.test(doc)) {
+    return doc.replace(sectionRe, rendered);
+  }
+  return `${doc.trimEnd()}\n\n${rendered}`;
+}
+function buildTriedSection(checkpoint: AutoresearchCheckpoint): string {
+  if (checkpoint.recentLoggedRuns.length === 0) {
+    return "- No logged experiments yet.";
+  }
+  return checkpoint.recentLoggedRuns
+    .map((run) => {
+      const metricUnit = checkpoint.session.metricUnit;
+      const renderedMetric =
+        metricUnit && metricUnit.length > 0 ? `${run.metric}${metricUnit}` : `${run.metric}`;
+      return `- #${run.run} ${run.status} ${renderedMetric} ${run.commit} — ${run.description}`;
+    })
+    .join("\n");
+}
+function buildCheckpointSection(checkpoint: AutoresearchCheckpoint): string {
+  const lines = [
+    `- Last updated: ${new Date(checkpoint.updatedAt).toISOString()}`,
+    `- Runs tracked: ${checkpoint.session.currentRunCount} current / ${checkpoint.session.totalRunCount} total`,
+    `- Baseline: ${formatMetric(checkpoint.session.currentBaselineMetric, checkpoint.session.metricUnit)}`,
+    `- Best kept: ${formatMetric(checkpoint.session.currentBestMetric, checkpoint.session.metricUnit)}`,
+    `- ${formatConfidenceLine(checkpoint.session.confidence)}`,
+  ];
+  if (checkpoint.lastLoggedRun) {
+    lines.push(
+      `- Last logged run: #${checkpoint.lastLoggedRun.run} ${checkpoint.lastLoggedRun.status} ${checkpoint.lastLoggedRun.commit} — ${checkpoint.lastLoggedRun.description}`,
+    );
+  }
+  if (checkpoint.pendingRun) {
+    lines.push(
+      `- Pending run awaiting log_experiment: ${checkpoint.pendingRun.command} (${formatMetric(checkpoint.pendingRun.primaryMetric, checkpoint.session.metricUnit)})`,
+    );
+  }
+  return lines.join("\n");
+}
+function formatMetric(value: number | null, unit: string): string {
+  if (value === null) {
+    return "n/a";
+  }
+  return `${value}${unit}`;
+}
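One detail in `upsertSection` is worth checking when reviewing this file. In JavaScript regular expressions `\Z` is not an end-of-input anchor; without the `u` flag it is an identity escape that matches a literal `Z`. The section pattern may therefore stop early at a `Z` in the body (ISO timestamps end in one) or fail to match the last section at all. The sketch below demonstrates the escape behavior and shows one possible alternative, assuming a hypothetical `upsertSectionSketch` that uses `(?![\s\S])` for end of input and a replacer function so `$` sequences in the body are inserted literally. It is not the package's code.

```typescript
// Escape regex metacharacters in a literal heading (same character class as the diff).
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical variant of upsertSection.
function upsertSectionSketch(doc: string, heading: string, body: string): string {
  // (?![\s\S]) succeeds only where no character follows, i.e. at end of input,
  // and is unaffected by the m flag (unlike "$").
  const sectionRe = new RegExp(
    `^${escapeRegExp(heading)}\\n[\\s\\S]*?(?=^##\\s|(?![\\s\\S]))`,
    "m",
  );
  const rendered = `${heading}\n${body.trim()}\n\n`;
  if (sectionRe.test(doc)) {
    // A replacer function prevents "$&" or "$1" in the body from being expanded.
    return doc.replace(sectionRe, () => rendered);
  }
  return `${doc.trimEnd()}\n\n${rendered}`;
}

// "\Z" behaves as a literal "Z" in a non-unicode JavaScript regex.
const zMatchesLetter = new RegExp("\\Z", "m").test("Z"); // true
const zMatchesEnd = new RegExp("\\Z", "m").test("abc"); // false

// Last section of the document, containing a timestamp that ends in "Z".
const tailDoc =
  "# T\n\n## Plugin Checkpoint\n- Last updated: 2026-01-01T00:00:00.000Z\n- old\n";
const tailResult = upsertSectionSketch(tailDoc, "## Plugin Checkpoint", "- new $& line");

// Section followed by another heading.
const midDoc = "# T\n\n## Metrics\n- old\n\n## How to Run\nx\n";
const midResult = upsertSectionSketch(midDoc, "## Metrics", "- new");
```

In the first call the whole trailing section is replaced even though its body contains a `Z`, and the `$&` in the new body survives unchanged. In the second, replacement stops at the next `## ` heading.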