@sndrgrdn/opencode-autoresearch 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
@@ -0,0 +1,164 @@
# Autoresearch Plugin for OpenCode

An OpenCode plugin that implements an autonomous keep/discard experiment loop: optimize code by iteratively making a change, measuring it, and keeping or discarding it.

## Features

- **Experiment Tracking**: Record and track experiments with metrics in JSONL format
- **Git Integration**: Keep or discard experiments with git safety checks
- **Metric Parsing**: Automatically parse `METRIC name=value` output lines
- **Markdown Documentation**: Auto-generated, human-readable experiment logs
- **Checks Support**: Optional validation via `autoresearch.checks.sh`
- **AI Skill**: An included skill provides guided workflows and best practices

## Installation

### Local Development

Add to your OpenCode configuration:

```json
{
  "plugin": ["file:///path/to/oc-autoresearch"]
}
```

### From npm (when published)

```json
{
  "plugin": ["@sndrgrdn/opencode-autoresearch"]
}
```

### Using the Skill

Copy the skill to your OpenCode skills directory:

```bash
mkdir -p ~/.config/opencode/skills/autoresearch
cp node_modules/@sndrgrdn/opencode-autoresearch/skills/autoresearch/SKILL.md ~/.config/opencode/skills/autoresearch/
```

The skill provides guided workflows and best practices for the autoresearch loop.

## Tools

### init_experiment

Initialize a new autoresearch experiment session.

```json
{
  "name": "optimize-parser",
  "metric_name": "parse_time",
  "metric_unit": "ms",
  "direction": "lower",
  "command": "node benchmark.js",
  "branch": "experiment/parser-opt",
  "files_in_scope": ["src/parser.js", "src/lexer.js"]
}
```

### run_experiment

Execute the experiment command and capture metrics.

```json
{
  "timeout_seconds": 600,
  "checks_timeout_seconds": 300
}
```

### log_experiment

Log an experiment result together with its keep/discard decision.

```json
{
  "run_id": "uuid-from-run",
  "commit": "abc123",
  "metric": 45.2,
  "status": "keep",
  "description": "Refactored parse loop to use iterator",
  "metrics": {
    "memory_mb": 128
  }
}
```

### keep_experiment

Commit the current experiment changes.

```json
{
  "commit_message": "perf(parser): optimize parse loop using iterator"
}
```

### discard_experiment

Discard uncommitted changes (requires confirmation).

```json
{
  "confirmation": "DISCARD"
}
```

### autoresearch_status

Get the current session status, including metrics and keep/discard counts.

## Workflow

1. **Initialize**: `init_experiment` creates `autoresearch.jsonl` and `autoresearch.md`
2. **Baseline**: `run_experiment` to establish baseline metrics
3. **Log**: `log_experiment` to record the baseline
4. **Iterate**:
   - Edit code
   - `run_experiment` to measure
   - `log_experiment` to record the decision
   - `keep_experiment` or `discard_experiment`
5. **Status**: `autoresearch_status` to review progress
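
Each step of this loop appends one structured event to `autoresearch.jsonl`. An illustrative excerpt of how the log might look after the first kept experiment (all field values here are hypothetical):

```jsonl
{"type":"config","timestamp":"2025-01-01T10:00:00.000Z","name":"optimize-parser","metric_name":"parse_time","metric_unit":"ms","direction":"lower","command":"node benchmark.js","segment":1}
{"type":"run","timestamp":"2025-01-01T10:01:00.000Z","run_id":"uuid-from-run","segment":1,"command":"node benchmark.js","status":"ok","duration_seconds":12.4,"metrics":{"parse_time":45.2}}
{"type":"experiment","timestamp":"2025-01-01T10:02:00.000Z","run_id":"uuid-from-run","segment":1,"commit_before":"abc123","metric":45.2,"status":"keep","description":"Refactored parse loop to use iterator"}
```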

## Metric Format

Experiments should output metrics in this format:

```
METRIC parse_time=45.2
METRIC memory_mb=128
METRIC throughput=1000
```
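
Any executable can serve as the experiment command; it only has to print lines in this format. A minimal sketch of a benchmark script (GNU `date` is assumed for millisecond timing, and the `sleep` stands in for a real workload):

```bash
#!/usr/bin/env bash
# Sketch of a benchmark script; replace the sleep with your real workload.
start=$(date +%s%N)
sleep 0.05                      # placeholder workload
end=$(date +%s%N)

# One "METRIC name=value" line per metric (plain numbers, no units).
echo "METRIC parse_time=$(( (end - start) / 1000000 ))"
echo "METRIC memory_mb=128"     # hypothetical secondary metric
```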

## Files Created

- `autoresearch.jsonl` - Append-only event log
- `autoresearch.md` - Human-readable experiment notes
- `autoresearch.checks.sh` - Optional validation script
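
If `autoresearch.checks.sh` exists (and no `checks_command` is configured), it runs after each successful experiment command; any non-zero exit marks the run `checks_failed`. A minimal sketch, with a placeholder where your project's real checks would go:

```bash
#!/usr/bin/env bash
# autoresearch.checks.sh -- exit non-zero to mark the run checks_failed.
set -euo pipefail

echo "running checks..."
true   # stand-in for real checks, e.g. `bun run typecheck` or `npm test`
```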

## Development

```bash
# Install dependencies
bun install

# Type check
bun run typecheck

# Build
bun run build

# Smoke test
bun run smoke

# Watch mode
bun run dev
```

## License

MIT
@@ -0,0 +1,154 @@
import type { Plugin } from "@opencode-ai/plugin";
import { z as zod } from "zod";
export declare const DirectionSchema: zod.ZodEnum<{
    lower: "lower";
    higher: "higher";
}>;
export type Direction = zod.infer<typeof DirectionSchema>;
export declare const ExperimentStatusSchema: zod.ZodEnum<{
    keep: "keep";
    discard: "discard";
    crash: "crash";
    checks_failed: "checks_failed";
}>;
export type ExperimentStatus = zod.infer<typeof ExperimentStatusSchema>;
export declare const RunStatusSchema: zod.ZodEnum<{
    crash: "crash";
    checks_failed: "checks_failed";
    ok: "ok";
    timeout: "timeout";
}>;
export type RunStatus = zod.infer<typeof RunStatusSchema>;
export declare const ConfigEventSchema: zod.ZodObject<{
    type: zod.ZodLiteral<"config">;
    timestamp: zod.ZodISODateTime;
    name: zod.ZodString;
    metric_name: zod.ZodString;
    metric_unit: zod.ZodOptional<zod.ZodString>;
    direction: zod.ZodEnum<{
        lower: "lower";
        higher: "higher";
    }>;
    command: zod.ZodOptional<zod.ZodString>;
    checks_command: zod.ZodOptional<zod.ZodString>;
    branch: zod.ZodOptional<zod.ZodString>;
    files_in_scope: zod.ZodOptional<zod.ZodArray<zod.ZodString>>;
    segment: zod.ZodNumber;
}, zod.core.$strip>;
export type ConfigEvent = zod.infer<typeof ConfigEventSchema>;
export declare const MetricsSchema: zod.ZodRecord<zod.ZodString, zod.ZodNumber>;
export type Metrics = Record<string, number>;
export declare const RunEventSchema: zod.ZodObject<{
    type: zod.ZodLiteral<"run">;
    timestamp: zod.ZodISODateTime;
    run_id: zod.ZodString;
    segment: zod.ZodNumber;
    command: zod.ZodString;
    status: zod.ZodEnum<{
        crash: "crash";
        checks_failed: "checks_failed";
        ok: "ok";
        timeout: "timeout";
    }>;
    duration_seconds: zod.ZodNumber;
    timed_out: zod.ZodOptional<zod.ZodBoolean>;
    exit_code: zod.ZodOptional<zod.ZodNumber>;
    metrics: zod.ZodRecord<zod.ZodString, zod.ZodNumber>;
    checks_pass: zod.ZodOptional<zod.ZodBoolean>;
    checks_duration_seconds: zod.ZodOptional<zod.ZodNumber>;
    log_tail: zod.ZodOptional<zod.ZodString>;
}, zod.core.$strip>;
export type RunEvent = zod.infer<typeof RunEventSchema>;
export declare const ExperimentEventSchema: zod.ZodObject<{
    type: zod.ZodLiteral<"experiment">;
    timestamp: zod.ZodISODateTime;
    run_id: zod.ZodString;
    segment: zod.ZodNumber;
    commit_before: zod.ZodString;
    commit_after: zod.ZodOptional<zod.ZodString>;
    metric: zod.ZodNumber;
    metrics: zod.ZodOptional<zod.ZodRecord<zod.ZodString, zod.ZodNumber>>;
    status: zod.ZodEnum<{
        keep: "keep";
        discard: "discard";
        crash: "crash";
        checks_failed: "checks_failed";
    }>;
    description: zod.ZodString;
}, zod.core.$strip>;
export type ExperimentEvent = zod.infer<typeof ExperimentEventSchema>;
export declare const EventSchema: zod.ZodUnion<readonly [zod.ZodObject<{
    type: zod.ZodLiteral<"config">;
    timestamp: zod.ZodISODateTime;
    name: zod.ZodString;
    metric_name: zod.ZodString;
    metric_unit: zod.ZodOptional<zod.ZodString>;
    direction: zod.ZodEnum<{
        lower: "lower";
        higher: "higher";
    }>;
    command: zod.ZodOptional<zod.ZodString>;
    checks_command: zod.ZodOptional<zod.ZodString>;
    branch: zod.ZodOptional<zod.ZodString>;
    files_in_scope: zod.ZodOptional<zod.ZodArray<zod.ZodString>>;
    segment: zod.ZodNumber;
}, zod.core.$strip>, zod.ZodObject<{
    type: zod.ZodLiteral<"run">;
    timestamp: zod.ZodISODateTime;
    run_id: zod.ZodString;
    segment: zod.ZodNumber;
    command: zod.ZodString;
    status: zod.ZodEnum<{
        crash: "crash";
        checks_failed: "checks_failed";
        ok: "ok";
        timeout: "timeout";
    }>;
    duration_seconds: zod.ZodNumber;
    timed_out: zod.ZodOptional<zod.ZodBoolean>;
    exit_code: zod.ZodOptional<zod.ZodNumber>;
    metrics: zod.ZodRecord<zod.ZodString, zod.ZodNumber>;
    checks_pass: zod.ZodOptional<zod.ZodBoolean>;
    checks_duration_seconds: zod.ZodOptional<zod.ZodNumber>;
    log_tail: zod.ZodOptional<zod.ZodString>;
}, zod.core.$strip>, zod.ZodObject<{
    type: zod.ZodLiteral<"experiment">;
    timestamp: zod.ZodISODateTime;
    run_id: zod.ZodString;
    segment: zod.ZodNumber;
    commit_before: zod.ZodString;
    commit_after: zod.ZodOptional<zod.ZodString>;
    metric: zod.ZodNumber;
    metrics: zod.ZodOptional<zod.ZodRecord<zod.ZodString, zod.ZodNumber>>;
    status: zod.ZodEnum<{
        keep: "keep";
        discard: "discard";
        crash: "crash";
        checks_failed: "checks_failed";
    }>;
    description: zod.ZodString;
}, zod.core.$strip>]>;
export type Event = zod.infer<typeof EventSchema>;
export interface ExperimentSession {
    segment: number;
    config?: ConfigEvent;
    runs: RunEvent[];
    experiments: ExperimentEvent[];
}
export interface StatusSummary {
    segment: number;
    total_runs: number;
    keep_count: number;
    discard_count: number;
    crash_count: number;
    checks_failed_count: number;
    baseline_metric: number | null;
    best_metric: number | null;
    best_run_id: string | null;
    current_branch: string | null;
    git_dirty: boolean;
    direction: Direction | null;
    metric_name: string | null;
}
export declare const AutoresearchPlugin: Plugin;
export default AutoresearchPlugin;
package/dist/index.js ADDED
@@ -0,0 +1,787 @@
import { tool } from "@opencode-ai/plugin";
import { randomUUID } from "crypto";
import { existsSync } from "fs";
import { readFile, appendFile, writeFile } from "fs/promises";
import { z as zod } from "zod";
// --- types ---
export const DirectionSchema = zod.enum(["lower", "higher"]);
export const ExperimentStatusSchema = zod.enum([
    "keep",
    "discard",
    "crash",
    "checks_failed",
]);
export const RunStatusSchema = zod.enum([
    "ok",
    "timeout",
    "crash",
    "checks_failed",
]);
export const ConfigEventSchema = zod.object({
    type: zod.literal("config"),
    timestamp: zod.iso.datetime(),
    name: zod.string(),
    metric_name: zod.string(),
    metric_unit: zod.string().optional(),
    direction: DirectionSchema,
    command: zod.string().optional(),
    checks_command: zod.string().optional(),
    branch: zod.string().optional(),
    files_in_scope: zod.array(zod.string()).optional(),
    segment: zod.number().int().nonnegative(),
});
export const MetricsSchema = zod.record(zod.string(), zod.number());
export const RunEventSchema = zod.object({
    type: zod.literal("run"),
    timestamp: zod.iso.datetime(),
    run_id: zod.string(),
    segment: zod.number().int().nonnegative(),
    command: zod.string(),
    status: RunStatusSchema,
    duration_seconds: zod.number(),
    timed_out: zod.boolean().optional(),
    exit_code: zod.number().optional(),
    metrics: MetricsSchema,
    checks_pass: zod.boolean().optional(),
    checks_duration_seconds: zod.number().optional(),
    log_tail: zod.string().optional(),
});
export const ExperimentEventSchema = zod.object({
    type: zod.literal("experiment"),
    timestamp: zod.iso.datetime(),
    run_id: zod.string(),
    segment: zod.number().int().nonnegative(),
    commit_before: zod.string(),
    commit_after: zod.string().optional(),
    metric: zod.number(),
    metrics: MetricsSchema.optional(),
    status: ExperimentStatusSchema,
    description: zod.string(),
});
export const EventSchema = zod.union([
    ConfigEventSchema,
    RunEventSchema,
    ExperimentEventSchema,
]);
// --- state ---
const JSONL_FILE = "autoresearch.jsonl";
const DEFAULT_COMMAND_TIMEOUT_SECONDS = 600;
const DEFAULT_CHECKS_TIMEOUT_SECONDS = 300;
function normalizeTimeoutSeconds(value, fallback) {
    if (!Number.isFinite(value) || value <= 0) {
        return fallback;
    }
    return Math.max(1, Math.floor(value));
}
async function loadEvents(directory) {
    const filePath = `${directory}/${JSONL_FILE}`;
    if (!existsSync(filePath)) {
        return [];
    }
    const content = await readFile(filePath, "utf-8");
    const lines = content.split("\n").filter((line) => line.trim() !== "");
    const events = [];
    let malformedLineCount = 0;
    for (const line of lines) {
        try {
            const parsed = JSON.parse(line);
            const result = EventSchema.safeParse(parsed);
            if (result.success) {
                events.push(result.data);
            }
            else {
                malformedLineCount += 1;
            }
        }
        catch {
            malformedLineCount += 1;
        }
    }
    if (malformedLineCount > 0) {
        console.warn(`Skipped ${malformedLineCount} malformed autoresearch event(s) in ${filePath}`);
    }
    return events;
}
async function appendEvent(directory, event) {
    const filePath = `${directory}/${JSONL_FILE}`;
    const line = JSON.stringify(event) + "\n";
    await appendFile(filePath, line, "utf-8");
}
function getCurrentSegment(events) {
    const configEvents = events.filter((e) => e.type === "config");
    if (configEvents.length === 0) {
        return 0;
    }
    return Math.max(...configEvents.map((e) => e.segment));
}
function getSession(events, segment) {
    const config = events.find((e) => e.type === "config" && e.segment === segment);
    const runs = events.filter((e) => e.type === "run" && e.segment === segment);
    const experiments = events.filter((e) => e.type === "experiment" && e.segment === segment);
    return { segment, config, runs, experiments };
}
function getCurrentConfig(events) {
    const segment = getCurrentSegment(events);
    return events.find((e) => e.type === "config" && e.segment === segment);
}
function findRun(events, runId) {
    return events.find((e) => e.type === "run" && e.run_id === runId);
}
function computeStatusSummary(events, currentBranch, gitDirty) {
    const segment = getCurrentSegment(events);
    const session = getSession(events, segment);
    const config = session.config;
    const keep_count = session.experiments.filter((e) => e.status === "keep").length;
    const discard_count = session.experiments.filter((e) => e.status === "discard").length;
    const crash_count = session.experiments.filter((e) => e.status === "crash").length;
    const checks_failed_count = session.experiments.filter((e) => e.status === "checks_failed").length;
    const baselineRun = session.runs.find((r) => r.status === "ok");
    const baseline_metric = config && baselineRun ? (baselineRun.metrics[config.metric_name] ?? null) : null;
    let best_metric = null;
    let best_run_id = null;
    if (config) {
        const validExperiments = session.experiments.filter((e) => e.status === "keep");
        if (validExperiments.length > 0) {
            if (config.direction === "lower") {
                const best = validExperiments.reduce((min, e) => e.metric < min.metric ? e : min);
                best_metric = best.metric;
                best_run_id = best.run_id;
            }
            else {
                const best = validExperiments.reduce((max, e) => e.metric > max.metric ? e : max);
                best_metric = best.metric;
                best_run_id = best.run_id;
            }
        }
    }
    return {
        segment,
        total_runs: session.runs.length,
        keep_count,
        discard_count,
        crash_count,
        checks_failed_count,
        baseline_metric,
        best_metric,
        best_run_id,
        current_branch: currentBranch,
        git_dirty: gitDirty,
        direction: config?.direction ?? null,
        metric_name: config?.metric_name ?? null,
    };
}
async function getGitStatus($, directory) {
    try {
        const result = await $`git -C ${directory} status --porcelain -b`
            .quiet()
            .nothrow();
        if (result.exitCode !== 0) {
            return {
                isRepo: false,
                branch: null,
                isDirty: false,
                hasStaged: false,
                hasUnstaged: false,
                untrackedFiles: [],
            };
        }
        const output = result.stdout.toString();
        const lines = output.split("\n");
        let branch = null;
        const statusLines = [];
        for (const line of lines) {
            if (line.startsWith("## ")) {
                const match = line.match(/^##\s+(\S+)/);
                if (match) {
                    const branchName = match[1].replace(/^\*/, "").trim();
                    if (branchName) {
                        branch = branchName;
                    }
                }
            }
            else if (line.trim()) {
                statusLines.push(line);
            }
        }
        let hasStaged = false;
        let hasUnstaged = false;
        const untrackedFiles = [];
        for (const line of statusLines) {
            if (line.startsWith("??") && line.length > 3) {
                untrackedFiles.push(line.slice(3));
            }
            else if (line.length > 1) {
                // Porcelain format: column 0 is the index (staged) status,
                // column 1 is the worktree (unstaged) status.
                const indexStatus = line[0];
                const worktreeStatus = line[1];
                if (worktreeStatus === "M" || worktreeStatus === "D") {
                    hasUnstaged = true;
                }
                // Staged changes show as a non-blank index column.
                if (indexStatus !== " " && indexStatus !== "?") {
                    hasStaged = true;
                }
            }
        }
        return {
            isRepo: true,
            branch,
            isDirty: statusLines.length > 0,
            hasStaged,
            hasUnstaged,
            untrackedFiles,
        };
    }
    catch {
        return {
            isRepo: false,
            branch: null,
            isDirty: false,
            hasStaged: false,
            hasUnstaged: false,
            untrackedFiles: [],
        };
    }
}
async function stageAll($, directory) {
    const result = await $`git -C ${directory} add -A`.quiet().nothrow();
    if (result.exitCode !== 0) {
        throw new Error(`Failed to stage changes: ${result.stderr.toString()}`);
    }
}
async function commit($, directory, message) {
    const result = await $`git -C ${directory} commit -m ${message}`
        .quiet()
        .nothrow();
    if (result.exitCode !== 0) {
        throw new Error(`Failed to commit: ${result.stderr.toString()}`);
    }
    const commitResult = await $`git -C ${directory} rev-parse --short HEAD`.quiet();
    return commitResult.stdout.toString().trim();
}
async function discardChanges($, directory) {
    await $`git -C ${directory} reset --hard`.quiet().nothrow();
    await $`git -C ${directory} clean -fd`.quiet().nothrow();
}
// --- metrics ---
const METRIC_LINE_REGEX = /^METRIC\s+(\w+)\s*=\s*(-?\d+(?:\.\d+)?)$/;
function parseMetrics(output) {
    const metrics = {};
    const lines = output.split("\n");
    for (const line of lines) {
        const match = line.trim().match(METRIC_LINE_REGEX);
        if (match) {
            const [, name, valueStr] = match;
            const value = parseFloat(valueStr);
            if (!isNaN(value)) {
                metrics[name] = value;
            }
        }
    }
    return metrics;
}
const CHECKS_SCRIPT = "autoresearch.checks.sh";
async function runProcessWithTimeout($, directory, command, timeoutSeconds) {
    // `command` is argv-style. Joining a ["bash", "-c", script] triple with
    // spaces and re-wrapping it in `bash -c` would mangle the script's
    // quoting (arguments would be lost), so unwrap that case and pass the
    // script through as a single argument.
    const script = command.length === 3 && command[0] === "bash" && command[1] === "-c"
        ? command[2]
        : command.join(" ");
    const timeoutMs = Math.max(1000, timeoutSeconds * 1000);
    const shellPromise = $`bash -c ${script}`
        .cwd(directory)
        .quiet()
        .nothrow();
    // Note: on timeout the race rejects, but the child process itself is not
    // killed; a long-running command may keep running in the background.
    const timeoutPromise = new Promise((_, reject) => {
        setTimeout(() => reject(new Error("TIMEOUT")), timeoutMs);
    });
291
+ try {
292
+ const result = await Promise.race([shellPromise, timeoutPromise]);
293
+ return {
294
+ exitCode: result.exitCode,
295
+ stdout: result.stdout?.toString() || "",
296
+ stderr: result.stderr?.toString() || "",
297
+ timedOut: false,
298
+ };
299
+ }
300
+ catch (error) {
301
+ if (error instanceof Error && error.message === "TIMEOUT") {
302
+ return {
303
+ exitCode: -1,
304
+ stdout: "",
305
+ stderr: "",
306
+ timedOut: true,
307
+ };
308
+ }
309
+ throw error;
310
+ }
311
+ }
312
+ async function runChecks($, directory, timeoutSeconds = 300, checksCommand) {
313
+ const scriptPath = `${directory}/${CHECKS_SCRIPT}`;
314
+ if (!checksCommand && !existsSync(scriptPath)) {
315
+ return {
316
+ pass: true,
317
+ duration_seconds: 0,
318
+ output: "No checks script found",
319
+ };
320
+ }
321
+ const startTime = Date.now();
322
+ try {
323
+ const result = checksCommand
324
+ ? await runProcessWithTimeout($, directory, ["bash", "-c", checksCommand], timeoutSeconds)
325
+ : await runProcessWithTimeout($, directory, ["bash", scriptPath], timeoutSeconds);
326
+ const duration_seconds = (Date.now() - startTime) / 1000;
327
+ if (result.timedOut) {
328
+ return {
329
+ pass: false,
330
+ duration_seconds,
331
+ output: result.stdout,
332
+ error: "Checks timed out",
333
+ timedOut: true,
334
+ };
335
+ }
336
+ if (result.exitCode !== 0) {
337
+ return {
338
+ pass: false,
339
+ duration_seconds,
340
+ output: result.stdout,
341
+ error: result.stderr || "Checks failed",
342
+ };
343
+ }
344
+ return {
345
+ pass: true,
346
+ duration_seconds,
347
+ output: result.stdout,
348
+ };
349
+ }
350
+ catch (error) {
351
+ const duration_seconds = (Date.now() - startTime) / 1000;
352
+ return {
353
+ pass: false,
354
+ duration_seconds,
355
+ output: "",
356
+ error: error instanceof Error ? error.message : String(error),
357
+ };
358
+ }
359
+ }
360
+ // --- markdown ---
361
+ const MARKDOWN_FILE = "autoresearch.md";
362
+ async function loadMarkdown(directory) {
363
+ const filePath = `${directory}/${MARKDOWN_FILE}`;
364
+ if (!existsSync(filePath)) {
365
+ return "";
366
+ }
367
+ return readFile(filePath, "utf-8");
368
+ }
369
+ async function saveMarkdown(directory, content) {
370
+ const filePath = `${directory}/${MARKDOWN_FILE}`;
371
+ await writeFile(filePath, content, "utf-8");
372
+ }
373
+ function generateMarkdownTemplate(config) {
374
+ const timestamp = new Date().toISOString();
375
+ return `# Autoresearch: ${config.name}
376
+
377
+ **Started:** ${timestamp}
378
+ **Segment:** ${config.segment}
379
+ **Branch:** ${config.branch || "Not specified"}
380
+
381
+ ## Objective
382
+
383
+ Optimize ${config.metric_name} (${config.direction} is better).
384
+
385
+ ## Primary Metric
386
+
387
+ - **Name:** ${config.metric_name}
388
+ - **Unit:** ${config.metric_unit || "Not specified"}
389
+ - **Direction:** ${config.direction}
390
+
391
+ ## Command
392
+
393
+ \`\`\`bash
394
+ ${config.command || "Not specified"}
395
+ \`\`\`
396
+
397
+ ## Checks Command
398
+
399
+ ${config.checks_command || "Not specified"}
400
+
401
+ ## Files in Scope
402
+
403
+ ${config.files_in_scope?.map((f) => `- ${f}`).join("\n") || "Not specified"}
404
+
405
+ ## Baseline
406
+
407
+ *Will be populated after first successful run*
408
+
409
+ ## Best Run
410
+
411
+ *Will be populated after first kept experiment*
412
+
413
+ ## Experiments
414
+
415
+ | Run | Status | Metric | Description |
416
+ |-----|--------|--------|-------------|
417
+
418
+ ## Tried Ideas
419
+
420
+ - [ ] Initial baseline
421
+
422
+ ## Dead Ends
423
+
424
+ *None yet*
425
+
426
+ ## Next Ideas
427
+
428
+ - [ ] Establish baseline
429
+ `;
430
+ }
431
+ function updateMarkdownWithSession(existingContent, session, summary) {
432
+ const config = session.config;
433
+ if (!config)
434
+ return existingContent;
435
+ let content = existingContent;
436
+ const improvement = summary.baseline_metric !== null &&
437
+ summary.best_metric !== null &&
438
+ summary.baseline_metric !== 0
439
+ ? ((config.direction === "lower" ? -1 : 1) *
440
+ (((summary.best_metric - summary.baseline_metric) /
441
+ summary.baseline_metric) *
442
+ 100)).toFixed(2) + "%"
443
+ : "N/A";
444
+ if (summary.baseline_metric !== null) {
445
+ const baselineSection = `## Baseline
446
+
447
+ - **Metric:** ${summary.baseline_metric}${config.metric_unit ? ` ${config.metric_unit}` : ""}
448
+ - **Status:** Established`;
449
+ if (content.includes("## Baseline")) {
450
+ content = content.replace(/## Baseline[\s\S]*?(?=\n## |$)/, baselineSection + "\n\n");
451
+ }
452
+ }
453
+ if (summary.best_metric !== null && summary.best_run_id) {
454
+ const bestSection = `## Best Run
455
+
456
+ - **Run ID:** ${summary.best_run_id}
457
+ - **Metric:** ${summary.best_metric}${config.metric_unit ? ` ${config.metric_unit}` : ""}
458
+ - **Improvement:** ${improvement}`;
459
+ if (content.includes("## Best Run")) {
460
+ content = content.replace(/## Best Run[\s\S]*?(?=\n## |$)/, bestSection + "\n\n");
461
+ }
462
+ }
463
+ if (session.experiments.length > 0) {
464
+ const tableRows = session.experiments
465
+ .map((e) => `| ${e.run_id} | ${e.status} | ${e.metric}${config.metric_unit ? ` ${config.metric_unit}` : ""} | ${e.description} |`)
466
+ .join("\n");
467
+ const tableHeader = `| Run | Status | Metric | Description |
468
+ |-----|--------|--------|-------------|`;
469
+ const tableSection = `## Experiments
470
+
471
+ ${tableHeader}
472
+ ${tableRows}`;
473
+ if (content.includes("## Experiments")) {
474
+ content = content.replace(/## Experiments[\s\S]*?(?=\n## |$)/, tableSection + "\n\n");
475
+ }
476
+ }
477
+ return content;
478
+ }
479
+ // --- plugin ---
480
+ const z = tool.schema;
481
+ export const AutoresearchPlugin = async ({ $ }) => {
482
+ return {
483
+ tool: {
484
+ init_experiment: tool({
485
+ description: "Initialize a new autoresearch experiment session. Creates autoresearch.jsonl and autoresearch.md files.",
486
+ args: {
487
+ name: z.string().describe("Name of the experiment"),
488
+ metric_name: z.string().describe("Primary metric to optimize"),
489
+ metric_unit: z.string().optional().describe("Unit of the metric"),
490
+ direction: z
491
+ .enum(["lower", "higher"])
492
+ .describe("Whether 'lower' or 'higher' metric values are better"),
493
+ command: z
494
+ .string()
495
+ .optional()
496
+ .describe("Command to run for experiments"),
497
+ checks_command: z
498
+ .string()
499
+ .optional()
500
+ .describe("Command to run after experiments for validation checks"),
501
+ branch: z
502
+ .string()
503
+ .optional()
504
+ .describe("Git branch for this experiment"),
505
+ files_in_scope: z
506
+ .array(z.string())
507
+ .optional()
508
+ .describe("Files involved in the experiment"),
509
+ },
510
+ execute: async (args, context) => {
511
+ const { directory: ctxDir } = context;
512
+ const events = await loadEvents(ctxDir);
513
+ const segment = getCurrentSegment(events) + 1;
514
+ const config = {
515
+ type: "config",
516
+ timestamp: new Date().toISOString(),
517
+ name: args.name,
518
+ metric_name: args.metric_name,
519
+ metric_unit: args.metric_unit,
520
+ direction: args.direction,
521
+ command: args.command,
522
+ checks_command: args.checks_command,
523
+ branch: args.branch,
524
+ files_in_scope: args.files_in_scope,
525
+ segment,
526
+ };
527
+ await appendEvent(ctxDir, config);
528
+ const markdownContent = generateMarkdownTemplate(config);
529
+ await saveMarkdown(ctxDir, markdownContent);
530
+ return JSON.stringify({
531
+ success: true,
532
+ segment,
533
+ config: {
534
+ name: args.name,
535
+ metric_name: args.metric_name,
536
+ direction: args.direction,
537
+ command: args.command,
538
+ checks_command: args.checks_command,
539
+ branch: args.branch,
540
+ },
541
+ files_created: ["autoresearch.jsonl", "autoresearch.md"],
542
+ });
543
+ },
544
+ }),
545
+ run_experiment: tool({
546
+ description: "Execute an experiment command, capture metrics from METRIC name=value output lines, and optionally run checks.",
547
+ args: {
548
+ command: z
549
+ .string()
550
+ .optional()
551
+ .describe("Command to run (overrides stored command)"),
552
+ timeout_seconds: z
553
+ .number()
554
+ .int()
555
+ .default(DEFAULT_COMMAND_TIMEOUT_SECONDS)
556
+ .describe("Timeout for the command in seconds. Non-positive values use the default."),
557
+ checks_timeout_seconds: z
558
+ .number()
559
+ .int()
560
+ .default(DEFAULT_CHECKS_TIMEOUT_SECONDS)
561
+ .describe("Timeout for checks in seconds. Non-positive values use the default."),
562
+ },
563
+ execute: async (args, context) => {
564
+ const { directory: ctxDir } = context;
565
+ const events = await loadEvents(ctxDir);
566
+ const config = getCurrentConfig(events);
567
+ if (!config) {
568
+ throw new Error("No active experiment found. Run init_experiment first.");
569
+ }
570
+ const command = args.command ?? config.command;
571
+ if (!command) {
572
+ throw new Error("No command specified and no stored command in config");
573
+ }
574
+ const runId = randomUUID();
575
+ const segment = config.segment;
576
+ const timeoutSeconds = normalizeTimeoutSeconds(args.timeout_seconds, DEFAULT_COMMAND_TIMEOUT_SECONDS);
577
+ const checksTimeoutSeconds = normalizeTimeoutSeconds(args.checks_timeout_seconds, DEFAULT_CHECKS_TIMEOUT_SECONDS);
578
+ const startTime = Date.now();
579
+ let status = "ok";
580
+ let exitCode;
581
+ let commandOutput = "";
582
+ let timedOut = false;
583
+ try {
584
+ const result = await runProcessWithTimeout($, ctxDir, ["bash", "-c", command], timeoutSeconds);
585
+ exitCode = result.exitCode;
586
+ commandOutput = result.stdout + result.stderr;
587
+ timedOut = result.timedOut;
588
+ if (timedOut) {
589
+ status = "timeout";
590
+ }
591
+ else if (exitCode !== 0) {
592
+ status = "crash";
593
+ }
594
+ }
595
+ catch (error) {
596
+ status = "crash";
597
+ commandOutput =
598
+ error instanceof Error ? error.message : String(error);
599
+ }
600
+ const duration_seconds = (Date.now() - startTime) / 1000;
601
+ const metrics = parseMetrics(commandOutput);
602
+ const primaryMetric = metrics[config.metric_name] ?? null;
603
+ let checksPass;
604
+ let checks_duration_seconds;
605
+ if (status === "ok") {
606
+ const checkResult = await runChecks($, ctxDir, checksTimeoutSeconds, config.checks_command);
607
+ checksPass = checkResult.pass;
608
+ checks_duration_seconds = checkResult.duration_seconds;
609
+ if (!checkResult.pass) {
610
+ status = "checks_failed";
611
+ }
612
+ }
613
+ const runEvent = {
614
+ type: "run",
615
+ timestamp: new Date().toISOString(),
616
+ run_id: runId,
617
+ segment,
618
+ command,
619
+ status,
620
+ duration_seconds,
621
+ timed_out: timedOut ? true : undefined,
622
+ exit_code: exitCode,
623
+ metrics,
624
+ checks_pass: checksPass,
625
+ checks_duration_seconds,
626
+ log_tail: commandOutput.slice(-2000),
627
+ };
628
+ await appendEvent(ctxDir, runEvent);
629
+ return JSON.stringify({
630
+ run_id: runId,
631
+ status,
632
+ primary_metric: primaryMetric,
633
+ metrics,
634
+ duration_seconds,
635
+ checks_pass: checksPass,
636
+ checks_duration_seconds,
637
+ log_tail: runEvent.log_tail,
638
+ });
639
+ },
640
+ }),
641
+ log_experiment: tool({
+ description: "Log an experiment result with decision (keep/discard/crash/checks_failed). Updates autoresearch.jsonl and autoresearch.md.",
+ args: {
+ run_id: z.string().describe("ID of the run to log"),
+ commit: z.string().describe("Commit hash before the experiment"),
+ metric: z.number().describe("Primary metric value"),
+ status: z
+ .enum(["keep", "discard", "crash", "checks_failed"])
+ .describe("Status of the experiment"),
+ description: z.string().describe("Description of what changed"),
+ metrics: z
+ .record(z.string(), z.number())
+ .optional()
+ .describe("Secondary metrics"),
+ },
+ execute: async (args, context) => {
+ const { directory: ctxDir } = context;
+ const events = await loadEvents(ctxDir);
+ const config = getCurrentConfig(events);
+ if (!config) {
+ throw new Error("No active experiment found. Run init_experiment first.");
+ }
+ const run = findRun(events, args.run_id);
+ if (!run) {
+ throw new Error(`Run '${args.run_id}' not found in current segment`);
+ }
+ if (run.segment !== config.segment) {
+ throw new Error(`Run '${args.run_id}' belongs to segment ${run.segment}, current segment is ${config.segment}`);
+ }
+ const experimentEvent = {
+ type: "experiment",
+ timestamp: new Date().toISOString(),
+ run_id: args.run_id,
+ segment: config.segment,
+ commit_before: args.commit,
+ metric: args.metric,
+ metrics: args.metrics,
+ status: args.status,
+ description: args.description,
+ };
+ await appendEvent(ctxDir, experimentEvent);
+ const updatedEvents = [...events, experimentEvent];
+ const markdownContent = await loadMarkdown(ctxDir);
+ const summary = computeStatusSummary(updatedEvents, null, false);
+ const session = getSession(updatedEvents, config.segment);
+ const updatedMarkdown = updateMarkdownWithSession(markdownContent, session, summary);
+ await saveMarkdown(ctxDir, updatedMarkdown);
+ return JSON.stringify({
+ success: true,
+ run_id: args.run_id,
+ status: args.status,
+ metric: args.metric,
+ segment: config.segment,
+ });
+ },
+ }),
+ keep_experiment: tool({
+ description: "Commit the current experiment changes to git. Stages all changes and creates a commit.",
+ args: {
+ commit_message: z
+ .string()
+ .describe("Commit message for the kept experiment"),
+ },
+ execute: async (args, context) => {
+ const { directory: ctxDir } = context;
+ const gitStatus = await getGitStatus($, ctxDir);
+ if (!gitStatus.isRepo) {
+ throw new Error("Not in a git repository. Initialize git first.");
+ }
+ if (!gitStatus.isDirty) {
+ throw new Error("No changes to commit. Make some changes first.");
+ }
+ await stageAll($, ctxDir);
+ const commitHash = await commit($, ctxDir, args.commit_message);
+ return JSON.stringify({
+ success: true,
+ commit_hash: commitHash,
+ branch: gitStatus.branch,
+ message: args.commit_message,
+ });
+ },
+ }),
+ discard_experiment: tool({
+ description: "Discard uncommitted changes to restore the pre-experiment state. Requires explicit confirmation.",
+ args: {
+ confirmation: z.string().describe("Type 'DISCARD' to confirm"),
+ },
+ execute: async (args, context) => {
+ const { directory: ctxDir } = context;
+ if (args.confirmation !== "DISCARD") {
+ throw new Error("Confirmation required: type 'DISCARD' to confirm discarding changes");
+ }
+ const gitStatus = await getGitStatus($, ctxDir);
+ if (!gitStatus.isRepo) {
+ throw new Error("Not in a git repository.");
+ }
+ if (!gitStatus.isDirty) {
+ return JSON.stringify({
+ success: true,
+ message: "No changes to discard",
+ discarded_files: [],
+ });
+ }
+ await discardChanges($, ctxDir);
+ return JSON.stringify({
+ success: true,
+ message: "Changes discarded successfully",
+ discarded_files: gitStatus.untrackedFiles,
+ restored_changes: gitStatus.hasStaged || gitStatus.hasUnstaged,
+ });
+ },
+ }),
+ autoresearch_status: tool({
+ description: "Get the current status of the autoresearch session including metrics, experiment counts, and git state.",
+ args: {},
+ execute: async (_args, context) => {
+ const { directory: ctxDir } = context;
+ const events = await loadEvents(ctxDir);
+ const gitStatus = await getGitStatus($, ctxDir);
+ const summary = computeStatusSummary(events, gitStatus.branch, gitStatus.isDirty);
+ const config = getCurrentConfig(events);
+ return JSON.stringify({
+ segment: summary.segment,
+ experiment_name: config?.name ?? null,
+ metric_name: summary.metric_name,
+ direction: summary.direction,
+ total_runs: summary.total_runs,
+ keep_count: summary.keep_count,
+ discard_count: summary.discard_count,
+ crash_count: summary.crash_count,
+ checks_failed_count: summary.checks_failed_count,
+ baseline_metric: summary.baseline_metric,
+ best_metric: summary.best_metric,
+ best_run_id: summary.best_run_id,
+ current_branch: summary.current_branch,
+ git_dirty: summary.git_dirty,
+ files: {
+ jsonl: "autoresearch.jsonl",
+ markdown: "autoresearch.md",
+ },
+ });
+ },
+ }),
+ },
+ };
+ };
+ export default AutoresearchPlugin;
package/package.json ADDED
@@ -0,0 +1,40 @@
+ {
+ "name": "@sndrgrdn/opencode-autoresearch",
+ "version": "0.1.0",
+ "description": "Autonomous experiment loop plugin for OpenCode - optimize code through iterative experimentation",
+ "type": "module",
+ "exports": {
+ ".": {
+ "types": "./dist/index.d.ts",
+ "default": "./dist/index.js"
+ }
+ },
+ "files": ["dist/", "skills/", "README.md"],
+ "scripts": {
+ "build": "tsc",
+ "dev": "tsc --watch",
+ "typecheck": "tsc --noEmit",
+ "clean": "rm -rf dist/",
+ "smoke": "bun run build && bun run smoke.ts"
+ },
+ "dependencies": {
+ "@opencode-ai/plugin": "^1.2.26",
+ "zod": "^4.3.6"
+ },
+ "devDependencies": {
+ "@types/bun": "latest",
+ "typescript": "^5.9.3"
+ },
+ "engines": {
+ "bun": ">=1.0.0"
+ },
+ "keywords": ["opencode", "plugin", "autoresearch", "optimization", "experiment"],
+ "repository": {
+ "type": "git",
+ "url": "https://github.com/sndrgrdn/opencode-autoresearch.git"
+ },
+ "license": "MIT",
+ "publishConfig": {
+ "access": "public"
+ }
+ }
package/skills/autoresearch/SKILL.md ADDED
@@ -0,0 +1,110 @@
+ ---
+ name: autoresearch
+ description: Set up and run an autonomous experiment loop for any optimization target. Gathers what to optimize, then starts the loop immediately. Use when asked to "run autoresearch", "optimize X in a loop", "set up autoresearch for X", or "start experiments".
+ ---
+
+ # Autoresearch
+
+ Autonomous experiment loop: try ideas, keep what works, discard what doesn't, never stop.
+
+ ## Tools
+
+ - **`init_experiment`** — configure session (name, metric, unit, direction). Call again to re-initialize with a new baseline when the optimization target changes.
+ - **`run_experiment`** — runs the command, times it, captures output.
+ - **`log_experiment`** — records the result. Always include the secondary `metrics` dict. Dashboard: ctrl+x.
+ - **`keep_experiment`** — commits kept changes after a successful run.
+ - **`discard_experiment`** — reverts discarded/crashed/failed-check changes. Use confirmation `DISCARD`.
+ - **`autoresearch_status`** — shows current session stats and history.
+
+ ## Setup
+
+ 1. Ask (or infer): **Goal**, **Command**, **Metric** (+ direction), **Files in scope**, **Constraints**.
+ 2. `git checkout -b autoresearch/<goal>-<date>`
+ 3. Read the source files. Understand the workload deeply before writing anything.
+ 4. Write `autoresearch.md` and `autoresearch.sh` (see below). Commit both.
+ 5. `init_experiment` → run baseline → `log_experiment` → start looping immediately.
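The git parts of the setup can be sketched concretely. Everything below is illustrative: `optimize-parser`, the file contents, and the `parse_time` metric are placeholder values, and a real session runs these commands inside the project repo rather than a throwaway one.

```shell
#!/bin/bash
set -euo pipefail

# Demo in a throwaway repo; in a real session, work in your project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "init"

# Step 2: isolate the session on its own branch (goal name is a placeholder)
git checkout -q -b "autoresearch/optimize-parser-$(date +%Y-%m-%d)"

# Step 4: write the session files, then commit both
printf '# Autoresearch: optimize-parser\n' > autoresearch.md
printf '#!/bin/bash\nset -euo pipefail\necho "METRIC parse_time=42"\n' > autoresearch.sh
chmod +x autoresearch.sh
git add autoresearch.md autoresearch.sh
git -c user.name=demo -c user.email=demo@example.com \
  commit -q -m "autoresearch: session setup"
```

From here, step 5 would call `init_experiment` with the chosen metric name, run the baseline, log it, and start looping.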
+
+ ### `autoresearch.md`
+
+ This is the heart of the session. A fresh agent with no context should be able to read this file and run the loop effectively. Invest time making it excellent.
+
+ ```markdown
+ # Autoresearch: <goal>
+
+ ## Objective
+ <Specific description of what we're optimizing and the workload.>
+
+ ## Metrics
+ - **Primary**: <name> (<unit>, lower/higher is better)
+ - **Secondary**: <name>, ...
+
+ ## How to Run
+ `./autoresearch.sh` — outputs `METRIC name=number` lines.
+
+ ## Files in Scope
+ <Every file the agent may modify, with a brief note on what it does.>
+
+ ## Off Limits
+ <What must NOT be touched.>
+
+ ## Constraints
+ <Hard rules: tests must pass, no new deps, etc.>
+
+ ## What's Been Tried
+ <Update this section as experiments accumulate. Note key wins, dead ends,
+ and architectural insights so the agent doesn't repeat failed approaches.>
+ ```
+
+ Update `autoresearch.md` periodically — especially the "What's Been Tried" section — so resuming agents have full context.
+
+ ### `autoresearch.sh`
+
+ Bash script (`set -euo pipefail`) that pre-checks fast (catching syntax errors in <1s), runs the benchmark, and outputs `METRIC name=number` lines. Keep it fast — every second is multiplied by hundreds of runs. Update it during the loop as needed.
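A minimal sketch of such a script. The workload and the `run_time` metric name are placeholders (here a `sleep` stands in for the real benchmark command), and the `%N` nanosecond timer assumes GNU `date`:

```shell
#!/bin/bash
set -euo pipefail

# Pre-check cheaply so obvious breakage fails in <1s instead of after
# a full benchmark (placeholder: swap in your own fast validation).
# node --check src/parser.js

# Time the workload in milliseconds and emit METRIC lines.
start=$(date +%s%N)
sleep 0.05   # placeholder workload, e.g. `node bench.js`
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))

# run_experiment parses every `METRIC name=number` line on stdout;
# secondary metrics are just additional lines.
echo "METRIC run_time=${elapsed_ms}"
```

The name passed as `metric_name` to `init_experiment` selects which of these lines becomes the primary metric; the rest land in the secondary `metrics` dict.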
+
+ ### `autoresearch.checks.sh` (optional)
+
+ Bash script (`set -euo pipefail`) for backpressure/correctness checks: tests, types, lint, etc. **Only create this file when the user's constraints require correctness validation** (e.g., "tests must pass", "types must check").
+
+ When this file exists:
+ - Runs automatically after every **passing** benchmark in `run_experiment`.
+ - If checks fail, `run_experiment` reports it clearly — log as `checks_failed`.
+ - Its execution time does **NOT** affect the primary metric.
+ - You cannot `keep` a result when checks have failed.
+ - Has a separate timeout (default 300s, configurable via `checks_timeout_seconds`).
+
+ When this file does **not** exist, everything behaves exactly as before — no changes to the loop.
+
+ **Keep output minimal.** Only the last 80 lines of checks output are fed back to the agent on failure. Suppress verbose progress/success output and let only errors through. This keeps context lean and helps the agent pinpoint what broke.
79
+ ```bash
80
+ #!/bin/bash
81
+ set -euo pipefail
82
+ # Example: run tests and typecheck — suppress success output, only show errors
83
+ pnpm test --run --reporter=dot 2>&1 | tail -50
84
+ pnpm typecheck 2>&1 | grep -i error || true
85
+ ```
86
+
87
+ ## Loop Rules
88
+
89
+ **LOOP FOREVER.** Never ask "should I continue?" — the user expects autonomous work.
90
+
91
+ - **Primary metric is king.** Improved → `keep_experiment`. Worse/equal → `discard_experiment`. Secondary metrics rarely affect this.
92
+ - **Simpler is better.** Removing code for equal perf = keep. Ugly complexity for tiny gain = probably discard.
93
+ - **Don't thrash.** Repeatedly reverting the same idea? Try something structurally different.
94
+ - **Crashes:** fix if trivial, otherwise log and move on. Don't over-invest.
95
+ - **Think longer when stuck.** Re-read source files, study the profiling data, reason about what the CPU is actually doing. The best ideas come from deep understanding, not from trying random variations.
96
+ - **Resuming:** if `autoresearch.md` exists, read it + git log, continue looping.
97
+
98
+ When reviewing progress or deciding what to try next, call `autoresearch_status`.
99
+
100
+ **NEVER STOP.** The user may be away for hours. Keep going until interrupted.
101
+
102
+ ## Ideas Backlog
103
+
104
+ When you discover complex but promising optimizations that you won't pursue right now, **append them as bullets to `autoresearch.ideas.md`**. Don't let good ideas get lost.
105
+
106
+ On resume (context limit, crash), check `autoresearch.ideas.md` — prune stale/tried entries, experiment with the rest. When all paths are exhausted, delete the file and write a final summary.
107
+
108
+ ## User Messages During Experiments
109
+
110
+ If the user sends a message while an experiment is running, finish the current `run_experiment` + `log_experiment` cycle first, then incorporate their feedback in the next iteration. Don't abandon a running experiment.