@sndrgrdn/opencode-autoresearch 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
@@ -0,0 +1,164 @@
# Autoresearch Plugin for OpenCode

An OpenCode plugin that implements an autonomous keep/discard experiment loop: optimize code by iteratively making a change, measuring it, and keeping or discarding it.

## Features

- **Experiment Tracking**: Record and track experiments with metrics in JSONL format
- **Git Integration**: Keep or discard experiments with git safety checks
- **Metric Parsing**: Automatically parse `METRIC name=value` output lines
- **Markdown Documentation**: Auto-generated, human-readable experiment logs
- **Checks Support**: Optional validation via `autoresearch.checks.sh`
- **AI Skill**: An included skill provides guided workflows and best practices

## Installation

### Local Development

Add to your OpenCode configuration:

```json
{
  "plugin": ["file:///path/to/oc-autoresearch"]
}
```

### From npm (when published)

```json
{
  "plugin": ["@sndrgrdn/opencode-autoresearch"]
}
```

### Using the Skill

Copy the skill to your OpenCode skills directory:

```bash
mkdir -p ~/.config/opencode/skills/autoresearch
cp node_modules/@sndrgrdn/opencode-autoresearch/skills/autoresearch/SKILL.md ~/.config/opencode/skills/autoresearch/
```

The skill provides guided workflows and best practices for the autoresearch loop.

## Tools

### init_experiment

Initialize a new autoresearch experiment session.

```json
{
  "name": "optimize-parser",
  "metric_name": "parse_time",
  "metric_unit": "ms",
  "direction": "lower",
  "command": "node benchmark.js",
  "branch": "experiment/parser-opt",
  "files_in_scope": ["src/parser.js", "src/lexer.js"]
}
```

### run_experiment

Execute the experiment command and capture metrics.

```json
{
  "timeout_seconds": 600,
  "checks_timeout_seconds": 300
}
```

### log_experiment

Log an experiment result together with its keep/discard decision.

```json
{
  "run_id": "uuid-from-run",
  "commit": "abc123",
  "metric": 45.2,
  "status": "keep",
  "description": "Refactored parse loop to use iterator",
  "metrics": {
    "memory_mb": 128
  }
}
```

### keep_experiment

Commit the current experiment changes.

```json
{
  "commit_message": "perf(parser): optimize parse loop using iterator"
}
```

### discard_experiment

Discard uncommitted changes (requires confirmation).

```json
{
  "confirmation": "DISCARD"
}
```

### autoresearch_status

Get the current session status, including metrics and keep/discard counts.

## Workflow

1. **Initialize**: `init_experiment` creates `autoresearch.jsonl` and `autoresearch.md`
2. **Baseline**: `run_experiment` to establish baseline metrics
3. **Log**: `log_experiment` to record the baseline
4. **Iterate**:
   - Edit code
   - `run_experiment` to measure
   - `log_experiment` to record the decision
   - `keep_experiment` or `discard_experiment`
5. **Status**: `autoresearch_status` to review progress
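
Each step of this loop appends one structured event to `autoresearch.jsonl`. An illustrative excerpt of how the log might look after the first kept experiment (all field values here are hypothetical):

```jsonl
{"type":"config","timestamp":"2025-01-01T10:00:00.000Z","name":"optimize-parser","metric_name":"parse_time","metric_unit":"ms","direction":"lower","command":"node benchmark.js","segment":1}
{"type":"run","timestamp":"2025-01-01T10:01:00.000Z","run_id":"uuid-from-run","segment":1,"command":"node benchmark.js","status":"ok","duration_seconds":12.4,"metrics":{"parse_time":45.2}}
{"type":"experiment","timestamp":"2025-01-01T10:02:00.000Z","run_id":"uuid-from-run","segment":1,"commit_before":"abc123","metric":45.2,"status":"keep","description":"Refactored parse loop to use iterator"}
```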

## Metric Format

Experiments should output metrics in this format:

```
METRIC parse_time=45.2
METRIC memory_mb=128
METRIC throughput=1000
```
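
Any executable can serve as the experiment command; it only has to print lines in this format. A minimal sketch of a benchmark script (GNU `date` is assumed for millisecond timing, and the `sleep` stands in for a real workload):

```bash
#!/usr/bin/env bash
# Sketch of a benchmark script; replace the sleep with your real workload.
start=$(date +%s%N)
sleep 0.05                      # placeholder workload
end=$(date +%s%N)

# One "METRIC name=value" line per metric (plain numbers, no units).
echo "METRIC parse_time=$(( (end - start) / 1000000 ))"
echo "METRIC memory_mb=128"     # hypothetical secondary metric
```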

## Files Created

- `autoresearch.jsonl` - Append-only event log
- `autoresearch.md` - Human-readable experiment notes
- `autoresearch.checks.sh` - Optional validation script
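
If `autoresearch.checks.sh` exists (and no `checks_command` is configured), it runs after each successful experiment command; any non-zero exit marks the run `checks_failed`. A minimal sketch, with a placeholder where your project's real checks would go:

```bash
#!/usr/bin/env bash
# autoresearch.checks.sh -- exit non-zero to mark the run checks_failed.
set -euo pipefail

echo "running checks..."
true   # stand-in for real checks, e.g. `bun run typecheck` or `npm test`
```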

## Development

```bash
# Install dependencies
bun install

# Type check
bun run typecheck

# Build
bun run build

# Smoke test
bun run smoke

# Watch mode
bun run dev
```

## License

MIT
@@ -0,0 +1,154 @@
import type { Plugin } from "@opencode-ai/plugin";
import { z as zod } from "zod";
export declare const DirectionSchema: zod.ZodEnum<{
    lower: "lower";
    higher: "higher";
}>;
export type Direction = zod.infer<typeof DirectionSchema>;
export declare const ExperimentStatusSchema: zod.ZodEnum<{
    keep: "keep";
    discard: "discard";
    crash: "crash";
    checks_failed: "checks_failed";
}>;
export type ExperimentStatus = zod.infer<typeof ExperimentStatusSchema>;
export declare const RunStatusSchema: zod.ZodEnum<{
    crash: "crash";
    checks_failed: "checks_failed";
    ok: "ok";
    timeout: "timeout";
}>;
export type RunStatus = zod.infer<typeof RunStatusSchema>;
export declare const ConfigEventSchema: zod.ZodObject<{
    type: zod.ZodLiteral<"config">;
    timestamp: zod.ZodISODateTime;
    name: zod.ZodString;
    metric_name: zod.ZodString;
    metric_unit: zod.ZodOptional<zod.ZodString>;
    direction: zod.ZodEnum<{
        lower: "lower";
        higher: "higher";
    }>;
    command: zod.ZodOptional<zod.ZodString>;
    checks_command: zod.ZodOptional<zod.ZodString>;
    branch: zod.ZodOptional<zod.ZodString>;
    files_in_scope: zod.ZodOptional<zod.ZodArray<zod.ZodString>>;
    segment: zod.ZodNumber;
}, zod.core.$strip>;
export type ConfigEvent = zod.infer<typeof ConfigEventSchema>;
export declare const MetricsSchema: zod.ZodRecord<zod.ZodString, zod.ZodNumber>;
export type Metrics = Record<string, number>;
export declare const RunEventSchema: zod.ZodObject<{
    type: zod.ZodLiteral<"run">;
    timestamp: zod.ZodISODateTime;
    run_id: zod.ZodString;
    segment: zod.ZodNumber;
    command: zod.ZodString;
    status: zod.ZodEnum<{
        crash: "crash";
        checks_failed: "checks_failed";
        ok: "ok";
        timeout: "timeout";
    }>;
    duration_seconds: zod.ZodNumber;
    timed_out: zod.ZodOptional<zod.ZodBoolean>;
    exit_code: zod.ZodOptional<zod.ZodNumber>;
    metrics: zod.ZodRecord<zod.ZodString, zod.ZodNumber>;
    checks_pass: zod.ZodOptional<zod.ZodBoolean>;
    checks_duration_seconds: zod.ZodOptional<zod.ZodNumber>;
    log_tail: zod.ZodOptional<zod.ZodString>;
}, zod.core.$strip>;
export type RunEvent = zod.infer<typeof RunEventSchema>;
export declare const ExperimentEventSchema: zod.ZodObject<{
    type: zod.ZodLiteral<"experiment">;
    timestamp: zod.ZodISODateTime;
    run_id: zod.ZodString;
    segment: zod.ZodNumber;
    commit_before: zod.ZodString;
    commit_after: zod.ZodOptional<zod.ZodString>;
    metric: zod.ZodNumber;
    metrics: zod.ZodOptional<zod.ZodRecord<zod.ZodString, zod.ZodNumber>>;
    status: zod.ZodEnum<{
        keep: "keep";
        discard: "discard";
        crash: "crash";
        checks_failed: "checks_failed";
    }>;
    description: zod.ZodString;
}, zod.core.$strip>;
export type ExperimentEvent = zod.infer<typeof ExperimentEventSchema>;
export declare const EventSchema: zod.ZodUnion<readonly [zod.ZodObject<{
    type: zod.ZodLiteral<"config">;
    timestamp: zod.ZodISODateTime;
    name: zod.ZodString;
    metric_name: zod.ZodString;
    metric_unit: zod.ZodOptional<zod.ZodString>;
    direction: zod.ZodEnum<{
        lower: "lower";
        higher: "higher";
    }>;
    command: zod.ZodOptional<zod.ZodString>;
    checks_command: zod.ZodOptional<zod.ZodString>;
    branch: zod.ZodOptional<zod.ZodString>;
    files_in_scope: zod.ZodOptional<zod.ZodArray<zod.ZodString>>;
    segment: zod.ZodNumber;
}, zod.core.$strip>, zod.ZodObject<{
    type: zod.ZodLiteral<"run">;
    timestamp: zod.ZodISODateTime;
    run_id: zod.ZodString;
    segment: zod.ZodNumber;
    command: zod.ZodString;
    status: zod.ZodEnum<{
        crash: "crash";
        checks_failed: "checks_failed";
        ok: "ok";
        timeout: "timeout";
    }>;
    duration_seconds: zod.ZodNumber;
    timed_out: zod.ZodOptional<zod.ZodBoolean>;
    exit_code: zod.ZodOptional<zod.ZodNumber>;
    metrics: zod.ZodRecord<zod.ZodString, zod.ZodNumber>;
    checks_pass: zod.ZodOptional<zod.ZodBoolean>;
    checks_duration_seconds: zod.ZodOptional<zod.ZodNumber>;
    log_tail: zod.ZodOptional<zod.ZodString>;
}, zod.core.$strip>, zod.ZodObject<{
    type: zod.ZodLiteral<"experiment">;
    timestamp: zod.ZodISODateTime;
    run_id: zod.ZodString;
    segment: zod.ZodNumber;
    commit_before: zod.ZodString;
    commit_after: zod.ZodOptional<zod.ZodString>;
    metric: zod.ZodNumber;
    metrics: zod.ZodOptional<zod.ZodRecord<zod.ZodString, zod.ZodNumber>>;
    status: zod.ZodEnum<{
        keep: "keep";
        discard: "discard";
        crash: "crash";
        checks_failed: "checks_failed";
    }>;
    description: zod.ZodString;
}, zod.core.$strip>]>;
export type Event = zod.infer<typeof EventSchema>;
export interface ExperimentSession {
    segment: number;
    config?: ConfigEvent;
    runs: RunEvent[];
    experiments: ExperimentEvent[];
}
export interface StatusSummary {
    segment: number;
    total_runs: number;
    keep_count: number;
    discard_count: number;
    crash_count: number;
    checks_failed_count: number;
    baseline_metric: number | null;
    best_metric: number | null;
    best_run_id: string | null;
    current_branch: string | null;
    git_dirty: boolean;
    direction: Direction | null;
    metric_name: string | null;
}
export declare const AutoresearchPlugin: Plugin;
export default AutoresearchPlugin;
package/dist/index.js ADDED
@@ -0,0 +1,787 @@
import { tool } from "@opencode-ai/plugin";
import { randomUUID } from "crypto";
import { existsSync } from "fs";
import { readFile, appendFile, writeFile } from "fs/promises";
import { z as zod } from "zod";
// --- types ---
export const DirectionSchema = zod.enum(["lower", "higher"]);
export const ExperimentStatusSchema = zod.enum([
    "keep",
    "discard",
    "crash",
    "checks_failed",
]);
export const RunStatusSchema = zod.enum([
    "ok",
    "timeout",
    "crash",
    "checks_failed",
]);
export const ConfigEventSchema = zod.object({
    type: zod.literal("config"),
    timestamp: zod.iso.datetime(),
    name: zod.string(),
    metric_name: zod.string(),
    metric_unit: zod.string().optional(),
    direction: DirectionSchema,
    command: zod.string().optional(),
    checks_command: zod.string().optional(),
    branch: zod.string().optional(),
    files_in_scope: zod.array(zod.string()).optional(),
    segment: zod.number().int().nonnegative(),
});
export const MetricsSchema = zod.record(zod.string(), zod.number());
export const RunEventSchema = zod.object({
    type: zod.literal("run"),
    timestamp: zod.iso.datetime(),
    run_id: zod.string(),
    segment: zod.number().int().nonnegative(),
    command: zod.string(),
    status: RunStatusSchema,
    duration_seconds: zod.number(),
    timed_out: zod.boolean().optional(),
    exit_code: zod.number().optional(),
    metrics: MetricsSchema,
    checks_pass: zod.boolean().optional(),
    checks_duration_seconds: zod.number().optional(),
    log_tail: zod.string().optional(),
});
export const ExperimentEventSchema = zod.object({
    type: zod.literal("experiment"),
    timestamp: zod.iso.datetime(),
    run_id: zod.string(),
    segment: zod.number().int().nonnegative(),
    commit_before: zod.string(),
    commit_after: zod.string().optional(),
    metric: zod.number(),
    metrics: MetricsSchema.optional(),
    status: ExperimentStatusSchema,
    description: zod.string(),
});
export const EventSchema = zod.union([
    ConfigEventSchema,
    RunEventSchema,
    ExperimentEventSchema,
]);
// --- state ---
const JSONL_FILE = "autoresearch.jsonl";
const DEFAULT_COMMAND_TIMEOUT_SECONDS = 600;
const DEFAULT_CHECKS_TIMEOUT_SECONDS = 300;
function normalizeTimeoutSeconds(value, fallback) {
    if (!Number.isFinite(value) || value <= 0) {
        return fallback;
    }
    return Math.max(1, Math.floor(value));
}
async function loadEvents(directory) {
    const filePath = `${directory}/${JSONL_FILE}`;
    if (!existsSync(filePath)) {
        return [];
    }
    const content = await readFile(filePath, "utf-8");
    const lines = content.split("\n").filter((line) => line.trim() !== "");
    const events = [];
    let malformedLineCount = 0;
    for (const line of lines) {
        try {
            const parsed = JSON.parse(line);
            const result = EventSchema.safeParse(parsed);
            if (result.success) {
                events.push(result.data);
            }
            else {
                malformedLineCount += 1;
            }
        }
        catch {
            malformedLineCount += 1;
        }
    }
    if (malformedLineCount > 0) {
        console.warn(`Skipped ${malformedLineCount} malformed autoresearch event(s) in ${filePath}`);
    }
    return events;
}
async function appendEvent(directory, event) {
    const filePath = `${directory}/${JSONL_FILE}`;
    const line = JSON.stringify(event) + "\n";
    await appendFile(filePath, line, "utf-8");
}
function getCurrentSegment(events) {
    const configEvents = events.filter((e) => e.type === "config");
    if (configEvents.length === 0) {
        return 0;
    }
    return Math.max(...configEvents.map((e) => e.segment));
}
function getSession(events, segment) {
    const config = events.find((e) => e.type === "config" && e.segment === segment);
    const runs = events.filter((e) => e.type === "run" && e.segment === segment);
    const experiments = events.filter((e) => e.type === "experiment" && e.segment === segment);
    return { segment, config, runs, experiments };
}
function getCurrentConfig(events) {
    const segment = getCurrentSegment(events);
    return events.find((e) => e.type === "config" && e.segment === segment);
}
function findRun(events, runId) {
    return events.find((e) => e.type === "run" && e.run_id === runId);
}
function computeStatusSummary(events, currentBranch, gitDirty) {
    const segment = getCurrentSegment(events);
    const session = getSession(events, segment);
    const config = session.config;
    const keep_count = session.experiments.filter((e) => e.status === "keep").length;
    const discard_count = session.experiments.filter((e) => e.status === "discard").length;
    const crash_count = session.experiments.filter((e) => e.status === "crash").length;
    const checks_failed_count = session.experiments.filter((e) => e.status === "checks_failed").length;
    const baselineRun = session.runs.find((r) => r.status === "ok");
    const baseline_metric = config && baselineRun ? (baselineRun.metrics[config.metric_name] ?? null) : null;
    let best_metric = null;
    let best_run_id = null;
    if (config) {
        const validExperiments = session.experiments.filter((e) => e.status === "keep");
        if (validExperiments.length > 0) {
            if (config.direction === "lower") {
                const best = validExperiments.reduce((min, e) => e.metric < min.metric ? e : min);
                best_metric = best.metric;
                best_run_id = best.run_id;
            }
            else {
                const best = validExperiments.reduce((max, e) => e.metric > max.metric ? e : max);
                best_metric = best.metric;
                best_run_id = best.run_id;
            }
        }
    }
    return {
        segment,
        total_runs: session.runs.length,
        keep_count,
        discard_count,
        crash_count,
        checks_failed_count,
        baseline_metric,
        best_metric,
        best_run_id,
        current_branch: currentBranch,
        git_dirty: gitDirty,
        direction: config?.direction ?? null,
        metric_name: config?.metric_name ?? null,
    };
}
async function getGitStatus($, directory) {
    try {
        const result = await $`git -C ${directory} status --porcelain -b`
            .quiet()
            .nothrow();
        if (result.exitCode !== 0) {
            return {
                isRepo: false,
                branch: null,
                isDirty: false,
                hasStaged: false,
                hasUnstaged: false,
                untrackedFiles: [],
            };
        }
        const output = result.stdout.toString();
        const lines = output.split("\n");
        let branch = null;
        const statusLines = [];
        for (const line of lines) {
            if (line.startsWith("## ")) {
                const match = line.match(/^##\s+(\S+)/);
                if (match) {
                    const branchName = match[1].replace(/^\*/, "").trim();
                    if (branchName) {
                        branch = branchName;
                    }
                }
            }
            else if (line.trim()) {
                statusLines.push(line);
            }
        }
        let hasStaged = false;
        let hasUnstaged = false;
        const untrackedFiles = [];
        for (const line of statusLines) {
            if (line.startsWith("??") && line.length > 3) {
                untrackedFiles.push(line.slice(3));
            }
            else if (line.length > 1) {
                // Porcelain format: column 0 is the index (staged) status,
                // column 1 is the worktree (unstaged) status.
                const indexStatus = line[0];
                const worktreeStatus = line[1];
                if (worktreeStatus === "M" || worktreeStatus === "D") {
                    hasUnstaged = true;
                }
                // Staged changes show as a non-blank index column.
                if (indexStatus !== " " && indexStatus !== "?") {
                    hasStaged = true;
                }
            }
        }
        return {
            isRepo: true,
            branch,
            isDirty: statusLines.length > 0,
            hasStaged,
            hasUnstaged,
            untrackedFiles,
        };
    }
    catch {
        return {
            isRepo: false,
            branch: null,
            isDirty: false,
            hasStaged: false,
            hasUnstaged: false,
            untrackedFiles: [],
        };
    }
}
async function stageAll($, directory) {
    const result = await $`git -C ${directory} add -A`.quiet().nothrow();
    if (result.exitCode !== 0) {
        throw new Error(`Failed to stage changes: ${result.stderr.toString()}`);
    }
}
async function commit($, directory, message) {
    const result = await $`git -C ${directory} commit -m ${message}`
        .quiet()
        .nothrow();
    if (result.exitCode !== 0) {
        throw new Error(`Failed to commit: ${result.stderr.toString()}`);
    }
    const commitResult = await $`git -C ${directory} rev-parse --short HEAD`.quiet();
    return commitResult.stdout.toString().trim();
}
async function discardChanges($, directory) {
    await $`git -C ${directory} reset --hard`.quiet().nothrow();
    await $`git -C ${directory} clean -fd`.quiet().nothrow();
}
// --- metrics ---
const METRIC_LINE_REGEX = /^METRIC\s+(\w+)\s*=\s*(-?\d+(?:\.\d+)?)$/;
function parseMetrics(output) {
    const metrics = {};
    const lines = output.split("\n");
    for (const line of lines) {
        const match = line.trim().match(METRIC_LINE_REGEX);
        if (match) {
            const [, name, valueStr] = match;
            const value = parseFloat(valueStr);
            if (!isNaN(value)) {
                metrics[name] = value;
            }
        }
    }
    return metrics;
}
const CHECKS_SCRIPT = "autoresearch.checks.sh";
async function runProcessWithTimeout($, directory, command, timeoutSeconds) {
    // `command` is argv-style. Joining a ["bash", "-c", script] triple with
    // spaces and re-wrapping it in `bash -c` would mangle the script's
    // quoting (arguments would be lost), so unwrap that case and pass the
    // script through as a single argument.
    const script = command.length === 3 && command[0] === "bash" && command[1] === "-c"
        ? command[2]
        : command.join(" ");
    const timeoutMs = Math.max(1000, timeoutSeconds * 1000);
    const shellPromise = $`bash -c ${script}`
        .cwd(directory)
        .quiet()
        .nothrow();
    // Note: on timeout the race rejects, but the child process itself is not
    // killed; a long-running command may keep running in the background.
    const timeoutPromise = new Promise((_, reject) => {
        setTimeout(() => reject(new Error("TIMEOUT")), timeoutMs);
    });
291
+ try {
292
+ const result = await Promise.race([shellPromise, timeoutPromise]);
293
+ return {
294
+ exitCode: result.exitCode,
295
+ stdout: result.stdout?.toString() || "",
296
+ stderr: result.stderr?.toString() || "",
297
+ timedOut: false,
298
+ };
299
+ }
300
+ catch (error) {
301
+ if (error instanceof Error && error.message === "TIMEOUT") {
302
+ return {
303
+ exitCode: -1,
304
+ stdout: "",
305
+ stderr: "",
306
+ timedOut: true,
307
+ };
308
+ }
309
+ throw error;
310
+ }
311
+ }
312
+ async function runChecks($, directory, timeoutSeconds = 300, checksCommand) {
313
+ const scriptPath = `${directory}/${CHECKS_SCRIPT}`;
314
+ if (!checksCommand && !existsSync(scriptPath)) {
315
+ return {
316
+ pass: true,
317
+ duration_seconds: 0,
318
+ output: "No checks script found",
319
+ };
320
+ }
321
+ const startTime = Date.now();
322
+ try {
323
+ const result = checksCommand
324
+ ? await runProcessWithTimeout($, directory, ["bash", "-c", checksCommand], timeoutSeconds)
325
+ : await runProcessWithTimeout($, directory, ["bash", scriptPath], timeoutSeconds);
326
+ const duration_seconds = (Date.now() - startTime) / 1000;
327
+ if (result.timedOut) {
328
+ return {
329
+ pass: false,
330
+ duration_seconds,
331
+ output: result.stdout,
332
+ error: "Checks timed out",
333
+ timedOut: true,
334
+ };
335
+ }
336
+ if (result.exitCode !== 0) {
337
+ return {
338
+ pass: false,
339
+ duration_seconds,
340
+ output: result.stdout,
341
+ error: result.stderr || "Checks failed",
342
+ };
343
+ }
344
+ return {
345
+ pass: true,
346
+ duration_seconds,
347
+ output: result.stdout,
348
+ };
349
+ }
350
+ catch (error) {
351
+ const duration_seconds = (Date.now() - startTime) / 1000;
352
+ return {
353
+ pass: false,
354
+ duration_seconds,
355
+ output: "",
356
+ error: error instanceof Error ? error.message : String(error),
357
+ };
358
+ }
359
+ }
360
+ // --- markdown ---
361
+ const MARKDOWN_FILE = "autoresearch.md";
362
+ async function loadMarkdown(directory) {
363
+ const filePath = `${directory}/${MARKDOWN_FILE}`;
364
+ if (!existsSync(filePath)) {
365
+ return "";
366
+ }
367
+ return readFile(filePath, "utf-8");
368
+ }
369
+ async function saveMarkdown(directory, content) {
370
+ const filePath = `${directory}/${MARKDOWN_FILE}`;
371
+ await writeFile(filePath, content, "utf-8");
372
+ }
373
+ function generateMarkdownTemplate(config) {
374
+ const timestamp = new Date().toISOString();
375
+ return `# Autoresearch: ${config.name}
376
+
377
+ **Started:** ${timestamp}
378
+ **Segment:** ${config.segment}
379
+ **Branch:** ${config.branch || "Not specified"}
380
+
381
+ ## Objective
382
+
383
+ Optimize ${config.metric_name} (${config.direction} is better).
384
+
385
+ ## Primary Metric
386
+
387
+ - **Name:** ${config.metric_name}
388
+ - **Unit:** ${config.metric_unit || "Not specified"}
389
+ - **Direction:** ${config.direction}
390
+
391
+ ## Command
392
+
393
+ \`\`\`bash
394
+ ${config.command || "Not specified"}
395
+ \`\`\`
396
+
397
+ ## Checks Command
398
+
399
+ ${config.checks_command || "Not specified"}
400
+
401
+ ## Files in Scope
402
+
403
+ ${config.files_in_scope?.map((f) => `- ${f}`).join("\n") || "Not specified"}
404
+
405
+ ## Baseline
406
+
407
+ *Will be populated after first successful run*
408
+
409
+ ## Best Run
410
+
411
+ *Will be populated after first kept experiment*
412
+
413
+ ## Experiments
414
+
415
+ | Run | Status | Metric | Description |
416
+ |-----|--------|--------|-------------|
417
+
418
+ ## Tried Ideas
419
+
420
+ - [ ] Initial baseline
421
+
422
+ ## Dead Ends
423
+
424
+ *None yet*
425
+
426
+ ## Next Ideas
427
+
428
+ - [ ] Establish baseline
429
+ `;
430
+ }
431
+ function updateMarkdownWithSession(existingContent, session, summary) {
432
+ const config = session.config;
433
+ if (!config)
434
+ return existingContent;
435
+ let content = existingContent;
436
+ const improvement = summary.baseline_metric !== null &&
437
+ summary.best_metric !== null &&
438
+ summary.baseline_metric !== 0
439
+ ? ((config.direction === "lower" ? -1 : 1) *
440
+ (((summary.best_metric - summary.baseline_metric) /
441
+ summary.baseline_metric) *
442
+ 100)).toFixed(2) + "%"
443
+ : "N/A";
444
+ if (summary.baseline_metric !== null) {
445
+ const baselineSection = `## Baseline
446
+
447
+ - **Metric:** ${summary.baseline_metric}${config.metric_unit ? ` ${config.metric_unit}` : ""}
448
+ - **Status:** Established`;
449
+ if (content.includes("## Baseline")) {
450
+ content = content.replace(/## Baseline[\s\S]*?(?=\n## |$)/, baselineSection + "\n\n");
451
+ }
452
+ }
453
+ if (summary.best_metric !== null && summary.best_run_id) {
454
+ const bestSection = `## Best Run
455
+
456
+ - **Run ID:** ${summary.best_run_id}
457
+ - **Metric:** ${summary.best_metric}${config.metric_unit ? ` ${config.metric_unit}` : ""}
458
+ - **Improvement:** ${improvement}`;
459
+ if (content.includes("## Best Run")) {
460
+ content = content.replace(/## Best Run[\s\S]*?(?=\n## |$)/, bestSection + "\n\n");
461
+ }
462
+ }
463
+ if (session.experiments.length > 0) {
464
+ const tableRows = session.experiments
465
+ .map((e) => `| ${e.run_id} | ${e.status} | ${e.metric}${config.metric_unit ? ` ${config.metric_unit}` : ""} | ${e.description} |`)
466
+ .join("\n");
467
+ const tableHeader = `| Run | Status | Metric | Description |
468
+ |-----|--------|--------|-------------|`;
469
+ const tableSection = `## Experiments
470
+
471
+ ${tableHeader}
472
+ ${tableRows}`;
473
+ if (content.includes("## Experiments")) {
474
+ content = content.replace(/## Experiments[\s\S]*?(?=\n## |$)/, tableSection + "\n\n");
475
+ }
476
+ }
477
+ return content;
478
+ }
479
+ // --- plugin ---
480
+ const z = tool.schema;
481
+ export const AutoresearchPlugin = async ({ $ }) => {
482
+ return {
483
+ tool: {
484
+ init_experiment: tool({
485
+ description: "Initialize a new autoresearch experiment session. Creates autoresearch.jsonl and autoresearch.md files.",
486
+ args: {
487
+ name: z.string().describe("Name of the experiment"),
488
+ metric_name: z.string().describe("Primary metric to optimize"),
489
+ metric_unit: z.string().optional().describe("Unit of the metric"),
490
+ direction: z
491
+ .enum(["lower", "higher"])
492
+ .describe("Whether 'lower' or 'higher' metric values are better"),
493
+ command: z
494
+ .string()
495
+ .optional()
496
+ .describe("Command to run for experiments"),
497
+ checks_command: z
498
+ .string()
499
+ .optional()
500
+ .describe("Command to run after experiments for validation checks"),
501
+ branch: z
502
+ .string()
503
+ .optional()
504
+ .describe("Git branch for this experiment"),
505
+ files_in_scope: z
506
+ .array(z.string())
507
+ .optional()
508
+ .describe("Files involved in the experiment"),
509
+ },
510
+ execute: async (args, context) => {
511
+ const { directory: ctxDir } = context;
512
+ const events = await loadEvents(ctxDir);
513
+ const segment = getCurrentSegment(events) + 1;
514
+ const config = {
515
+ type: "config",
516
+ timestamp: new Date().toISOString(),
517
+ name: args.name,
518
+ metric_name: args.metric_name,
519
+ metric_unit: args.metric_unit,
520
+ direction: args.direction,
521
+ command: args.command,
522
+ checks_command: args.checks_command,
523
+ branch: args.branch,
524
+ files_in_scope: args.files_in_scope,
525
+ segment,
526
+ };
527
+ await appendEvent(ctxDir, config);
528
+ const markdownContent = generateMarkdownTemplate(config);
529
+ await saveMarkdown(ctxDir, markdownContent);
530
+ return JSON.stringify({
531
+ success: true,
532
+ segment,
533
+ config: {
534
+ name: args.name,
535
+ metric_name: args.metric_name,
536
+ direction: args.direction,
537
+ command: args.command,
538
+ checks_command: args.checks_command,
539
+ branch: args.branch,
540
+ },
541
+ files_created: ["autoresearch.jsonl", "autoresearch.md"],
542
+ });
543
+ },
544
+ }),
545
+ run_experiment: tool({
546
+ description: "Execute an experiment command, capture metrics from METRIC name=value output lines, and optionally run checks.",
547
+ args: {
548
+ command: z
549
+ .string()
550
+ .optional()
551
+ .describe("Command to run (overrides stored command)"),
552
+ timeout_seconds: z
553
+ .number()
554
+ .int()
555
+ .default(DEFAULT_COMMAND_TIMEOUT_SECONDS)
556
+ .describe("Timeout for the command in seconds. Non-positive values use the default."),
557
+ checks_timeout_seconds: z
558
+ .number()
559
+ .int()
560
+ .default(DEFAULT_CHECKS_TIMEOUT_SECONDS)
561
+ .describe("Timeout for checks in seconds. Non-positive values use the default."),
562
+ },
563
+ execute: async (args, context) => {
564
+ const { directory: ctxDir } = context;
565
+ const events = await loadEvents(ctxDir);
566
+ const config = getCurrentConfig(events);
567
+ if (!config) {
568
+ throw new Error("No active experiment found. Run init_experiment first.");
569
+ }
570
+ const command = args.command ?? config.command;
571
+ if (!command) {
572
+ throw new Error("No command specified and no stored command in config");
573
+ }
574
+ const runId = randomUUID();
575
+ const segment = config.segment;
576
+ const timeoutSeconds = normalizeTimeoutSeconds(args.timeout_seconds, DEFAULT_COMMAND_TIMEOUT_SECONDS);
577
+ const checksTimeoutSeconds = normalizeTimeoutSeconds(args.checks_timeout_seconds, DEFAULT_CHECKS_TIMEOUT_SECONDS);
578
+ const startTime = Date.now();
579
+ let status = "ok";
580
+ let exitCode;
581
+ let commandOutput = "";
582
+ let timedOut = false;
583
+ try {
584
+ const result = await runProcessWithTimeout($, ctxDir, ["bash", "-c", command], timeoutSeconds);
585
+ exitCode = result.exitCode;
586
+ commandOutput = result.stdout + result.stderr;
587
+ timedOut = result.timedOut;
588
+ if (timedOut) {
589
+ status = "timeout";
590
+ }
591
+ else if (exitCode !== 0) {
592
+ status = "crash";
593
+ }
594
+ }
595
+ catch (error) {
596
+ status = "crash";
597
+ commandOutput =
598
+ error instanceof Error ? error.message : String(error);
599
+ }
600
+ const duration_seconds = (Date.now() - startTime) / 1000;
601
+ const metrics = parseMetrics(commandOutput);
602
+ const primaryMetric = metrics[config.metric_name] ?? null;
603
+ let checksPass;
604
+ let checks_duration_seconds;
605
+ if (status === "ok") {
606
+ const checkResult = await runChecks($, ctxDir, checksTimeoutSeconds, config.checks_command);
607
+ checksPass = checkResult.pass;
608
+ checks_duration_seconds = checkResult.duration_seconds;
609
+ if (!checkResult.pass) {
610
+ status = "checks_failed";
611
+ }
612
+ }
613
+ const runEvent = {
614
+ type: "run",
615
+ timestamp: new Date().toISOString(),
616
+ run_id: runId,
617
+ segment,
618
+ command,
619
+ status,
620
+ duration_seconds,
621
+ timed_out: timedOut ? true : undefined,
622
+ exit_code: exitCode,
623
+ metrics,
624
+ checks_pass: checksPass,
625
+ checks_duration_seconds,
626
+ log_tail: commandOutput.slice(-2000),
627
+ };
628
+ await appendEvent(ctxDir, runEvent);
629
+ return JSON.stringify({
630
+ run_id: runId,
631
+ status,
632
+ primary_metric: primaryMetric,
633
+ metrics,
634
+ duration_seconds,
635
+ checks_pass: checksPass,
636
+ checks_duration_seconds,
637
+ log_tail: runEvent.log_tail,
638
+ });
639
+ },
640
+ }),
641
+ log_experiment: tool({
+ description: "Log an experiment result with decision (keep/discard/crash/checks_failed). Updates autoresearch.jsonl and autoresearch.md.",
+ args: {
+ run_id: z.string().describe("ID of the run to log"),
+ commit: z.string().describe("Commit hash before the experiment"),
+ metric: z.number().describe("Primary metric value"),
+ status: z
+ .enum(["keep", "discard", "crash", "checks_failed"])
+ .describe("Status of the experiment"),
+ description: z.string().describe("Description of what changed"),
+ metrics: z
+ .record(z.string(), z.number())
+ .optional()
+ .describe("Secondary metrics"),
+ },
+ execute: async (args, context) => {
+ const { directory: ctxDir } = context;
+ const events = await loadEvents(ctxDir);
+ const config = getCurrentConfig(events);
+ if (!config) {
+ throw new Error("No active experiment found. Run init_experiment first.");
+ }
+ const run = findRun(events, args.run_id);
+ if (!run) {
+ throw new Error(`Run '${args.run_id}' not found in current segment`);
+ }
+ if (run.segment !== config.segment) {
+ throw new Error(`Run '${args.run_id}' belongs to segment ${run.segment}, current segment is ${config.segment}`);
+ }
+ const experimentEvent = {
+ type: "experiment",
+ timestamp: new Date().toISOString(),
+ run_id: args.run_id,
+ segment: config.segment,
+ commit_before: args.commit,
+ metric: args.metric,
+ metrics: args.metrics,
+ status: args.status,
+ description: args.description,
+ };
+ await appendEvent(ctxDir, experimentEvent);
+ const updatedEvents = [...events, experimentEvent];
+ const markdownContent = await loadMarkdown(ctxDir);
+ const summary = computeStatusSummary(updatedEvents, null, false);
+ const session = getSession(updatedEvents, config.segment);
+ const updatedMarkdown = updateMarkdownWithSession(markdownContent, session, summary);
+ await saveMarkdown(ctxDir, updatedMarkdown);
+ return JSON.stringify({
+ success: true,
+ run_id: args.run_id,
+ status: args.status,
+ metric: args.metric,
+ segment: config.segment,
+ });
+ },
+ }),
+ keep_experiment: tool({
+ description: "Commit the current experiment changes to git. Stages all changes and creates a commit.",
+ args: {
+ commit_message: z
+ .string()
+ .describe("Commit message for the kept experiment"),
+ },
+ execute: async (args, context) => {
+ const { directory: ctxDir } = context;
+ const gitStatus = await getGitStatus($, ctxDir);
+ if (!gitStatus.isRepo) {
+ throw new Error("Not in a git repository. Initialize git first.");
+ }
+ if (!gitStatus.isDirty) {
+ throw new Error("No changes to commit. Make some changes first.");
+ }
+ await stageAll($, ctxDir);
+ const commitHash = await commit($, ctxDir, args.commit_message);
+ return JSON.stringify({
+ success: true,
+ commit_hash: commitHash,
+ branch: gitStatus.branch,
+ message: args.commit_message,
+ });
+ },
+ }),
+ discard_experiment: tool({
+ description: "Discard uncommitted changes to restore the pre-experiment state. Requires explicit confirmation.",
+ args: {
+ confirmation: z.string().describe("Type 'DISCARD' to confirm"),
+ },
+ execute: async (args, context) => {
+ const { directory: ctxDir } = context;
+ if (args.confirmation !== "DISCARD") {
+ throw new Error("Confirmation required: type 'DISCARD' to confirm discarding changes");
+ }
+ const gitStatus = await getGitStatus($, ctxDir);
+ if (!gitStatus.isRepo) {
+ throw new Error("Not in a git repository.");
+ }
+ if (!gitStatus.isDirty) {
+ return JSON.stringify({
+ success: true,
+ message: "No changes to discard",
+ discarded_files: [],
+ });
+ }
+ await discardChanges($, ctxDir);
+ return JSON.stringify({
+ success: true,
+ message: "Changes discarded successfully",
+ discarded_files: gitStatus.untrackedFiles,
+ restored_changes: gitStatus.hasStaged || gitStatus.hasUnstaged,
+ });
+ },
+ }),
+ autoresearch_status: tool({
+ description: "Get the current status of the autoresearch session including metrics, experiment counts, and git state.",
+ args: {},
+ execute: async (_args, context) => {
+ const { directory: ctxDir } = context;
+ const events = await loadEvents(ctxDir);
+ const gitStatus = await getGitStatus($, ctxDir);
+ const summary = computeStatusSummary(events, gitStatus.branch, gitStatus.isDirty);
+ const config = getCurrentConfig(events);
+ return JSON.stringify({
+ segment: summary.segment,
+ experiment_name: config?.name ?? null,
+ metric_name: summary.metric_name,
+ direction: summary.direction,
+ total_runs: summary.total_runs,
+ keep_count: summary.keep_count,
+ discard_count: summary.discard_count,
+ crash_count: summary.crash_count,
+ checks_failed_count: summary.checks_failed_count,
+ baseline_metric: summary.baseline_metric,
+ best_metric: summary.best_metric,
+ best_run_id: summary.best_run_id,
+ current_branch: summary.current_branch,
+ git_dirty: summary.git_dirty,
+ files: {
+ jsonl: "autoresearch.jsonl",
+ markdown: "autoresearch.md",
+ },
+ });
+ },
+ }),
+ },
+ };
+ };
+ export default AutoresearchPlugin;
package/package.json ADDED
@@ -0,0 +1,40 @@
+ {
+ "name": "@sndrgrdn/opencode-autoresearch",
+ "version": "0.1.0",
+ "description": "Autonomous experiment loop plugin for OpenCode - optimize code through iterative experimentation",
+ "type": "module",
+ "exports": {
+ ".": {
+ "types": "./dist/index.d.ts",
+ "default": "./dist/index.js"
+ }
+ },
+ "files": ["dist/", "skills/", "README.md"],
+ "scripts": {
+ "build": "tsc",
+ "dev": "tsc --watch",
+ "typecheck": "tsc --noEmit",
+ "clean": "rm -rf dist/",
+ "smoke": "bun run build && bun run smoke.ts"
+ },
+ "dependencies": {
+ "@opencode-ai/plugin": "^1.2.26",
+ "zod": "^4.3.6"
+ },
+ "devDependencies": {
+ "@types/bun": "latest",
+ "typescript": "^5.9.3"
+ },
+ "engines": {
+ "bun": ">=1.0.0"
+ },
+ "keywords": ["opencode", "plugin", "autoresearch", "optimization", "experiment"],
+ "repository": {
+ "type": "git",
+ "url": "https://github.com/sndrgrdn/opencode-autoresearch.git"
+ },
+ "license": "MIT",
+ "publishConfig": {
+ "access": "public"
+ }
+ }
package/skills/autoresearch/SKILL.md ADDED
@@ -0,0 +1,110 @@
+ ---
+ name: autoresearch
+ description: Set up and run an autonomous experiment loop for any optimization target. Gathers what to optimize, then starts the loop immediately. Use when asked to "run autoresearch", "optimize X in a loop", "set up autoresearch for X", or "start experiments".
+ ---
+
+ # Autoresearch
+
+ Autonomous experiment loop: try ideas, keep what works, discard what doesn't, never stop.
+
+ ## Tools
+
+ - **`init_experiment`** — configure session (name, metric, unit, direction). Call again to re-initialize with a new baseline when the optimization target changes.
+ - **`run_experiment`** — runs the command, times it, captures output.
+ - **`log_experiment`** — records the result. Always include the secondary `metrics` dict. Dashboard: ctrl+x.
+ - **`keep_experiment`** — commits kept changes after a successful run.
+ - **`discard_experiment`** — reverts discarded/crashed/failed-check changes. Use confirmation `DISCARD`.
+ - **`autoresearch_status`** — shows current session stats and history.
+
+ ## Setup
+
+ 1. Ask (or infer): **Goal**, **Command**, **Metric** (+ direction), **Files in scope**, **Constraints**.
+ 2. `git checkout -b autoresearch/<goal>-<date>`
+ 3. Read the source files. Understand the workload deeply before writing anything.
+ 4. Write `autoresearch.md` and `autoresearch.sh` (see below). Commit both.
+ 5. `init_experiment` → run baseline → `log_experiment` → start looping immediately.
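The git parts of the setup can be sketched concretely. Everything below is illustrative: `optimize-parser`, the file contents, and the `parse_time` metric are placeholder values, and a real session runs these commands inside the project repo rather than a throwaway one.

```shell
#!/bin/bash
set -euo pipefail

# Demo in a throwaway repo; in a real session, work in your project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "init"

# Step 2: isolate the session on its own branch (goal name is a placeholder)
git checkout -q -b "autoresearch/optimize-parser-$(date +%Y-%m-%d)"

# Step 4: write the session files, then commit both
printf '# Autoresearch: optimize-parser\n' > autoresearch.md
printf '#!/bin/bash\nset -euo pipefail\necho "METRIC parse_time=42"\n' > autoresearch.sh
chmod +x autoresearch.sh
git add autoresearch.md autoresearch.sh
git -c user.name=demo -c user.email=demo@example.com \
  commit -q -m "autoresearch: session setup"
```

From here, step 5 would call `init_experiment` with the chosen metric name, run the baseline, log it, and start looping.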
+
+ ### `autoresearch.md`
+
+ This is the heart of the session. A fresh agent with no context should be able to read this file and run the loop effectively. Invest time making it excellent.
+
+ ```markdown
+ # Autoresearch: <goal>
+
+ ## Objective
+ <Specific description of what we're optimizing and the workload.>
+
+ ## Metrics
+ - **Primary**: <name> (<unit>, lower/higher is better)
+ - **Secondary**: <name>, ...
+
+ ## How to Run
+ `./autoresearch.sh` — outputs `METRIC name=number` lines.
+
+ ## Files in Scope
+ <Every file the agent may modify, with a brief note on what it does.>
+
+ ## Off Limits
+ <What must NOT be touched.>
+
+ ## Constraints
+ <Hard rules: tests must pass, no new deps, etc.>
+
+ ## What's Been Tried
+ <Update this section as experiments accumulate. Note key wins, dead ends,
+ and architectural insights so the agent doesn't repeat failed approaches.>
+ ```
+
+ Update `autoresearch.md` periodically — especially the "What's Been Tried" section — so resuming agents have full context.
+
+ ### `autoresearch.sh`
+
+ Bash script (`set -euo pipefail`) that pre-checks fast (catching syntax errors in <1s), runs the benchmark, and outputs `METRIC name=number` lines. Keep it fast — every second is multiplied by hundreds of runs. Update it during the loop as needed.
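A minimal sketch of such a script. The workload and the `run_time` metric name are placeholders (here a `sleep` stands in for the real benchmark command), and the `%N` nanosecond timer assumes GNU `date`:

```shell
#!/bin/bash
set -euo pipefail

# Pre-check cheaply so obvious breakage fails in <1s instead of after
# a full benchmark (placeholder: swap in your own fast validation).
# node --check src/parser.js

# Time the workload in milliseconds and emit METRIC lines.
start=$(date +%s%N)
sleep 0.05   # placeholder workload, e.g. `node bench.js`
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))

# run_experiment parses every `METRIC name=number` line on stdout;
# secondary metrics are just additional lines.
echo "METRIC run_time=${elapsed_ms}"
```

The name passed as `metric_name` to `init_experiment` selects which of these lines becomes the primary metric; the rest land in the secondary `metrics` dict.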
+
+ ### `autoresearch.checks.sh` (optional)
+
+ Bash script (`set -euo pipefail`) for backpressure/correctness checks: tests, types, lint, etc. **Only create this file when the user's constraints require correctness validation** (e.g., "tests must pass", "types must check").
+
+ When this file exists:
+ - Runs automatically after every **passing** benchmark in `run_experiment`.
+ - If checks fail, `run_experiment` reports it clearly — log as `checks_failed`.
+ - Its execution time does **NOT** affect the primary metric.
+ - You cannot `keep` a result when checks have failed.
+ - Has a separate timeout (default 300s, configurable via `checks_timeout_seconds`).
+
+ When this file does **not** exist, everything behaves exactly as before — no changes to the loop.
+
+ **Keep output minimal.** Only the last 80 lines of checks output are fed back to the agent on failure. Suppress verbose progress/success output and let only errors through. This keeps context lean and helps the agent pinpoint what broke.
79
+ ```bash
80
+ #!/bin/bash
81
+ set -euo pipefail
82
+ # Example: run tests and typecheck — suppress success output, only show errors
83
+ pnpm test --run --reporter=dot 2>&1 | tail -50
84
+ pnpm typecheck 2>&1 | grep -i error || true
85
+ ```
86
+
87
+ ## Loop Rules
88
+
89
+ **LOOP FOREVER.** Never ask "should I continue?" — the user expects autonomous work.
90
+
91
+ - **Primary metric is king.** Improved → `keep_experiment`. Worse/equal → `discard_experiment`. Secondary metrics rarely affect this.
92
+ - **Simpler is better.** Removing code for equal perf = keep. Ugly complexity for tiny gain = probably discard.
93
+ - **Don't thrash.** Repeatedly reverting the same idea? Try something structurally different.
94
+ - **Crashes:** fix if trivial, otherwise log and move on. Don't over-invest.
95
+ - **Think longer when stuck.** Re-read source files, study the profiling data, reason about what the CPU is actually doing. The best ideas come from deep understanding, not from trying random variations.
96
+ - **Resuming:** if `autoresearch.md` exists, read it + git log, continue looping.
97
+
98
+ When reviewing progress or deciding what to try next, call `autoresearch_status`.
99
+
100
+ **NEVER STOP.** The user may be away for hours. Keep going until interrupted.
101
+
102
+ ## Ideas Backlog
103
+
104
+ When you discover complex but promising optimizations that you won't pursue right now, **append them as bullets to `autoresearch.ideas.md`**. Don't let good ideas get lost.
105
+
106
+ On resume (context limit, crash), check `autoresearch.ideas.md` — prune stale/tried entries, experiment with the rest. When all paths are exhausted, delete the file and write a final summary.
107
+
108
+ ## User Messages During Experiments
109
+
110
+ If the user sends a message while an experiment is running, finish the current `run_experiment` + `log_experiment` cycle first, then incorporate their feedback in the next iteration. Don't abandon a running experiment.