simmer-autoresearch 0.1.0

package/SKILL.md ADDED
@@ -0,0 +1,128 @@
1
+ ---
2
+ name: simmer-autoresearch
3
+ description: Set up and run autonomous experiment loops to optimize Simmer trading skills. Mutates skill code + config, measures P&L, keeps what works. Use when asked to "optimize a skill", "run autoresearch", or "improve my trading".
4
+ ---
5
+
6
+ # Simmer Autoresearch
7
+
8
+ Autonomous experiment loop for trading skill optimization: try ideas, keep what works, discard what doesn't, never stop.
9
+
10
+ Fork of [pi-autoresearch](https://github.com/davebcn87/pi-autoresearch) adapted for prediction market trading.
11
+
12
+ ## Tools
13
+
14
+ - **`init_experiment`** — configure session (name, metric, unit, direction). Call again to re-initialize with a new baseline.
15
+ - **`run_experiment`** — runs skill command, times it, captures output.
16
+ - **`log_experiment`** — records result. `keep` auto-commits. `discard`/`crash` skip the commit; revert the working tree yourself with `git checkout -- .`. Always include the secondary `metrics` dict.
17
+
18
+ ## Setup
19
+
20
+ 1. Ask (or infer): **Skill** (which skill to optimize), **Goal** (maximize P&L? find more trades? reduce drawdown?), **Constraints** (budget, venues, markets).
21
+ 2. `git checkout -b autoresearch/<skill>-<date>`
22
+ 3. Read the skill source files deeply. Understand the strategy before changing anything.
23
+ 4. Write `autoresearch.md` and `autoresearch.sh` (see below). Commit both.
24
+ 5. `init_experiment` → run baseline → `log_experiment` → start looping immediately.
25
+
26
+ ### `autoresearch.md`
27
+
28
+ This is the heart of the session. A fresh agent with no context should be able to read this file and run the loop effectively.
29
+
30
+ ```markdown
31
+ # Autoresearch: Optimizing <skill-name> for <goal>
32
+
33
+ ## Objective
34
+ <What we're optimizing. E.g. "Maximize P&L on polymarket-fast-loop by tuning
35
+ entry thresholds, momentum signals, and position sizing.">
36
+
37
+ ## Metrics
38
+ - **Primary**: pnl ($, higher is better)
39
+ - **Secondary**: trades, win_rate, max_drawdown
40
+
41
+ ## How to Run
42
+ `./autoresearch.sh` — runs the skill once, outputs METRIC lines.
43
+
44
+ ## Skill Overview
45
+ <Brief description of what the skill does, its strategy, signal sources.>
46
+
47
+ ## Files in Scope
48
+ <Every file the agent may modify, with a brief note on what it does.>
49
+ - `fast_loop.py` — main strategy logic, config schema, entry/exit rules
50
+ - `SKILL.md` — skill metadata (update version on significant changes)
51
+
52
+ ## Off Limits
53
+ - `simmer-sdk/` core — don't modify the SDK itself
54
+ - Other skills — only optimize the target skill
55
+ - API keys, secrets, wallet addresses
56
+
57
+ ## Constraints
58
+ - Skill must still pass: `python3 <skill>.py --live` exits 0
59
+ - Don't remove safety guards (max position size, daily budget caps)
60
+ - Config env vars must stay compatible with clawhub.json tunables
61
+ - Don't break the `load_config()` / `get_client()` patterns from simmer-sdk
62
+
63
+ ## Market Conditions
64
+ <Current market state that affects strategy. E.g. "Low volatility week,
65
+ few new markets, existing markets trading near consensus.">
66
+
67
+ ## What's Been Tried
68
+ <Update as experiments accumulate. Key wins, dead ends, insights.>
69
+ ```
70
+
71
+ ### `autoresearch.sh`
72
+
73
+ ```bash
74
+ #!/bin/bash
75
+ set -euo pipefail
76
+
77
+ # Run the skill and capture output
78
+ SKILL_DIR="skills/polymarket-fast-loop"
79
+ OUTPUT=$(python3 "$SKILL_DIR/fast_loop.py" --live 2>&1) || {
80
+ echo "METRIC pnl=0"
81
+ echo "METRIC trades=0"
82
+ echo "STATUS crash"
83
+ exit 1
84
+ }
85
+
86
+ # Extract metrics from skill output
87
+ # Skills print trade summaries — parse them
88
+ echo "$OUTPUT"
89
+
90
+ # TODO: Query Simmer API for actual P&L since experiment start
91
+ # Until then, the two lines below are placeholders; replace them with values parsed from $OUTPUT
92
+ echo "METRIC pnl=0"
93
+ echo "METRIC trades=0"
94
+ ```
95
+
96
+ ## Loop Rules
97
+
98
+ **LOOP FOREVER.** Never ask "should I continue?" — the user expects autonomous work.
99
+
100
+ - **Primary metric is king.** P&L improved → `keep`. Worse → `discard`. Equal → `discard`, unless the change only removes code (see "Simpler is better"). Crash → `crash`.
101
+ - **Trades > 0 is the gate.** A config that produces 0 trades is `discard` regardless of everything else. The whole point is to find opportunities.
102
+ - **Quality gates (all must pass to `keep`):**
103
+ - `trades >= 3` — enough data to be meaningful
104
+ - `profit_factor >= 1.0` — not losing money overall
105
+ - P&L improved vs baseline
106
+ - If any gate fails, `discard` even if P&L looks better (could be one lucky trade)
107
+ - **One parameter at a time.** Change one thing, measure, keep or revert. Don't change 3 things and guess which one helped. Reference clawhub.json tunables for parameter bounds (min/max/step).
108
+ - **Simpler is better.** Removing code for equal P&L = keep. Complex hack for tiny gain = discard.
109
+ - **Don't thrash.** Repeatedly reverting the same idea? Try something structurally different.
110
+ - **Crashes:** fix if trivial (typo, missing import), otherwise log and move on.
111
+ - **Think longer when stuck.** Re-read the skill source, study market data, understand what the strategy is actually doing. The best improvements come from understanding, not random parameter sweeps.
112
+ - **Resuming:** if `autoresearch.md` exists, read it + git log, continue looping.
113
+
114
+ ## Trading-Specific Guidelines
115
+
116
+ - **Start with config tuning** (thresholds, filters, sizing). Low risk, fast iterations.
117
+ - **Graduate to strategy changes** once config space is explored. Signal logic, market selection, timing.
118
+ - **Watch for overfitting.** A change that works on today's markets may not generalize. Prefer robust improvements.
119
+ - **Respect position limits.** Never remove max_position_size or daily_budget caps — these are safety rails.
120
+ - **Market conditions change.** What works in high-volatility weeks may fail in quiet periods. Note conditions in autoresearch.md.
121
+
122
+ ## Ideas Backlog
123
+
124
+ When you discover promising optimizations you won't pursue right now, **append them to `autoresearch.ideas.md`**. Don't let good ideas get lost.
125
+
126
+ On resume, check the ideas file — prune stale entries, experiment with promising ones.
127
+
128
+ **NEVER STOP.** The user may be away for hours. Keep going until interrupted.
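Note on the loop rules in SKILL.md above: the `METRIC` lines printed by `autoresearch.sh` and the quality gates can be read as a small decision function. The following is a minimal sketch under stated assumptions, not code from the package. The helper names `parseMetrics` and `decideStatus` and the field names `profitFactor` and `crashed` are hypothetical, and the exception for pure simplifications at equal P&L is deliberately left out.

```javascript
// Hypothetical helper: collect "METRIC name=value" lines from script output.
// Lines in any other shape (trade summaries, STATUS lines) are ignored.
function parseMetrics(output) {
  const metrics = {};
  for (const line of output.split("\n")) {
    const m = /^METRIC\s+([A-Za-z_]\w*)=(-?\d+(?:\.\d+)?)$/.exec(line.trim());
    if (m) metrics[m[1]] = Number(m[2]);
  }
  return metrics;
}

// Hypothetical helper: apply the quality gates in the order SKILL.md lists them.
function decideStatus(run, baselinePnl) {
  if (run.crashed) return "crash";
  // Gate: fewer than 3 trades is not enough data to be meaningful.
  if (run.trades < 3) return "discard";
  // Gate: a profit factor below 1.0 means the run lost money overall.
  if (run.profitFactor < 1.0) return "discard";
  // Gate: the primary metric must strictly beat the baseline.
  if (run.pnl <= baselinePnl) return "discard";
  return "keep";
}

const parsed = parseMetrics("noise\nMETRIC pnl=12.5\nMETRIC trades=4\nSTATUS ok");
console.log(parsed, decideStatus({ crashed: false, trades: 4, profitFactor: 1.3, pnl: 12.5 }, 10));
```

Checking every gate before comparing P&L means a single lucky trade cannot be kept on the strength of the primary metric alone.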
package/dist/index.d.ts ADDED
@@ -0,0 +1,65 @@
1
+ /**
2
+ * Simmer Autoresearch — OpenClaw Plugin
3
+ *
4
+ * Fork of pi-autoresearch adapted for trading skill optimization.
5
+ * Agent mutates skill code + config, runs experiments, measures P&L,
6
+ * keeps what works, discards what doesn't. Never stops.
7
+ *
8
+ * Original: https://github.com/davebcn87/pi-autoresearch
9
+ * License: MIT (Tobi Lutke + David Cortés)
10
+ */
11
+ interface SpawnResult {
12
+ stdout: string;
13
+ stderr: string;
14
+ code: number | null;
15
+ signal: NodeJS.Signals | null;
16
+ killed: boolean;
17
+ termination: "exit" | "timeout" | "no-output-timeout" | "signal";
18
+ }
19
+ interface PluginRuntime {
20
+ system: {
21
+ runCommandWithTimeout: (argv: string[], opts: {
22
+ timeoutMs: number;
23
+ cwd?: string;
24
+ env?: NodeJS.ProcessEnv;
25
+ }) => Promise<SpawnResult>;
26
+ };
27
+ }
28
+ interface PluginApi {
29
+ pluginConfig?: Record<string, unknown>;
30
+ logger: {
31
+ info: (msg: string) => void;
32
+ warn: (msg: string) => void;
33
+ error: (msg: string) => void;
34
+ };
35
+ runtime: PluginRuntime;
36
+ on: (hook: string, handler: (...args: unknown[]) => unknown, opts?: Record<string, unknown>) => void;
37
+ registerService: (service: {
38
+ id: string;
39
+ start: (ctx: ServiceCtx) => Promise<void>;
40
+ stop?: (ctx: ServiceCtx) => Promise<void>;
41
+ }) => void;
42
+ registerCommand: (cmd: {
43
+ name: string;
44
+ description: string;
45
+ acceptsArgs?: boolean;
46
+ handler: (ctx: CommandCtx) => Promise<{
47
+ text: string;
48
+ }>;
49
+ }) => void;
50
+ registerTool: (tool: Record<string, unknown>, opts?: Record<string, unknown>) => void;
51
+ }
52
+ interface ServiceCtx {
53
+ stateDir: string;
54
+ workspaceDir?: string;
55
+ logger: {
56
+ info: (msg: string) => void;
57
+ warn: (msg: string) => void;
58
+ error: (msg: string) => void;
59
+ };
60
+ }
61
+ interface CommandCtx {
62
+ args?: string;
63
+ }
64
+ export default function simmerAutoresearch(pluginApi: PluginApi): void;
65
+ export {};
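Note on `SpawnResult` above: the `termination` field declares four values, so a caller can tell a timeout apart from an external signal. The sketch below is illustrative only. The function name `classifyRun` and its labels are hypothetical, and which fields the host runtime actually populates in each case is an assumption. The shipped `run_experiment` tool instead treats any `killed` result as a timeout.

```javascript
// Hypothetical helper: map a SpawnResult-shaped object to a coarse outcome label.
function classifyRun(result) {
  // Both timeout flavours are reported as "timeout".
  if (result.termination === "timeout" || result.termination === "no-output-timeout") {
    return "timeout";
  }
  // Killed from outside (for example SIGTERM) rather than by a deadline.
  if (result.termination === "signal" || result.signal !== null) {
    return "signal";
  }
  // Normal exit: only exit code 0 counts as a pass.
  return result.code === 0 ? "passed" : "failed";
}

console.log(classifyRun({ stdout: "", stderr: "", code: 0, signal: null, killed: false, termination: "exit" }));
```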
package/dist/index.js ADDED
@@ -0,0 +1,609 @@
1
+ /**
2
+ * Simmer Autoresearch — OpenClaw Plugin
3
+ *
4
+ * Fork of pi-autoresearch adapted for trading skill optimization.
5
+ * Agent mutates skill code + config, runs experiments, measures P&L,
6
+ * keeps what works, discards what doesn't. Never stops.
7
+ *
8
+ * Original: https://github.com/davebcn87/pi-autoresearch
9
+ * License: MIT (Tobi Lutke + David Cortés)
10
+ */
11
+ import * as fs from "node:fs";
12
+ import * as path from "node:path";
13
+ // ---------------------------------------------------------------------------
14
+ // Helpers
15
+ // ---------------------------------------------------------------------------
16
+ function formatNum(value, unit) {
17
+ if (value === null)
18
+ return "—";
19
+ const u = unit || "";
20
+ if (value === Math.round(value))
21
+ return String(value) + u;
22
+ return value.toFixed(2) + u;
23
+ }
24
+ function isBetter(current, best, direction) {
25
+ return direction === "lower" ? current < best : current > best;
26
+ }
27
+ function currentResults(results, segment) {
28
+ return results.filter((r) => r.segment === segment);
29
+ }
30
+ function findBaselineMetric(results, segment) {
31
+ const cur = currentResults(results, segment);
32
+ return cur.length > 0 ? cur[0].metric : null;
33
+ }
34
+ function toolResult(text) {
35
+ return { content: [{ type: "text", text }] };
36
+ }
37
+ class SimmerApi {
38
+ apiKey;
39
+ apiUrl;
40
+ constructor(apiKey, apiUrl) {
41
+ this.apiKey = apiKey;
42
+ this.apiUrl = apiUrl;
43
+ }
44
+ async getOutcomes(skillSlug, since) {
45
+ try {
46
+ const url = `${this.apiUrl}/api/sdk/outcomes?skill=${encodeURIComponent(skillSlug)}&since=${encodeURIComponent(since)}`;
47
+ const resp = await fetch(url, {
48
+ headers: {
49
+ Authorization: `Bearer ${this.apiKey}`,
50
+ "Content-Type": "application/json",
51
+ },
52
+ });
53
+ if (!resp.ok)
54
+ return null;
55
+ const data = (await resp.json());
56
+ return {
57
+ trades: data.trades ?? 0,
58
+ pnl: data.pnl ?? 0,
59
+ wins: data.wins ?? 0,
60
+ losses: data.losses ?? 0,
61
+ };
62
+ }
63
+ catch {
64
+ return null;
65
+ }
66
+ }
67
+ }
68
+ // ---------------------------------------------------------------------------
69
+ // State Reconstruction (from pi-autoresearch JSONL pattern)
70
+ // ---------------------------------------------------------------------------
71
+ function reconstructState(workspaceDir) {
72
+ const state = {
73
+ results: [],
74
+ bestMetric: null,
75
+ bestDirection: "higher",
76
+ metricName: "pnl",
77
+ metricUnit: "$",
78
+ secondaryMetrics: [],
79
+ name: null,
80
+ currentSegment: 0,
81
+ };
82
+ const jsonlPath = path.join(workspaceDir, "autoresearch.jsonl");
83
+ try {
84
+ if (fs.existsSync(jsonlPath)) {
85
+ let segment = 0;
86
+ const lines = fs
87
+ .readFileSync(jsonlPath, "utf-8")
88
+ .trim()
89
+ .split("\n")
90
+ .filter(Boolean);
91
+ for (const line of lines) {
92
+ try {
93
+ const entry = JSON.parse(line);
94
+ if (entry.type === "config") {
95
+ if (entry.name)
96
+ state.name = entry.name;
97
+ if (entry.metricName)
98
+ state.metricName = entry.metricName;
99
+ if (entry.metricUnit !== undefined)
100
+ state.metricUnit = entry.metricUnit;
101
+ if (entry.bestDirection)
102
+ state.bestDirection = entry.bestDirection;
103
+ if (state.results.length > 0)
104
+ segment++;
105
+ state.currentSegment = segment;
106
+ continue;
107
+ }
108
+ state.results.push({
109
+ commit: entry.commit ?? "",
110
+ metric: entry.metric ?? 0,
111
+ metrics: entry.metrics ?? {},
112
+ status: entry.status ?? "keep",
113
+ description: entry.description ?? "",
114
+ timestamp: entry.timestamp ?? 0,
115
+ segment,
116
+ });
117
+ for (const name of Object.keys(entry.metrics ?? {})) {
118
+ if (!state.secondaryMetrics.find((m) => m.name === name)) {
119
+ let unit = "";
120
+ if (name.includes("pnl") || name.includes("budget"))
121
+ unit = "$";
122
+ else if (name.includes("rate") || name.includes("pct"))
123
+ unit = "%";
124
+ state.secondaryMetrics.push({ name, unit });
125
+ }
126
+ }
127
+ }
128
+ catch {
129
+ // Skip malformed lines
130
+ }
131
+ }
132
+ if (state.results.length > 0) {
133
+ state.bestMetric = findBaselineMetric(state.results, state.currentSegment);
134
+ }
135
+ }
136
+ }
137
+ catch {
138
+ // Fresh state
139
+ }
140
+ return state;
141
+ }
142
+ // ---------------------------------------------------------------------------
143
+ // Plugin Entry Point
144
+ // ---------------------------------------------------------------------------
145
+ export default function simmerAutoresearch(pluginApi) {
146
+ const pluginConfig = pluginApi.pluginConfig ?? {};
147
+ const apiKey = pluginConfig.apiKey || process.env.SIMMER_API_KEY || "";
148
+ const apiUrl = pluginConfig.apiUrl ||
149
+ process.env.SIMMER_API_URL ||
150
+ "https://api.simmer.markets";
151
+ const simmer = new SimmerApi(apiKey, apiUrl);
152
+ let state = {
153
+ results: [],
154
+ bestMetric: null,
155
+ bestDirection: "higher",
156
+ metricName: "pnl",
157
+ metricUnit: "$",
158
+ secondaryMetrics: [],
159
+ name: null,
160
+ currentSegment: 0,
161
+ };
162
+ let resolvedWorkspaceDir = "";
163
+ // --- Service: reconstruct state on start ---
164
+ pluginApi.registerService({
165
+ id: "simmer-autoresearch",
166
+ async start(ctx) {
167
+ resolvedWorkspaceDir = ctx.workspaceDir || process.cwd();
168
+ state = reconstructState(resolvedWorkspaceDir);
169
+ if (state.results.length > 0) {
170
+ ctx.logger.info(`[autoresearch] Restored ${state.results.length} experiments from JSONL (segment ${state.currentSegment})`);
171
+ }
172
+ ctx.logger.info("[autoresearch] Ready. Use /autoresearch <skill> to begin.");
173
+ },
174
+ });
175
+ // --- Inject context into LLM prompt ---
176
+ pluginApi.on("before_prompt_build", async () => {
177
+ const dir = resolvedWorkspaceDir;
178
+ if (!dir)
179
+ return { prependContext: "" };
180
+ const mdPath = path.join(dir, "autoresearch.md");
181
+ const ideasPath = path.join(dir, "autoresearch.ideas.md");
182
+ if (!fs.existsSync(mdPath))
183
+ return { prependContext: "" };
184
+ let context = "\n\n## Autoresearch Mode (ACTIVE)\n" +
185
+ "You are in autoresearch mode. Optimize trading skill performance through an autonomous experiment loop.\n" +
186
+ "Use init_experiment, run_experiment, and log_experiment tools. NEVER STOP until interrupted.\n" +
187
+ `Experiment rules: ${mdPath} — read this file at the start of every session and after compaction.\n` +
188
+ "Write promising but deferred optimizations as bullet points to autoresearch.ideas.md.\n" +
189
+ "If the user sends a follow-on message while an experiment is running, finish the current run_experiment + log_experiment cycle first.\n";
190
+ if (fs.existsSync(ideasPath)) {
191
+ context += `\n💡 Ideas backlog exists at ${ideasPath} — check it for promising experiment paths. Prune stale entries.\n`;
192
+ }
193
+ if (state.results.length > 0) {
194
+ const cur = currentResults(state.results, state.currentSegment);
195
+ const kept = cur.filter((r) => r.status === "keep").length;
196
+ const crashed = cur.filter((r) => r.status === "crash").length;
197
+ const discarded = cur.filter((r) => r.status === "discard").length;
198
+ let bestPrimary = null;
199
+ for (const r of cur) {
200
+ if (r.status === "keep" &&
201
+ r.metric !== 0 &&
202
+ (bestPrimary === null ||
203
+ isBetter(r.metric, bestPrimary, state.bestDirection))) {
204
+ bestPrimary = r.metric;
205
+ }
206
+ }
207
+ context += `\n### Experiment Progress\n`;
208
+ context += `- ${cur.length} experiments: ${kept} kept, ${discarded} discarded, ${crashed} crashed\n`;
209
+ context += `- Baseline ${state.metricName}: ${formatNum(state.bestMetric, state.metricUnit)}\n`;
210
+ if (bestPrimary !== null) {
211
+ context += `- Best ${state.metricName}: ${formatNum(bestPrimary, state.metricUnit)}\n`;
212
+ if (state.bestMetric !== null && state.bestMetric !== 0) {
213
+ const pct = ((bestPrimary - state.bestMetric) / Math.abs(state.bestMetric)) *
214
+ 100;
215
+ context += `- Improvement: ${pct > 0 ? "+" : ""}${pct.toFixed(1)}%\n`;
216
+ }
217
+ }
218
+ const recent = cur.slice(-5);
219
+ if (recent.length > 0) {
220
+ context += `\nRecent experiments:\n`;
221
+ for (const r of recent) {
222
+ const icon = r.status === "keep" ? "✓" : r.status === "crash" ? "✗" : "–";
223
+ context += ` ${icon} ${r.description} → ${state.metricName}: ${formatNum(r.metric, state.metricUnit)} (${r.status})\n`;
224
+ }
225
+ }
226
+ }
227
+ return { prependContext: context };
228
+ });
229
+ // -----------------------------------------------------------------------
230
+ // init_experiment tool
231
+ // -----------------------------------------------------------------------
232
+ pluginApi.registerTool({
233
+ name: "init_experiment",
234
+ description: "Initialize the experiment session. Call once before the first run_experiment to set the name, primary metric, unit, and direction. Writes config to autoresearch.jsonl.",
235
+ parameters: {
236
+ type: "object",
237
+ properties: {
238
+ name: {
239
+ type: "string",
240
+ description: 'Human-readable name (e.g. "Optimizing polymarket-ai-divergence for P&L")',
241
+ },
242
+ metric_name: {
243
+ type: "string",
244
+ description: 'Primary metric name (e.g. "pnl", "trades", "sharpe")',
245
+ },
246
+ metric_unit: {
247
+ type: "string",
248
+ description: 'Unit (e.g. "$", "%", "")',
249
+ },
250
+ direction: {
251
+ type: "string",
252
+ enum: ["lower", "higher"],
253
+ description: 'Whether "lower" or "higher" is better. Default: "higher"',
254
+ },
255
+ },
256
+ required: ["name", "metric_name"],
257
+ },
258
+ async execute(_toolCallId, params) {
259
+ const dir = resolvedWorkspaceDir;
260
+ if (!dir)
261
+ return toolResult("❌ No workspace directory. Start the service first.");
262
+ const isReinit = state.results.length > 0;
263
+ state.name = params.name;
264
+ state.metricName = params.metric_name;
265
+ state.metricUnit = params.metric_unit ?? "$";
266
+ if (params.direction === "lower" || params.direction === "higher") {
267
+ state.bestDirection = params.direction;
268
+ }
269
+ if (isReinit) {
270
+ state.currentSegment++;
271
+ }
272
+ state.bestMetric = null;
273
+ state.secondaryMetrics = [];
274
+ try {
275
+ const jsonlPath = path.join(dir, "autoresearch.jsonl");
276
+ const configLine = JSON.stringify({
277
+ type: "config",
278
+ name: state.name,
279
+ metricName: state.metricName,
280
+ metricUnit: state.metricUnit,
281
+ bestDirection: state.bestDirection,
282
+ }) + "\n";
283
+ if (isReinit) {
284
+ fs.appendFileSync(jsonlPath, configLine);
285
+ }
286
+ else {
287
+ fs.writeFileSync(jsonlPath, configLine);
288
+ }
289
+ }
290
+ catch (e) {
291
+ return toolResult(`⚠️ Failed to write autoresearch.jsonl: ${e instanceof Error ? e.message : String(e)}`);
292
+ }
293
+ const reinitNote = isReinit
294
+ ? " (re-initialized — previous results archived, new baseline needed)"
295
+ : "";
296
+ return toolResult(`✅ Experiment initialized: "${state.name}"${reinitNote}\n` +
297
+ `Metric: ${state.metricName} (${state.metricUnit || "unitless"}, ${state.bestDirection} is better)\n` +
298
+ `Config written to autoresearch.jsonl. Now run the baseline with run_experiment.`);
299
+ },
300
+ }, { name: "init_experiment" });
301
+ // -----------------------------------------------------------------------
302
+ // run_experiment tool
303
+ // -----------------------------------------------------------------------
304
+ pluginApi.registerTool({
305
+ name: "run_experiment",
306
+ description: "Run a shell command as an experiment. Times execution, captures output, detects pass/fail. Use for running skill scripts, tests, or benchmarks.",
307
+ parameters: {
308
+ type: "object",
309
+ properties: {
310
+ command: {
311
+ type: "string",
312
+ description: "Shell command to run (e.g. 'python3 skills/polymarket-ai-divergence/ai_divergence.py --live')",
313
+ },
314
+ timeout_seconds: {
315
+ type: "number",
316
+ description: "Kill after this many seconds (default: 600)",
317
+ },
318
+ },
319
+ required: ["command"],
320
+ },
321
+ async execute(_toolCallId, params) {
322
+ const dir = resolvedWorkspaceDir;
323
+ if (!dir)
324
+ return toolResult("❌ No workspace directory.");
325
+ const timeout = (params.timeout_seconds ?? 600) * 1000;
326
+ const t0 = Date.now();
327
+ let result;
328
+ try {
329
+ result = await pluginApi.runtime.system.runCommandWithTimeout(["bash", "-c", params.command], { timeoutMs: timeout, cwd: dir });
330
+ }
331
+ catch (e) {
332
+ return toolResult(`💥 FAILED to execute: ${e instanceof Error ? e.message : String(e)}`);
333
+ }
334
+ const durationSeconds = (Date.now() - t0) / 1000;
335
+ const output = (result.stdout + "\n" + result.stderr).trim();
336
+ const passed = result.code === 0 && !result.killed;
337
+ const timedOut = result.killed || result.termination === "timeout";
338
+ let text = "";
339
+ if (timedOut) {
340
+ text += `⏰ TIMEOUT after ${durationSeconds.toFixed(1)}s\n`;
341
+ }
342
+ else if (!passed) {
343
+ text += `💥 FAILED (exit code ${result.code}) in ${durationSeconds.toFixed(1)}s\n`;
344
+ }
345
+ else {
346
+ text += `✅ PASSED in ${durationSeconds.toFixed(1)}s\n`;
347
+ }
348
+ if (state.bestMetric !== null) {
349
+ text += `📊 Current best ${state.metricName}: ${formatNum(state.bestMetric, state.metricUnit)}\n`;
350
+ }
351
+ const tail = output.split("\n").slice(-80).join("\n");
352
+ text += `\nLast 80 lines of output:\n${tail}`;
353
+ return toolResult(text);
354
+ },
355
+ }, { name: "run_experiment" });
356
+ // -----------------------------------------------------------------------
357
+ // log_experiment tool
358
+ // -----------------------------------------------------------------------
359
+ pluginApi.registerTool({
360
+ name: "log_experiment",
361
+ description: 'Record an experiment result. "keep" auto-commits via git. "discard"/"crash" → revert with git checkout. Call after every run_experiment.',
362
+ parameters: {
363
+ type: "object",
364
+ properties: {
365
+ commit: {
366
+ type: "string",
367
+ description: "Git commit hash (short, 7 chars)",
368
+ },
369
+ metric: {
370
+ type: "number",
371
+ description: "Primary metric value (e.g. P&L in dollars). 0 for crashes.",
372
+ },
373
+ status: {
374
+ type: "string",
375
+ enum: ["keep", "discard", "crash"],
376
+ description: "keep if improved, discard if worse, crash if failed",
377
+ },
378
+ description: {
379
+ type: "string",
380
+ description: "Short description of what this experiment tried",
381
+ },
382
+ metrics: {
383
+ type: "object",
384
+ description: 'Secondary metrics as { name: value } (e.g. { "trades": 5, "win_rate": 0.6 })',
385
+ },
386
+ force: {
387
+ type: "boolean",
388
+ description: "Set true to allow adding a new secondary metric not previously tracked",
389
+ },
390
+ },
391
+ required: ["commit", "metric", "status", "description"],
392
+ },
393
+ async execute(_toolCallId, params) {
394
+ const dir = resolvedWorkspaceDir;
395
+ if (!dir)
396
+ return toolResult("❌ No workspace directory.");
397
+ const secondaryMetrics = params.metrics ?? {};
398
+ const force = params.force ?? false;
399
+ // Validate secondary metrics consistency
400
+ if (state.secondaryMetrics.length > 0) {
401
+ const knownNames = new Set(state.secondaryMetrics.map((m) => m.name));
402
+ const providedNames = new Set(Object.keys(secondaryMetrics));
403
+ const missing = [...knownNames].filter((n) => !providedNames.has(n));
404
+ if (missing.length > 0) {
405
+ return toolResult(`❌ Missing secondary metrics: ${missing.join(", ")}\n` +
406
+ `Expected: ${[...knownNames].join(", ")}\n` +
407
+ `Got: ${[...providedNames].join(", ") || "(none)"}\n` +
408
+ `Fix: include ${missing.map((m) => `"${m}": <value>`).join(", ")} in metrics.`);
409
+ }
410
+ const newMetrics = [...providedNames].filter((n) => !knownNames.has(n));
411
+ if (newMetrics.length > 0 && !force) {
412
+ return toolResult(`❌ New secondary metric(s) not previously tracked: ${newMetrics.join(", ")}\n` +
413
+ `Existing: ${[...knownNames].join(", ")}\n` +
414
+ `Call again with force: true to add, or remove from metrics.`);
415
+ }
416
+ }
417
+ const experiment = {
418
+ commit: params.commit.slice(0, 7),
419
+ metric: params.metric,
420
+ metrics: secondaryMetrics,
421
+ status: params.status,
422
+ description: params.description,
423
+ timestamp: Date.now(),
424
+ segment: state.currentSegment,
425
+ };
426
+ state.results.push(experiment);
427
+ // Register new secondary metrics
428
+ for (const name of Object.keys(secondaryMetrics)) {
429
+ if (!state.secondaryMetrics.find((m) => m.name === name)) {
430
+ let unit = "";
431
+ if (name.includes("pnl") || name.includes("budget"))
432
+ unit = "$";
433
+ else if (name.includes("rate") || name.includes("pct"))
434
+ unit = "%";
435
+ state.secondaryMetrics.push({ name, unit });
436
+ }
437
+ }
438
+ state.bestMetric = findBaselineMetric(state.results, state.currentSegment);
439
+ const curCount = currentResults(state.results, state.currentSegment).length;
440
+ let text = `Logged #${state.results.length}: ${experiment.status} — ${experiment.description}`;
441
+ if (state.bestMetric !== null) {
442
+ text += `\nBaseline ${state.metricName}: ${formatNum(state.bestMetric, state.metricUnit)}`;
443
+ if (curCount > 1 && params.status === "keep" && params.metric !== 0) {
444
+ const delta = params.metric - state.bestMetric;
445
+ const pct = state.bestMetric !== 0
446
+ ? ((delta / Math.abs(state.bestMetric)) * 100).toFixed(1)
447
+ : "∞";
448
+ const sign = delta > 0 ? "+" : "";
449
+ text += ` | this: ${formatNum(params.metric, state.metricUnit)} (${sign}${pct}%)`;
450
+ }
451
+ }
452
+ if (Object.keys(secondaryMetrics).length > 0) {
453
+ const parts = [];
454
+ for (const [name, value] of Object.entries(secondaryMetrics)) {
455
+ const def = state.secondaryMetrics.find((m) => m.name === name);
456
+ parts.push(`${name}: ${formatNum(value, def?.unit ?? "")}`);
457
+ }
458
+ text += `\nSecondary: ${parts.join(" ")}`;
459
+ }
460
+ text += `\n(${state.results.length} experiments total)`;
461
+ // Auto-commit on keep
462
+ if (params.status === "keep") {
463
+ try {
464
+ const resultData = {
465
+ status: params.status,
466
+ [state.metricName || "metric"]: params.metric,
467
+ ...secondaryMetrics,
468
+ };
469
+ const trailerJson = JSON.stringify(resultData);
470
+ const commitMsg = `${params.description}\n\nResult: ${trailerJson}`;
471
+ const gitResult = await pluginApi.runtime.system.runCommandWithTimeout([
472
+ "bash",
473
+ "-c",
474
+ `git add -A && git diff --cached --quiet && echo "NOTHING_TO_COMMIT" || git commit -m '${commitMsg.replace(/'/g, "'\\''")}'`,
475
+ ], { timeoutMs: 10000, cwd: dir });
476
+ const gitOutput = (gitResult.stdout + gitResult.stderr).trim();
477
+ if (gitOutput.includes("NOTHING_TO_COMMIT")) {
478
+ text += `\n📝 Git: nothing to commit`;
479
+ }
480
+ else if (gitResult.code === 0) {
481
+ const firstLine = gitOutput.split("\n")[0] || "";
482
+ text += `\n📝 Git: committed — ${firstLine}`;
483
+ try {
484
+ const shaResult = await pluginApi.runtime.system.runCommandWithTimeout(["git", "rev-parse", "--short=7", "HEAD"], { timeoutMs: 5000, cwd: dir });
485
+ const newSha = shaResult.stdout.trim();
486
+ if (newSha && newSha.length >= 7) {
487
+ experiment.commit = newSha;
488
+ }
489
+ }
490
+ catch {
491
+ // Keep original
492
+ }
493
+ }
494
+ else {
495
+ text += `\n⚠️ Git commit failed: ${gitOutput.slice(0, 200)}`;
496
+ }
497
+ }
498
+ catch (e) {
499
+ text += `\n⚠️ Git error: ${e instanceof Error ? e.message : String(e)}`;
500
+ }
501
+ }
502
+ else {
503
+ text += `\n📝 Git: skipped commit (${params.status}) — revert with git checkout -- .`;
504
+ }
505
+ // Persist to JSONL after git (so commit hash is correct)
506
+ try {
507
+ const jsonlPath = path.join(dir, "autoresearch.jsonl");
508
+ fs.appendFileSync(jsonlPath, JSON.stringify({ run: state.results.length, ...experiment }) +
509
+ "\n");
510
+ }
511
+ catch {
512
+ // Don't fail if write fails
513
+ }
514
+ return toolResult(text);
515
+ },
516
+ }, { name: "log_experiment" });
517
+ // -----------------------------------------------------------------------
518
+ // /autoresearch command
519
+ // -----------------------------------------------------------------------
520
+ pluginApi.registerCommand({
521
+ name: "autoresearch",
522
+ description: "Start or resume autoresearch mode for a skill",
523
+ acceptsArgs: true,
524
+ async handler(ctx) {
525
+ const dir = resolvedWorkspaceDir;
526
+ if (!dir)
527
+ return { text: "❌ No workspace directory." };
528
+ const args = ctx.args?.trim() ?? "";
529
+ const mdPath = path.join(dir, "autoresearch.md");
530
+ const hasRules = fs.existsSync(mdPath);
531
+ if (args === "off") {
532
+ return { text: "Autoresearch mode OFF." };
533
+ }
534
+ if (args === "status") {
535
+ if (state.results.length === 0) {
536
+ return {
537
+ text: "No experiments yet. Run /autoresearch <skill> to start.",
538
+ };
539
+ }
540
+ const cur = currentResults(state.results, state.currentSegment);
541
+ const kept = cur.filter((r) => r.status === "keep").length;
542
+ const crashed = cur.filter((r) => r.status === "crash").length;
543
+ const discarded = cur.filter((r) => r.status === "discard").length;
544
+ let bestPrimary = null;
545
+ for (const r of cur) {
546
+ if (r.status === "keep" &&
547
+ r.metric !== 0 &&
548
+ (bestPrimary === null ||
549
+ isBetter(r.metric, bestPrimary, state.bestDirection))) {
550
+ bestPrimary = r.metric;
551
+ }
552
+ }
553
+ let text = `🔬 Autoresearch: ${state.name ?? "unnamed"}\n`;
554
+ text += `${cur.length} experiments: ${kept} kept, ${discarded} discarded, ${crashed} crashed\n`;
555
+ text += `Baseline ${state.metricName}: ${formatNum(state.bestMetric, state.metricUnit)}\n`;
556
+ if (bestPrimary !== null) {
557
+ text += `Best ${state.metricName}: ${formatNum(bestPrimary, state.metricUnit)}`;
558
+ if (state.bestMetric !== null && state.bestMetric !== 0) {
559
+ const pct = ((bestPrimary - state.bestMetric) /
560
+ Math.abs(state.bestMetric)) *
561
+ 100;
562
+ text += ` (${pct > 0 ? "+" : ""}${pct.toFixed(1)}%)`;
563
+ }
564
+ text += "\n";
565
+ }
566
+ const recent = cur.slice(-5);
567
+ text += "\nRecent:\n";
568
+ for (const r of recent) {
569
+ const icon = r.status === "keep" ? "✓" : r.status === "crash" ? "✗" : "–";
570
+ text += ` ${icon} ${r.description} → ${state.metricName}: ${formatNum(r.metric, state.metricUnit)} (${r.status})\n`;
571
+ }
572
+ return { text };
573
+ }
574
+ if (args === "reset") {
575
+ state = {
576
+ results: [],
577
+ bestMetric: null,
578
+ bestDirection: "higher",
579
+ metricName: "pnl",
580
+ metricUnit: "$",
581
+ secondaryMetrics: [],
582
+ name: null,
583
+ currentSegment: 0,
584
+ };
585
+ // Clear JSONL file
586
+ try {
587
+ const jsonlPath = path.join(dir, "autoresearch.jsonl");
588
+ if (fs.existsSync(jsonlPath))
589
+ fs.unlinkSync(jsonlPath);
590
+ }
591
+ catch { /* ignore */ }
592
+ return { text: "Experiment history cleared. Ready to start fresh." };
593
+ }
594
+ if (hasRules) {
595
+ return {
596
+ text: args
597
+ ? `Autoresearch mode active. ${args}\nRead autoresearch.md for experiment rules, then resume the loop.`
598
+ : "Autoresearch mode active. Read autoresearch.md and autoresearch.sh, then resume the experiment loop.",
599
+ };
600
+ }
601
+ return {
602
+ text: args
603
+ ? `Start autoresearch: ${args}\nNo autoresearch.md found — read the skill source code, set up autoresearch.md and autoresearch.sh, then start the experiment loop.`
604
+ : "Start autoresearch. No autoresearch.md found — specify a skill slug (e.g. /autoresearch polymarket-ai-divergence).",
605
+ };
606
+ },
607
+ });
608
+ pluginApi.logger.info("[simmer-autoresearch] Plugin loaded");
609
+ }
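Note on `reconstructState` in index.js above: each `config` line that follows at least one result opens a new segment, and the baseline is the first result of the current segment. The sketch below replays that rule as a pure function over JSONL text so it can be checked without touching the filesystem. The name `replaySegments` and its return shape are hypothetical, and only the fields needed for the illustration are kept.

```javascript
// Hypothetical helper: replay autoresearch.jsonl text into segments and a baseline.
function replaySegments(jsonlText) {
  let segment = 0;
  const results = [];
  for (const line of jsonlText.split("\n").filter(Boolean)) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines, as the plugin does
    }
    if (entry === null || typeof entry !== "object") continue;
    if (entry.type === "config") {
      // A config line after recorded results marks a re-initialization.
      if (results.length > 0) segment++;
      continue;
    }
    results.push({ metric: entry.metric ?? 0, status: entry.status ?? "keep", segment });
  }
  const current = results.filter((r) => r.segment === segment);
  // The baseline is the first result logged in the current segment.
  return { segment, baseline: current.length > 0 ? current[0].metric : null, results };
}

const log = [
  '{"type":"config","name":"first"}',
  '{"metric":5,"status":"keep"}',
  '{"metric":8,"status":"keep"}',
  "not json",
  '{"type":"config","name":"second"}',
  '{"metric":3,"status":"discard"}',
].join("\n");
console.log(replaySegments(log));
```

Because the baseline is the first result of the segment rather than the best one, the percentages reported by the plugin are always relative to the initial baseline run.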
package/openclaw.plugin.json ADDED
@@ -0,0 +1,29 @@
1
+ {
2
+ "id": "simmer-autoresearch",
3
+ "name": "Simmer Autoresearch",
4
+ "description": "Autonomous skill optimization — agent mutates skill code + config, measures P&L, keeps what works",
5
+ "version": "0.1.0",
6
+ "configSchema": {
7
+ "type": "object",
8
+ "additionalProperties": false,
9
+ "properties": {
10
+ "apiKey": {
11
+ "type": "string",
12
+ "description": "Simmer API key (sk_live_...)"
13
+ },
14
+ "apiUrl": {
15
+ "type": "string",
16
+ "default": "https://api.simmer.markets"
17
+ }
18
+ },
19
+ "required": ["apiKey"]
20
+ },
21
+ "uiHints": {
22
+ "apiKey": {
23
+ "label": "Simmer API Key",
24
+ "sensitive": true,
25
+ "placeholder": "sk_live_...",
26
+ "help": "Get from simmer.markets/dashboard"
27
+ }
28
+ }
29
+ }
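Note on the manifest above: it marks `apiKey` as required, while the entry point in index.js falls back to `SIMMER_API_KEY` and then to an empty string, and resolves `apiUrl` the same way with a built-in default. A minimal sketch of that precedence as a pure function follows. The name `resolveConfig` is hypothetical, and passing `env` as a parameter (instead of reading `process.env` directly) is a choice made here for testability.

```javascript
// Hypothetical helper: plugin config wins, then environment, then the default.
function resolveConfig(pluginConfig, env) {
  const cfg = pluginConfig ?? {};
  return {
    apiKey: cfg.apiKey || env.SIMMER_API_KEY || "",
    apiUrl: cfg.apiUrl || env.SIMMER_API_URL || "https://api.simmer.markets",
  };
}

console.log(resolveConfig({ apiUrl: "http://localhost:8080" }, { SIMMER_API_KEY: "sk_live_example" }));
```

Because `||` is used rather than `??`, an empty string in the plugin config also falls through to the environment variable.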
package/package.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "name": "simmer-autoresearch",
3
+ "version": "0.1.0",
4
+ "description": "Autonomous skill optimization for Simmer — fork of pi-autoresearch adapted for trading",
5
+ "main": "dist/index.js",
6
+ "types": "dist/index.d.ts",
7
+ "files": [
8
+ "dist/",
9
+ "openclaw.plugin.json",
10
+ "SKILL.md"
11
+ ],
12
+ "scripts": {
13
+ "build": "tsc",
14
+ "dev": "tsc --watch"
15
+ },
16
+ "openclaw": {
17
+ "extensions": [
18
+ "./dist/index.js"
19
+ ]
20
+ },
21
+ "keywords": [
22
+ "openclaw",
23
+ "plugin",
24
+ "simmer",
25
+ "autoresearch",
26
+ "prediction-markets",
27
+ "trading"
28
+ ],
29
+ "license": "MIT",
30
+ "devDependencies": {
31
+ "@types/node": "^25.5.0",
32
+ "typescript": "^5.4.0"
33
+ }
34
+ }