researchloop 0.1.0 → 0.2.0
- package/CHANGELOG.md +22 -0
- package/README.md +33 -12
- package/bin/researchloop.js +532 -34
- package/docs/getting-started.md +39 -11
- package/package.json +4 -1
- package/skills/README.md +31 -0
- package/skills/researchloop-autoresearch/claude-code/CLAUDE.md +35 -0
- package/skills/researchloop-autoresearch/codex/SKILL.md +50 -0
- package/skills/researchloop-autoresearch/references/architecture.md +21 -0
- package/skills/researchloop-autoresearch/references/attention.md +21 -0
- package/skills/researchloop-autoresearch/references/hyperparameters.md +22 -0
package/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,27 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 0.2.0
|
|
4
|
+
|
|
5
|
+
ResearchLoop becomes a runtime, not just a folder.
|
|
6
|
+
|
|
7
|
+
New:
|
|
8
|
+
|
|
9
|
+
- `researchloop run` executes a command, streams output to a per-run log, parses a metric (default regex on `metric=N` or `"metric": N`, plus last-line JSON fallback), and auto-appends a row to `runs.jsonl`. No more manual `record`.
|
|
10
|
+
- `researchloop baseline` is `run` for the baseline command and also updates `goal.md` Current Best and `plan.md` Current State.
|
|
11
|
+
- `researchloop scan-papers` queries the arXiv API for papers relevant to the goal, writes one markdown note per result to `scratchpad/papers/`, caches responses to `~/.cache/researchloop/arxiv/`, supports `--offline`, `--since YYYY-MM`, `--limit`, `--query`, `--cache-dir`.
|
|
12
|
+
- `researchloop idea` now reads `scratchpad/papers/` and adds paper-derived ideas alongside the adapter playbook.
|
|
13
|
+
|
|
14
|
+
Improvements:
|
|
15
|
+
|
|
16
|
+
- Tighter adapter detection: pytorch needs a real `train*.py` script or `torch` in deps; huggingface needs `transformers` in deps. No more false positives from filename substrings.
|
|
17
|
+
- `candidate_config_files` no longer matches every `.json`/`.yaml`/`.toml` in the repo.
|
|
18
|
+
- README install command no longer hardcodes a developer machine path.
|
|
19
|
+
- New tests: `test:run`, `test:scan-papers`. arXiv test uses a recorded XML fixture and never hits the network.
|
|
20
|
+
|
|
21
|
+
Cleanup:
|
|
22
|
+
|
|
23
|
+
- Removed misleading `projects/researchloop` and `projects/researchloop-cli` symlinks.
|
|
24
|
+
|
|
3
25
|
## 0.1.0
|
|
4
26
|
|
|
5
27
|
First public ResearchLoop release.
|
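The 0.2.0 notes above say `researchloop run` parses a metric with a default regex on `metric=N` or `"metric": N`, then falls back to last-line JSON. A minimal sketch of that two-pass idea follows. The regex is copied from `defaultMetricRegex` in `package/bin/researchloop.js` in this diff; `parseMetric` and the sample log strings are hypothetical simplifications, not the package's exact code.

```javascript
// Regex builder as it appears in this diff (bin/researchloop.js).
function defaultMetricRegex(metricName) {
  const escaped = metricName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(
    `["']?${escaped}["']?\\s*[:=]\\s*["']?(-?\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?)`,
    "gi"
  );
}

// Hypothetical, simplified stand-in for the package's parser.
function parseMetric(output, metricName) {
  // Pass 1: keep the last regex match, so the final reported value wins.
  let last = null;
  for (const match of output.matchAll(defaultMetricRegex(metricName))) {
    last = match[1];
  }
  if (last !== null && Number.isFinite(Number(last))) {
    return Number(last);
  }
  // Pass 2: walk non-empty lines from the end and try each one as JSON.
  const lines = output.split("\n").map((line) => line.trim()).filter(Boolean);
  for (let idx = lines.length - 1; idx >= 0; idx -= 1) {
    try {
      const obj = JSON.parse(lines[idx]);
      if (obj && typeof obj === "object" && metricName in obj && Number.isFinite(Number(obj[metricName]))) {
        return Number(obj[metricName]);
      }
    } catch {
      // not JSON, keep scanning earlier lines
    }
  }
  return null;
}

// Sample (made-up) training output: the later value is the one returned.
console.log(parseMetric("step 1 val_loss=2.50\nstep 2 val_loss=2.31", "val_loss"));
console.log(parseMetric('{"val_loss": 1.9e-1, "acc": 0.8}', "val_loss"));
console.log(parseMetric("no metric here", "val_loss"));
```

Because the pattern has no word boundary, a metric named `loss` would also match a `val_loss=...` line. The `--regex` option listed in the usage text looks like the way to supply a stricter pattern when metric names overlap.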
package/README.md
CHANGED
|
@@ -8,7 +8,17 @@ It installs a durable research harness into a machine learning repo so agents li
|
|
|
8
8
|
|
|
9
9
|
This repo is both the product and the startup home base.
|
|
10
10
|
|
|
11
|
-
##
|
|
11
|
+
## Give This Prompt To Your Agent
|
|
12
|
+
|
|
13
|
+
Copy this into Codex, Claude Code, Hermes, Cursor, or another coding agent:
|
|
14
|
+
|
|
15
|
+
```text
|
|
16
|
+
Set up an autonomous AI research environment in this repo using ResearchLoop.
|
|
17
|
+
Read the ResearchLoop docs and the .researchloop/ files, inspect the repo, establish the baseline, propose a small set of experiments, run the smallest valid change first, record every result, compare runs, and keep the research loop moving.
|
|
18
|
+
Use the package commands to manage goals, ideas, prompts, runs, comparisons, and reports.
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
Then install ResearchLoop:
|
|
12
22
|
|
|
13
23
|
```bash
|
|
14
24
|
npm install -g researchloop
|
|
@@ -17,7 +27,8 @@ npm install -g researchloop
|
|
|
17
27
|
Local development from this checkout:
|
|
18
28
|
|
|
19
29
|
```bash
|
|
20
|
-
|
|
30
|
+
git clone https://github.com/vukrosic/researchloop.git
|
|
31
|
+
cd researchloop
|
|
21
32
|
npm link
|
|
22
33
|
researchloop --help
|
|
23
34
|
```
|
|
@@ -26,16 +37,18 @@ researchloop --help
|
|
|
26
37
|
|
|
27
38
|
```bash
|
|
28
39
|
researchloop init --agent codex
|
|
29
|
-
researchloop goal "lower validation loss"
|
|
40
|
+
researchloop goal "lower validation loss" --metric val_loss --direction lower \
|
|
41
|
+
--baseline "python train.py" --evaluation "python eval.py"
|
|
30
42
|
researchloop inspect
|
|
43
|
+
researchloop scan-papers --limit 10
|
|
31
44
|
researchloop idea --write
|
|
32
45
|
researchloop prompt --agent codex
|
|
33
|
-
researchloop
|
|
34
|
-
researchloop
|
|
35
|
-
researchloop doctor
|
|
36
|
-
researchloop record --id first-run --status complete --metric val_loss=2.31 --note "First logged experiment"
|
|
46
|
+
researchloop baseline
|
|
47
|
+
researchloop run --id lr-3e-4 --command "python train.py --lr 3e-4"
|
|
37
48
|
researchloop compare --metric val_loss --direction lower
|
|
38
49
|
researchloop report
|
|
50
|
+
researchloop dashboard
|
|
51
|
+
researchloop doctor
|
|
39
52
|
```
|
|
40
53
|
|
|
41
54
|
Then paste the generated prompt into the coding agent.
|
|
@@ -65,6 +78,7 @@ The package does not claim to magically train every model. It gives an agent the
|
|
|
65
78
|
```text
|
|
66
79
|
bin/ CLI entrypoint
|
|
67
80
|
templates/ Harness, adapters, and agent prompts
|
|
81
|
+
skills/ Downloadable agent research skill packs
|
|
68
82
|
docs/site/ Landing page
|
|
69
83
|
docs/research/ Local testing notes and research logs
|
|
70
84
|
docs/competitors/ Competitor and adjacent-project research
|
|
@@ -116,17 +130,22 @@ The startup plan is in `docs/startup/`.
|
|
|
116
130
|
- `researchloop init` creates `.researchloop/` and agent instruction files.
|
|
117
131
|
- `researchloop goal` saves a durable research objective in `.researchloop/goal.md`.
|
|
118
132
|
- `researchloop inspect` writes `.researchloop/repo-profile.json`.
|
|
119
|
-
- `researchloop
|
|
133
|
+
- `researchloop scan-papers` fetches relevant arXiv abstracts into `.researchloop/scratchpad/papers/`.
|
|
134
|
+
- `researchloop idea` generates ranked experiment ideas, including paper-derived ones, and can write an idea note.
|
|
120
135
|
- `researchloop prompt` prints an agent-ready autonomous research prompt, with optional focus playbooks.
|
|
121
|
-
- `researchloop
|
|
122
|
-
- `researchloop
|
|
123
|
-
- `researchloop record` appends a structured run result to `runs.jsonl
|
|
136
|
+
- `researchloop baseline` runs the baseline command, parses the metric, and locks it into `goal.md` and `plan.md`.
|
|
137
|
+
- `researchloop run` executes a training or eval command, streams the log, parses the metric, and records the run.
|
|
138
|
+
- `researchloop record` appends a structured run result to `runs.jsonl` (use for manual rows).
|
|
124
139
|
- `researchloop compare` ranks runs by a chosen metric.
|
|
125
140
|
- `researchloop report` summarizes the run ledger.
|
|
141
|
+
- `researchloop dashboard` starts a local localhost dashboard for experiment tracking.
|
|
142
|
+
- `researchloop doctor` checks basic local tooling.
|
|
126
143
|
- `npm run test:setup` runs the blank-repo and minimal-fixture setup checks.
|
|
127
144
|
- `npm run test:compare` checks comparison output for a few recorded runs.
|
|
145
|
+
- `npm run test:run` checks `run` and `baseline` against deterministic shell commands.
|
|
146
|
+
- `npm run test:scan-papers` checks the arXiv scan path against a recorded XML fixture (no network).
|
|
128
147
|
- `npm run test:goal` checks goal saving and prompt handoff.
|
|
129
|
-
- `npm run test:idea` checks idea generation for a blank repo
|
|
148
|
+
- `npm run test:idea` checks idea generation for a blank repo, an llm-research-kit-shaped repo, and a paper-augmented repo.
|
|
130
149
|
- `npm run test:dashboard` checks the local dashboard server and API.
|
|
131
150
|
- `npm run test:prompts` checks prompt templates for placeholder drift.
|
|
132
151
|
- `npm run test:focus-prompts` checks the hyperparameter, architecture, and attention playbooks.
|
|
@@ -136,6 +155,8 @@ The startup plan is in `docs/startup/`.
|
|
|
136
155
|
|
|
137
156
|
ResearchLoop should stay open source at the core. The npm package, prompts, adapters, and run ledger format should be inspectable and forkable.
|
|
138
157
|
|
|
158
|
+
The package also ships optional skill packs under `skills/` so teams can copy the same research rules into Codex, Claude Code, or other agent-specific folders.
|
|
159
|
+
|
|
139
160
|
Possible paid layers later:
|
|
140
161
|
|
|
141
162
|
- hosted dashboard
|
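The README describes `researchloop compare` as ranking runs by a chosen metric, and the `run` command in this release appends rows with `id`, `status`, and `metrics` fields to `runs.jsonl`. The sketch below shows one way such a ledger could be ranked. `rankRuns`, the sample rows, and the rules (skip rows without a numeric metric, sort ascending for `lower`) are assumptions for illustration; the package's actual compare logic is not part of this diff.

```javascript
// Hypothetical helper: rank JSONL ledger rows by one metric.
// Row fields (id, status, metrics) follow the rows written by `researchloop run`.
function rankRuns(jsonlText, metric, direction = "lower") {
  const rows = jsonlText
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean)
    .map((line) => JSON.parse(line));
  // Assumption: rows without a numeric value for the metric are left out.
  const scored = rows.filter(
    (row) => row.metrics && typeof row.metrics[metric] === "number"
  );
  const sign = direction === "higher" ? -1 : 1;
  return scored.sort((a, b) => sign * (a.metrics[metric] - b.metrics[metric]));
}

// Made-up ledger contents for the example.
const ledger = [
  JSON.stringify({ id: "baseline", status: "complete", metrics: { val_loss: 2.5 } }),
  JSON.stringify({ id: "lr-3e-4", status: "complete", metrics: { val_loss: 2.31 } }),
  JSON.stringify({ id: "crashed", status: "failed", metrics: {} }),
].join("\n");

console.log(rankRuns(ledger, "val_loss", "lower").map((row) => row.id));
```

Keeping the ledger as one JSON object per line means a failed or metric-less run still leaves a row behind, and a ranking step can decide for itself which rows to ignore.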
package/bin/researchloop.js
CHANGED
|
@@ -1,9 +1,11 @@
|
|
|
1
1
|
#!/usr/bin/env node
|
|
2
2
|
import fs from "node:fs";
|
|
3
3
|
import http from "node:http";
|
|
4
|
+
import os from "node:os";
|
|
4
5
|
import path from "node:path";
|
|
5
6
|
import process from "node:process";
|
|
6
|
-
import { execSync } from "node:child_process";
|
|
7
|
+
import { execSync, spawn } from "node:child_process";
|
|
8
|
+
import { createHash } from "node:crypto";
|
|
7
9
|
import { fileURLToPath } from "node:url";
|
|
8
10
|
|
|
9
11
|
const __filename = fileURLToPath(import.meta.url);
|
|
@@ -132,16 +134,37 @@ function walkFiles(cwd, maxDepth = 3) {
|
|
|
132
134
|
return out;
|
|
133
135
|
}
|
|
134
136
|
|
|
137
|
+
function readSafe(file) {
|
|
138
|
+
try {
|
|
139
|
+
return fs.readFileSync(file, "utf8");
|
|
140
|
+
} catch {
|
|
141
|
+
return "";
|
|
142
|
+
}
|
|
143
|
+
}
|
|
144
|
+
|
|
145
|
+
function depsMention(cwd, needle) {
|
|
146
|
+
const candidates = ["requirements.txt", "pyproject.toml", "setup.py", "uv.lock", "Pipfile"];
|
|
147
|
+
const needleLower = needle.toLowerCase();
|
|
148
|
+
for (const name of candidates) {
|
|
149
|
+
const text = readSafe(path.join(cwd, name)).toLowerCase();
|
|
150
|
+
if (text.includes(needleLower)) {
|
|
151
|
+
return true;
|
|
152
|
+
}
|
|
153
|
+
}
|
|
154
|
+
return false;
|
|
155
|
+
}
|
|
156
|
+
|
|
135
157
|
function detectRepo(cwd) {
|
|
136
158
|
const files = walkFiles(cwd, 3);
|
|
137
|
-
const
|
|
138
|
-
const
|
|
159
|
+
const basenames = files.map((file) => path.basename(file));
|
|
160
|
+
const trainScriptPattern = /^(train|finetune|pretrain)[\w-]*\.py$/i;
|
|
161
|
+
const hasTrainScript = basenames.some((name) => trainScriptPattern.test(name));
|
|
139
162
|
|
|
140
163
|
const adapters = ["generic"];
|
|
141
|
-
if (
|
|
164
|
+
if (hasTrainScript || depsMention(cwd, "torch")) {
|
|
142
165
|
adapters.push("pytorch");
|
|
143
166
|
}
|
|
144
|
-
if (
|
|
167
|
+
if (depsMention(cwd, "transformers") || depsMention(cwd, "huggingface_hub")) {
|
|
145
168
|
adapters.push("huggingface");
|
|
146
169
|
}
|
|
147
170
|
if (files.includes("train_llm.py") && files.includes("configs/llm_config.py")) {
|
|
@@ -154,9 +177,9 @@ function detectRepo(cwd) {
|
|
|
154
177
|
git_branch: run("git branch --show-current", cwd) || null,
|
|
155
178
|
git_status_short: run("git status --short", cwd) || null,
|
|
156
179
|
package_files: existsAny(cwd, ["package.json", "pyproject.toml", "requirements.txt", "uv.lock"]),
|
|
157
|
-
candidate_train_files: files.filter((file) => /(^|\/)(train|finetune|pretrain)
|
|
158
|
-
candidate_eval_files: files.filter((file) => /(^|\/)(eval|evaluate|benchmark)
|
|
159
|
-
candidate_config_files: files.filter((file) => /(^|\/|_)(config|cfg)[
|
|
180
|
+
candidate_train_files: files.filter((file) => /(^|\/)(train|finetune|pretrain)[\w-]*\.py$/i.test(file)).slice(0, 30),
|
|
181
|
+
candidate_eval_files: files.filter((file) => /(^|\/)(eval|evaluate|benchmark)[\w-]*\.py$/i.test(file)).slice(0, 30),
|
|
182
|
+
candidate_config_files: files.filter((file) => /(^|\/|_)(config|cfg)[\w-]*\.(py|js|ts|json|yaml|yml|toml)$/i.test(file)).slice(0, 40),
|
|
160
183
|
candidate_log_dirs: existsAny(cwd, ["logs", "runs", "wandb", "mlruns", "checkpoints", "plots"]),
|
|
161
184
|
adapters: [...new Set(adapters)],
|
|
162
185
|
};
|
|
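The hunk above tightens pytorch detection by testing file basenames against an anchored pattern instead of looking for substrings. A short sketch of what that pattern accepts and rejects is given here. The regex is copied from `detectRepo` in this diff; the sample file names are made up for illustration.

```javascript
// Pattern as it appears in detectRepo; it is applied to basenames only.
const trainScriptPattern = /^(train|finetune|pretrain)[\w-]*\.py$/i;

// Hypothetical file names, chosen to show both hits and misses.
const samples = [
  "train.py",
  "train_llm.py",
  "Pretrain-v2.py",
  "constrain.py",
  "training_notes.md",
  "train.ipynb",
];

// The ^ anchor rejects "constrain.py"; the \.py$ suffix rejects notes and notebooks.
const detected = samples.filter((name) => trainScriptPattern.test(name));
console.log(detected);
```

The dependency check in `depsMention` is still a plain case-insensitive substring test on the dependency files, so a needle such as `torch` would presumably also match a line like `torchvision`.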
@@ -715,6 +738,45 @@ function renderIdeasMarkdown(profile, goalText, ideas) {
|
|
|
715
738
|
return lines.join("\n");
|
|
716
739
|
}
|
|
717
740
|
|
|
741
|
+
function readPaperNotes(cwd) {
|
|
742
|
+
const papersDir = path.join(cwd, ".researchloop", "scratchpad", "papers");
|
|
743
|
+
if (!fs.existsSync(papersDir)) {
|
|
744
|
+
return [];
|
|
745
|
+
}
|
|
746
|
+
const out = [];
|
|
747
|
+
for (const entry of fs.readdirSync(papersDir, { withFileTypes: true })) {
|
|
748
|
+
if (!entry.isFile() || !entry.name.endsWith(".md")) continue;
|
|
749
|
+
const file = path.join(papersDir, entry.name);
|
|
750
|
+
const raw = fs.readFileSync(file, "utf8");
|
|
751
|
+
const titleMatch = raw.match(/^#\s+(.+?)\s*$/m);
|
|
752
|
+
const idMatch = raw.match(/^arXiv:\s*(.+?)\s*$/m);
|
|
753
|
+
out.push({
|
|
754
|
+
title: titleMatch ? titleMatch[1].trim() : entry.name.replace(/\.md$/, ""),
|
|
755
|
+
arxivId: idMatch ? idMatch[1].trim() : entry.name.replace(/\.md$/, ""),
|
|
756
|
+
file: path.relative(cwd, file),
|
|
757
|
+
});
|
|
758
|
+
}
|
|
759
|
+
return out;
|
|
760
|
+
}
|
|
761
|
+
|
|
762
|
+
function buildPaperIdeas(papers, goalText, startRank) {
|
|
763
|
+
const ideas = [];
|
|
764
|
+
let rank = startRank;
|
|
765
|
+
for (const paper of papers.slice(0, 5)) {
|
|
766
|
+
const shortTitle = paper.title.length > 60 ? `${paper.title.slice(0, 57)}...` : paper.title;
|
|
767
|
+
ideas.push({
|
|
768
|
+
rank,
|
|
769
|
+
title: `Read paper: ${shortTitle}`,
|
|
770
|
+
hypothesis: `arXiv ${paper.arxivId} may suggest a mechanism relevant to ${goalText || "the target metric"}.`,
|
|
771
|
+
change: `Read ${paper.file}, extract one concrete mechanism, and decide if it can be ported in one experiment.`,
|
|
772
|
+
killCriterion: "If the mechanism cannot be cleanly ported or has no reproducible result section, log the lesson and skip.",
|
|
773
|
+
whyNow: "Paper was fetched recently and is cheap to read before launching another sweep.",
|
|
774
|
+
});
|
|
775
|
+
rank += 1;
|
|
776
|
+
}
|
|
777
|
+
return ideas;
|
|
778
|
+
}
|
|
779
|
+
|
|
718
780
|
function cmdIdea() {
|
|
719
781
|
const cwd = targetDir();
|
|
720
782
|
const researchDir = path.join(cwd, ".researchloop");
|
|
@@ -722,6 +784,10 @@ function cmdIdea() {
|
|
|
722
784
|
const goalText = option("--goal", "") || readGoalSummary(path.join(researchDir, "goal.md"));
|
|
723
785
|
const profile = loadRepoProfile(cwd);
|
|
724
786
|
const ideas = buildIdeaList(profile, goalText);
|
|
787
|
+
const papers = readPaperNotes(cwd);
|
|
788
|
+
if (papers.length) {
|
|
789
|
+
ideas.push(...buildPaperIdeas(papers, goalText, ideas.length + 1));
|
|
790
|
+
}
|
|
725
791
|
const markdown = renderIdeasMarkdown(profile, goalText, ideas);
|
|
726
792
|
process.stdout.write(`${markdown}\n`);
|
|
727
793
|
|
|
@@ -852,6 +918,422 @@ function cmdRecord() {
|
|
|
852
918
|
console.log(`Recorded run: ${row.id}`);
|
|
853
919
|
}
|
|
854
920
|
|
|
921
|
+
function defaultMetricRegex(metricName) {
|
|
922
|
+
const escaped = metricName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
|
|
923
|
+
return new RegExp(`["']?${escaped}["']?\\s*[:=]\\s*["']?(-?\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?)`, "gi");
|
|
924
|
+
}
|
|
925
|
+
|
|
926
|
+
function parseMetricFromOutput(output, metricName, customRegexSource) {
|
|
927
|
+
const regex = customRegexSource
|
|
928
|
+
? new RegExp(customRegexSource, "gi")
|
|
929
|
+
: defaultMetricRegex(metricName);
|
|
930
|
+
let last = null;
|
|
931
|
+
let match;
|
|
932
|
+
while ((match = regex.exec(output)) !== null) {
|
|
933
|
+
last = match[1] !== undefined ? match[1] : match[0];
|
|
934
|
+
}
|
|
935
|
+
if (last !== null && Number.isFinite(Number(last))) {
|
|
936
|
+
return Number(last);
|
|
937
|
+
}
|
|
938
|
+
const lines = output.split("\n").map((line) => line.trim()).filter(Boolean);
|
|
939
|
+
for (let idx = lines.length - 1; idx >= 0; idx -= 1) {
|
|
940
|
+
try {
|
|
941
|
+
const obj = JSON.parse(lines[idx]);
|
|
942
|
+
if (obj && typeof obj === "object" && metricName in obj && Number.isFinite(Number(obj[metricName]))) {
|
|
943
|
+
return Number(obj[metricName]);
|
|
944
|
+
}
|
|
945
|
+
} catch {
|
|
946
|
+
// not JSON, skip
|
|
947
|
+
}
|
|
948
|
+
}
|
|
949
|
+
return null;
|
|
950
|
+
}
|
|
951
|
+
|
|
952
|
+
function spawnCommand(commandText, cwd, timeoutMs, logFile) {
|
|
953
|
+
return new Promise((resolve) => {
|
|
954
|
+
const child = spawn(commandText, { cwd, shell: true });
|
|
955
|
+
const chunks = [];
|
|
956
|
+
let timedOut = false;
|
|
957
|
+
const logStream = fs.createWriteStream(logFile);
|
|
958
|
+
const timer = setTimeout(() => {
|
|
959
|
+
timedOut = true;
|
|
960
|
+
try {
|
|
961
|
+
child.kill("SIGKILL");
|
|
962
|
+
} catch {
|
|
963
|
+
// already gone
|
|
964
|
+
}
|
|
965
|
+
}, timeoutMs);
|
|
966
|
+
child.stdout.on("data", (data) => {
|
|
967
|
+
chunks.push(data);
|
|
968
|
+
process.stdout.write(data);
|
|
969
|
+
logStream.write(data);
|
|
970
|
+
});
|
|
971
|
+
child.stderr.on("data", (data) => {
|
|
972
|
+
chunks.push(data);
|
|
973
|
+
process.stderr.write(data);
|
|
974
|
+
logStream.write(data);
|
|
975
|
+
});
|
|
976
|
+
child.on("error", (err) => {
|
|
977
|
+
clearTimeout(timer);
|
|
978
|
+
const message = `\nresearchloop: spawn error: ${err.message}\n`;
|
|
979
|
+
logStream.end(message);
|
|
980
|
+
resolve({
|
|
981
|
+
output: Buffer.concat(chunks).toString("utf8") + message,
|
|
982
|
+
exitCode: null,
|
|
983
|
+
timedOut,
|
|
984
|
+
spawnError: err.message,
|
|
985
|
+
});
|
|
986
|
+
});
|
|
987
|
+
child.on("close", (code) => {
|
|
988
|
+
clearTimeout(timer);
|
|
989
|
+
logStream.end();
|
|
990
|
+
resolve({
|
|
991
|
+
output: Buffer.concat(chunks).toString("utf8"),
|
|
992
|
+
exitCode: code,
|
|
993
|
+
timedOut,
|
|
994
|
+
spawnError: null,
|
|
995
|
+
});
|
|
996
|
+
});
|
|
997
|
+
});
|
|
998
|
+
}
|
|
999
|
+
|
|
1000
|
+
function replaceOrAppendSection(text, heading, body) {
|
|
1001
|
+
const escaped = heading.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
|
|
1002
|
+
const pattern = new RegExp(`(^## ${escaped}\\s+)([\\s\\S]*?)(?=\\n## |\\n# |$)`, "mi");
|
|
1003
|
+
if (pattern.test(text)) {
|
|
1004
|
+
return text.replace(pattern, `$1${body}\n`);
|
|
1005
|
+
}
|
|
1006
|
+
const suffix = text.endsWith("\n") ? "" : "\n";
|
|
1007
|
+
return `${text}${suffix}\n## ${heading}\n${body}\n`;
|
|
1008
|
+
}
|
|
1009
|
+
|
|
1010
|
+
function updateGoalCurrentBest(cwd, metricName, value, runId) {
|
|
1011
|
+
const goalFile = path.join(cwd, ".researchloop", "goal.md");
|
|
1012
|
+
if (!fs.existsSync(goalFile)) {
|
|
1013
|
+
return;
|
|
1014
|
+
}
|
|
1015
|
+
const raw = fs.readFileSync(goalFile, "utf8");
|
|
1016
|
+
const body = `${metricName} = ${value} (run ${runId})`;
|
|
1017
|
+
fs.writeFileSync(goalFile, replaceOrAppendSection(raw, "Current Best", body));
|
|
1018
|
+
}
|
|
1019
|
+
|
|
1020
|
+
function updatePlanBaseline(cwd, metricName, value, runId) {
|
|
1021
|
+
const planFile = path.join(cwd, ".researchloop", "plan.md");
|
|
1022
|
+
if (!fs.existsSync(planFile)) {
|
|
1023
|
+
return;
|
|
1024
|
+
}
|
|
1025
|
+
const raw = fs.readFileSync(planFile, "utf8");
|
|
1026
|
+
const body = [
|
|
1027
|
+
`- Baseline: ${metricName} = ${value} (run ${runId})`,
|
|
1028
|
+
"- Best valid result: same as baseline",
|
|
1029
|
+
"- Active family: none",
|
|
1030
|
+
"- Running jobs: none",
|
|
1031
|
+
"- Next action: design first experiment",
|
|
1032
|
+
].join("\n");
|
|
1033
|
+
fs.writeFileSync(planFile, replaceOrAppendSection(raw, "Current State", body));
|
|
1034
|
+
}
|
|
1035
|
+
|
|
1036
|
+
function readGoalFields(cwd) {
|
|
1037
|
+
const goalFile = path.join(cwd, ".researchloop", "goal.md");
|
|
1038
|
+
const raw = readTextIfExists(goalFile);
|
|
1039
|
+
return {
|
|
1040
|
+
goal: parseMarkdownSection(raw, "Goal") || "",
|
|
1041
|
+
metric: parseMarkdownSection(raw, "Target Metric") || "",
|
|
1042
|
+
direction: parseMarkdownSection(raw, "Direction") || "",
|
|
1043
|
+
baseline: parseMarkdownSection(raw, "Baseline Command") || "",
|
|
1044
|
+
evaluation: parseMarkdownSection(raw, "Evaluation Command") || "",
|
|
1045
|
+
};
|
|
1046
|
+
}
|
|
1047
|
+
|
|
1048
|
+
async function cmdRun(isBaseline) {
|
|
1049
|
+
const cwd = targetDir();
|
|
1050
|
+
const goalFields = readGoalFields(cwd);
|
|
1051
|
+
const explicitCommand = option("--command", null);
|
|
1052
|
+
let cmdText = explicitCommand && typeof explicitCommand === "string" ? explicitCommand : "";
|
|
1053
|
+
if (!cmdText) {
|
|
1054
|
+
cmdText = isBaseline
|
|
1055
|
+
? goalFields.baseline
|
|
1056
|
+
: (goalFields.evaluation || goalFields.baseline);
|
|
1057
|
+
}
|
|
1058
|
+
if (!cmdText || cmdText.toLowerCase() === "unknown") {
|
|
1059
|
+
console.error("No command to run.");
|
|
1060
|
+
console.error("Set one via:");
|
|
1061
|
+
console.error(" researchloop goal \"<text>\" --baseline \"python train.py\" --evaluation \"python eval.py\"");
|
|
1062
|
+
console.error("Or pass --command directly.");
|
|
1063
|
+
process.exitCode = 1;
|
|
1064
|
+
return;
|
|
1065
|
+
}
|
|
1066
|
+
|
|
1067
|
+
const metricName = String(option("--metric", goalFields.metric || "val_loss")).trim() || "val_loss";
|
|
1068
|
+
const customRegex = option("--regex", null);
|
|
1069
|
+
const regexSource = customRegex && typeof customRegex === "string" ? customRegex : null;
|
|
1070
|
+
const timeoutSec = Number(option("--timeout", 600));
|
|
1071
|
+
const timeoutMs = Number.isFinite(timeoutSec) && timeoutSec > 0 ? timeoutSec * 1000 : 600000;
|
|
1072
|
+
|
|
1073
|
+
const prefix = isBaseline ? "baseline" : "run";
|
|
1074
|
+
const id = String(option("--id", `${prefix}-${new Date().toISOString().replace(/[:.]/g, "-")}`));
|
|
1075
|
+
const runDir = path.join(cwd, ".researchloop", "scratchpad", "runs", id);
|
|
1076
|
+
ensureDir(runDir);
|
|
1077
|
+
const logFile = path.join(runDir, "log.txt");
|
|
1078
|
+
|
|
1079
|
+
console.log(`researchloop ${prefix}`);
|
|
1080
|
+
console.log(`command: ${cmdText}`);
|
|
1081
|
+
console.log(`metric: ${metricName}`);
|
|
1082
|
+
console.log(`timeout: ${timeoutMs / 1000}s`);
|
|
1083
|
+
console.log(`log: ${path.relative(cwd, logFile)}`);
|
|
1084
|
+
console.log("---");
|
|
1085
|
+
|
|
1086
|
+
const startedAt = new Date().toISOString();
|
|
1087
|
+
const result = await spawnCommand(cmdText, cwd, timeoutMs, logFile);
|
|
1088
|
+
const finishedAt = new Date().toISOString();
|
|
1089
|
+
|
|
1090
|
+
let status;
|
|
1091
|
+
if (result.spawnError) {
|
|
1092
|
+
status = "spawn_error";
|
|
1093
|
+
} else if (result.timedOut) {
|
|
1094
|
+
status = "timeout";
|
|
1095
|
+
} else if (result.exitCode !== 0) {
|
|
1096
|
+
status = "failed";
|
|
1097
|
+
} else {
|
|
1098
|
+
status = "complete";
|
|
1099
|
+
}
|
|
1100
|
+
|
|
1101
|
+
const metrics = {};
|
|
1102
|
+
const metricValue = parseMetricFromOutput(result.output, metricName, regexSource);
|
|
1103
|
+
if (metricValue !== null) {
|
|
1104
|
+
metrics[metricName] = metricValue;
|
|
1105
|
+
}
|
|
1106
|
+
if (status === "complete" && metricValue === null) {
|
|
1107
|
+
status = "complete_no_metric";
|
|
1108
|
+
}
|
|
1109
|
+
|
|
1110
|
+
const row = {
|
|
1111
|
+
id,
|
|
1112
|
+
timestamp: finishedAt,
|
|
1113
|
+
started_at: startedAt,
|
|
1114
|
+
status,
|
|
1115
|
+
agent: `researchloop ${prefix}`,
|
|
1116
|
+
command: cmdText,
|
|
1117
|
+
exit_code: result.exitCode,
|
|
1118
|
+
log: path.relative(cwd, logFile),
|
|
1119
|
+
metrics,
|
|
1120
|
+
notes: "",
|
|
1121
|
+
};
|
|
1122
|
+
const ledger = path.join(cwd, ".researchloop", "scratchpad", "runs.jsonl");
|
|
1123
|
+
ensureDir(path.dirname(ledger));
|
|
1124
|
+
fs.appendFileSync(ledger, `${JSON.stringify(row)}\n`);
|
|
1125
|
+
|
|
1126
|
+
const thread = path.join(cwd, ".researchloop", "scratchpad", "THREAD.md");
|
|
1127
|
+
ensureDir(path.dirname(thread));
|
|
1128
|
+
const metricSuffix = metricValue !== null ? ` ${metricName}=${metricValue}` : "";
|
|
1129
|
+
fs.appendFileSync(thread, `- ${finishedAt} ${prefix} ${id} status=${status}${metricSuffix}\n`);
|
|
1130
|
+
|
|
1131
|
+
console.log("---");
|
|
1132
|
+
console.log(`status: ${status}`);
|
|
1133
|
+
console.log(`exit_code: ${result.exitCode}`);
|
|
1134
|
+
if (metricValue !== null) {
|
|
1135
|
+
console.log(`${metricName}: ${metricValue}`);
|
|
1136
|
+
} else {
|
|
1137
|
+
console.log("metric: not parsed");
|
|
1138
|
+
}
|
|
1139
|
+
console.log(`recorded: ${id}`);
|
|
1140
|
+
|
|
1141
|
+
if (isBaseline && metricValue !== null) {
|
|
1142
|
+
updateGoalCurrentBest(cwd, metricName, metricValue, id);
|
|
1143
|
+
updatePlanBaseline(cwd, metricName, metricValue, id);
|
|
1144
|
+
console.log("goal.md Current Best updated.");
|
|
1145
|
+
console.log("plan.md Current State updated.");
|
|
1146
|
+
}
|
|
1147
|
+
|
|
1148
|
+
if (status === "failed" || status === "timeout" || status === "spawn_error") {
|
|
1149
|
+
process.exitCode = 1;
|
|
1150
|
+
}
|
|
1151
|
+
}
|
|
1152
|
+
|
|
1153
|
+
const ARXIV_API_URL = "http://export.arxiv.org/api/query";
|
|
1154
|
+
|
|
1155
|
+
function arxivCacheDir() {
|
|
1156
|
+
return path.join(os.homedir(), ".cache", "researchloop", "arxiv");
|
|
1157
|
+
}
|
|
1158
|
+
|
|
1159
|
+
function arxivCacheKey(query, limit, since) {
|
|
1160
|
+
return createHash("sha1")
|
|
1161
|
+
.update(`${query}|${limit}|${since || ""}`)
|
|
1162
|
+
.digest("hex")
|
|
1163
|
+
.slice(0, 16);
|
|
1164
|
+
}
|
|
1165
|
+
|
|
1166
|
+
async function fetchArxivXml({ query, limit, since, cacheDir, offline }) {
|
|
1167
|
+
const fixture = process.env.RESEARCHLOOP_ARXIV_FIXTURE;
|
|
1168
|
+
if (fixture) {
|
|
1169
|
+
return fs.readFileSync(fixture, "utf8");
|
|
1170
|
+
}
|
|
1171
|
+
ensureDir(cacheDir);
|
|
1172
|
+
const key = arxivCacheKey(query, limit, since);
|
|
1173
|
+
const cacheFile = path.join(cacheDir, `${key}.xml`);
|
|
1174
|
+
if (fs.existsSync(cacheFile)) {
|
|
1175
|
+
return fs.readFileSync(cacheFile, "utf8");
|
|
1176
|
+
}
|
|
1177
|
+
if (offline) {
|
|
1178
|
+
throw new Error(`offline mode: no cache for query "${query}" (key=${key})`);
|
|
1179
|
+
}
|
|
1180
|
+
const params = new URLSearchParams({
|
|
1181
|
+
search_query: query,
|
|
1182
|
+
sortBy: "submittedDate",
|
|
1183
|
+
sortOrder: "descending",
|
|
1184
|
+
max_results: String(limit),
|
|
1185
|
+
});
|
|
1186
|
+
const url = `${ARXIV_API_URL}?${params.toString()}`;
|
|
1187
|
+
const res = await fetch(url, { headers: { "User-Agent": "researchloop/0.2.0" } });
|
|
1188
|
+
if (!res.ok) {
|
|
1189
|
+
throw new Error(`arxiv fetch failed: HTTP ${res.status}`);
|
|
1190
|
+
}
|
|
1191
|
+
const xml = await res.text();
|
|
1192
|
+
fs.writeFileSync(cacheFile, xml);
|
|
1193
|
+
return xml;
|
|
1194
|
+
}
|
|
1195
|
+
|
|
1196
|
+
function decodeXmlEntities(text) {
|
|
1197
|
+
return text
|
|
1198
|
+
.replace(/&lt;/g, "<")
|
|
1199
|
+
.replace(/&gt;/g, ">")
|
|
1200
|
+
.replace(/&quot;/g, '"')
|
|
1201
|
+
.replace(/&#39;/g, "'")
|
|
1202
|
+
.replace(/&apos;/g, "'")
|
|
1203
|
+
.replace(/&amp;/g, "&");
|
|
1204
|
+
}
|
|
1205
|
+
|
|
1206
|
+
function extractXmlTag(block, tag) {
|
|
1207
|
+
const re = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)<\\/${tag}>`, "i");
|
|
1208
|
+
const match = block.match(re);
|
|
1209
|
+
return match ? decodeXmlEntities(match[1]).replace(/\s+/g, " ").trim() : "";
|
|
1210
|
+
}
|
|
1211
|
+
|
|
1212
|
+
function parseArxivEntries(xml) {
|
|
1213
|
+
const entries = [];
|
|
1214
|
+
const entryRe = /<entry>([\s\S]*?)<\/entry>/g;
|
|
1215
|
+
let match;
|
|
1216
|
+
while ((match = entryRe.exec(xml)) !== null) {
|
|
1217
|
+
const block = match[1];
|
|
1218
|
+
const idUrl = extractXmlTag(block, "id");
|
|
1219
|
+
const arxivId = idUrl.replace(/^https?:\/\/arxiv\.org\/abs\//, "");
|
|
1220
|
+
const authorBlocks = block.match(/<author>[\s\S]*?<\/author>/g) || [];
|
|
1221
|
+
const authors = authorBlocks
|
|
1222
|
+
.map((blk) => extractXmlTag(blk, "name"))
|
|
1223
|
+
.filter(Boolean);
|
|
1224
|
+
entries.push({
|
|
1225
|
+
arxivId,
|
|
1226
|
+
idUrl,
|
|
1227
|
+
title: extractXmlTag(block, "title"),
|
|
1228
|
+
summary: extractXmlTag(block, "summary"),
|
|
1229
|
+
published: extractXmlTag(block, "published"),
|
|
1230
|
+
updated: extractXmlTag(block, "updated"),
|
|
1231
|
+
authors,
|
|
1232
|
+
});
|
|
1233
|
+
}
|
|
1234
|
+
return entries;
|
|
1235
|
+
}
|
|
1236
|
+
|
|
1237
|
+
function filterArxivBySince(entries, since) {
|
|
1238
|
+
if (!since) return entries;
|
|
1239
|
+
const sinceDate = new Date(since.length === 7 ? `${since}-01` : since);
|
|
1240
|
+
if (Number.isNaN(sinceDate.getTime())) return entries;
|
|
1241
|
+
return entries.filter((entry) => {
|
|
1242
|
+
const date = new Date(entry.published);
|
|
1243
|
+
return !Number.isNaN(date.getTime()) && date >= sinceDate;
|
|
1244
|
+
});
|
|
1245
|
+
}
|
|
1246
|
+
|
|
1247
|
+
function buildDefaultArxivQuery(goalFields, profile) {
|
|
1248
|
+
const parts = [];
|
|
1249
|
+
if (goalFields.goal) parts.push(goalFields.goal);
|
|
1250
|
+
if (goalFields.metric) parts.push(goalFields.metric);
|
|
1251
|
+
const adapters = (profile && profile.adapters) || [];
|
|
1252
|
+
if (adapters.includes("huggingface")) parts.push("transformer");
|
|
1253
|
+
if (adapters.includes("pytorch")) parts.push("deep learning");
|
|
1254
|
+
const joined = parts.filter(Boolean).join(" ").slice(0, 200).trim();
|
|
1255
|
+
return joined ? `all:${joined}` : "all:deep learning";
|
|
1256
|
+
}
|
|
1257
|
+
|
|
1258
|
+
function renderPaperMarkdown(entry) {
|
|
1259
|
+
const pubDate = entry.published ? entry.published.slice(0, 10) : "";
|
|
1260
|
+
return [
|
|
1261
|
+
`# ${entry.title || entry.arxivId}`,
|
|
1262
|
+
"",
|
|
1263
|
+
`arXiv: ${entry.arxivId}`,
|
|
1264
|
+
`Published: ${pubDate}`,
|
|
1265
|
+
`Authors: ${entry.authors.join(", ")}`,
|
|
1266
|
+
`Link: ${entry.idUrl}`,
|
|
1267
|
+
"",
|
|
1268
|
+
"## Abstract",
|
|
1269
|
+
"",
|
|
1270
|
+
entry.summary,
|
|
1271
|
+
"",
|
|
1272
|
+
"## How to port this",
|
|
1273
|
+
"",
|
|
1274
|
+
"TODO. Fill in when the paper is read.",
|
|
1275
|
+
"",
|
|
1276
|
+
].join("\n");
|
|
1277
|
+
}
|
|
1278
|
+
|
|
1279
|
+
+async function cmdScanPapers() {
+  const cwd = targetDir();
+  const goalFields = readGoalFields(cwd);
+  const profile = loadRepoProfile(cwd);
+  const explicitQuery = option("--query", null);
+  const query = explicitQuery && typeof explicitQuery === "string"
+    ? explicitQuery
+    : buildDefaultArxivQuery(goalFields, profile);
+  const limitRaw = Number(option("--limit", 10));
+  const limit = Number.isFinite(limitRaw) && limitRaw > 0 ? Math.min(50, Math.floor(limitRaw)) : 10;
+  const sinceOpt = option("--since", null);
+  const since = sinceOpt && typeof sinceOpt === "string" ? sinceOpt : null;
+  const offline = hasFlag("--offline");
+  const cacheDirOpt = option("--cache-dir", null);
+  const cacheDir = cacheDirOpt && typeof cacheDirOpt === "string" ? cacheDirOpt : arxivCacheDir();
+
+  console.log("researchloop scan-papers");
+  console.log(`query: ${query}`);
+  console.log(`limit: ${limit}`);
+  if (since) console.log(`since: ${since}`);
+  console.log(`cache: ${cacheDir}`);
+
+  let xml;
+  try {
+    xml = await fetchArxivXml({ query, limit, since, cacheDir, offline });
+  } catch (err) {
+    console.error(`scan-papers failed: ${err.message}`);
+    process.exitCode = 1;
+    return;
+  }
+
+  let entries = parseArxivEntries(xml);
+  entries = filterArxivBySince(entries, since);
+
+  const papersDir = path.join(cwd, ".researchloop", "scratchpad", "papers");
+  ensureDir(papersDir);
+  for (const entry of entries) {
+    const safeId = entry.arxivId.replace(/[/\\]/g, "_");
+    const file = path.join(papersDir, `${safeId}.md`);
+    fs.writeFileSync(file, renderPaperMarkdown(entry));
+  }
+
+  const thread = path.join(cwd, ".researchloop", "scratchpad", "THREAD.md");
+  ensureDir(path.dirname(thread));
+  fs.appendFileSync(
+    thread,
+    `- ${new Date().toISOString()} scan-papers query="${query.slice(0, 100)}" found=${entries.length}\n`
+  );
+
+  console.log("---");
+  console.log(`found: ${entries.length}`);
+  for (const entry of entries) {
+    const title = entry.title.length > 80 ? `${entry.title.slice(0, 77)}...` : entry.title;
+    console.log(`- ${entry.arxivId} ${title}`);
+  }
+  console.log(`papers written to: ${path.relative(cwd, papersDir)}`);
+}
+
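The option handling in `cmdScanPapers` can be checked in isolation. The sketch below lifts two expressions from the function into pure helpers; the names `normalizeLimit` and `safePaperId` are hypothetical and are not part of the package.

```javascript
// Hypothetical helper: mirrors the `--limit` handling in cmdScanPapers.
// Non-numeric, zero, or negative input falls back to 10;
// anything else is floored and capped at 50.
function normalizeLimit(raw) {
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? Math.min(50, Math.floor(n)) : 10;
}

// Hypothetical helper: mirrors the file-name sanitizing step.
// Old-style arXiv ids contain a slash (for example "hep-th/9901001"),
// which path.join would otherwise treat as a directory separator.
function safePaperId(arxivId) {
  return arxivId.replace(/[/\\]/g, "_");
}

console.log(normalizeLimit("200")); // 50
console.log(normalizeLimit("abc")); // 10
console.log(safePaperId("hep-th/9901001")); // hep-th_9901001
```

One edge case follows from the expression as written: a fractional `--limit` between 0 and 1 passes the `> 0` check and then floors to 0.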
 function cmdHelp() {
   console.log(`Research Loop
 
@@ -863,6 +1345,9 @@ Usage:
 researchloop prompt [--agent codex|claude-code|hermes|generic] [--goal TEXT] [--focus hyperparameters|architecture|attention]
 researchloop doctor [--dir PATH] [--python PATH]
 researchloop record [--dir PATH] [--id ID] [--status STATUS] [--metric key=value] [--note TEXT]
+researchloop run [--dir PATH] [--id ID] [--command CMD] [--metric NAME] [--regex PATTERN] [--timeout SECONDS]
+researchloop baseline [--dir PATH] [--id ID] [--command CMD] [--metric NAME] [--regex PATTERN] [--timeout SECONDS]
+researchloop scan-papers [--dir PATH] [--query TEXT] [--limit N] [--since YYYY-MM] [--cache-dir PATH] [--offline]
 researchloop compare [--dir PATH] [--metric NAME] [--direction lower|higher]
 researchloop dashboard [--dir PATH] [--host HOST] [--port PORT]
 researchloop report [--dir PATH]
@@ -871,30 +1356,43 @@ Research Loop installs docs, prompts, scratchpads, and experiment ledgers for au
 `);
 }
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+async function main() {
+  if (hasFlag("--help") || command === "help") {
+    cmdHelp();
+  } else if (command === "init") {
+    cmdInit();
+  } else if (command === "goal") {
+    cmdGoal();
+  } else if (command === "inspect") {
+    cmdInspect();
+  } else if (command === "idea") {
+    cmdIdea();
+  } else if (command === "prompt") {
+    cmdPrompt();
+  } else if (command === "doctor") {
+    cmdDoctor();
+  } else if (command === "record") {
+    cmdRecord();
+  } else if (command === "run") {
+    await cmdRun(false);
+  } else if (command === "baseline") {
+    await cmdRun(true);
+  } else if (command === "scan-papers") {
+    await cmdScanPapers();
+  } else if (command === "compare") {
+    cmdCompare();
+  } else if (command === "dashboard") {
+    cmdDashboard();
+  } else if (command === "report") {
+    cmdReport();
+  } else {
+    console.error(`Unknown command: ${command}`);
+    cmdHelp();
+    process.exitCode = 1;
+  }
 }
+
+main().catch((err) => {
+  console.error(err);
+  process.exitCode = 1;
+});
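Both `run` and `baseline` dispatch to `cmdRun`, whose body is outside this hunk. The changelog describes its metric step as a default regex on `metric=N` or `"metric": N`, with a last-line JSON fallback. A minimal sketch of that idea follows, assuming capture group 1 holds the number and the last match in the log wins. `NUMBER`, `defaultMetricPattern`, and `parseMetric` are hypothetical names, and the package's real parser may differ.

```javascript
// Hypothetical sketch; not the package's actual implementation.
const NUMBER = "(-?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?)";

// Builds a default pattern that matches `name=N` and `"name": N`.
function defaultMetricPattern(name) {
  const key = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  // The lookbehind stops `val_loss` from matching inside `best_val_loss`.
  return new RegExp(`(?<![\\w-])"?${key}"?\\s*[=:]\\s*${NUMBER}`, "g");
}

function parseMetric(output, name, pattern = defaultMetricPattern(name)) {
  // matchAll requires the global flag, so add it when a custom pattern lacks it.
  const flags = pattern.flags.includes("g") ? pattern.flags : `${pattern.flags}g`;
  let value = null;
  for (const match of output.matchAll(new RegExp(pattern.source, flags))) {
    const n = Number(match[1]); // assumption: group 1 is the numeric value
    if (Number.isFinite(n)) value = n; // later matches overwrite earlier ones
  }
  if (value !== null) return value;

  // Fallback: parse the last non-empty line as JSON and read the key from it.
  const lines = output.split(/\r?\n/).filter((line) => line.trim() !== "");
  try {
    const parsed = JSON.parse(lines[lines.length - 1] ?? "");
    const candidate = parsed && typeof parsed === "object" ? parsed[name] : undefined;
    return typeof candidate === "number" ? candidate : null;
  } catch {
    return null; // the last line was not valid JSON
  }
}

console.log(parseMetric("epoch 1 val_loss=0.9\nepoch 2 val_loss=0.42", "val_loss")); // 0.42
console.log(parseMetric('done\n{"val_loss": 0.5}', "val_loss", /loss is (\d+)/)); // 0.5
```

The second call shows the fallback: the custom pattern finds nothing, so the value comes from the JSON on the last line.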
package/docs/getting-started.md
CHANGED
@@ -9,7 +9,17 @@ The shortest way to think about it:
 - it creates a durable `.researchloop/` workspace
 - your AI agent uses that workspace to plan, run, compare, and record experiments
 
-## 1.
+## 1. Give This Prompt To Your Agent
+
+Copy this into Codex, Claude Code, Hermes, Cursor, or another coding agent:
+
+```text
+Set up an autonomous AI research environment in this repo using ResearchLoop.
+Read the ResearchLoop docs and the .researchloop/ files, inspect the repo, establish the baseline, propose a small set of experiments, run the smallest valid change first, record every result, compare runs, and keep the research loop moving.
+Use the package commands to manage goals, ideas, prompts, runs, comparisons, and reports.
+```
+
+## 2. Install
 
 From your own machine:
 
@@ -31,7 +41,7 @@ If you want to hand this to an AI agent, the simplest instruction is:
 Install ResearchLoop, initialize the repo, inspect the project, then use the generated prompt to start the research loop.
 ```
 
-##
+## 3. Initialize a repo
 
 Run this inside a blank folder or inside an existing ML repo:
 
@@ -64,7 +74,7 @@ researchloop init --agent hermes
 researchloop init --agent cursor
 ```
 
-##
+## 4. Set the research goal
 
 Tell ResearchLoop what the agent should optimize:
 
@@ -80,7 +90,7 @@ researchloop goal "lower validation loss" --metric val_loss --direction lower
 
 That saves the objective into `.researchloop/goal.md`, which the agent and the prompt command can read later.
 
-##
+## 5. Generate experiment ideas
 
 ```bash
 researchloop idea --write
@@ -88,7 +98,7 @@ researchloop idea --write
 
 This prints a ranked list of small experiments for the current repo shape. For `llm-research-kit`, that usually means baseline checks, learning-rate sweeps, and tiny architecture changes. For a generic repo, it starts with finding the baseline and metric plumbing.
 
-##
+## 6. Inspect the repo
 
 ```bash
 researchloop inspect
@@ -102,7 +112,7 @@ This writes a repo profile into `.researchloop/repo-profile.json` and helps the
 - log folders
 - likely adapters
 
-##
+## 7. Generate the agent prompt
 
 ```bash
 researchloop prompt --agent codex
@@ -127,6 +137,24 @@ That prompt tells the agent to:
 - compare results
 - keep the loop moving
 
+## 7b. Use the skill pack
+
+The npm package also ships a downloadable `skills/` folder.
+
+It contains the same research loop as agent-local skills:
+
+- `skills/researchloop-autoresearch/codex/SKILL.md`
+- `skills/researchloop-autoresearch/claude-code/CLAUDE.md`
+- `skills/researchloop-autoresearch/references/*.md`
+
+Use those files when you want the agent itself to carry the research rules, not just the current prompt.
+
+Typical flow:
+
+1. Copy the Codex or Claude Code file into the skill location your agent uses.
+2. Keep the `references/` files nearby as optional playbooks.
+3. Pair the skill with `.researchloop/goal.md` and the `researchloop prompt` output.
+
 You can still pass `--goal` for a one-off override, but the normal flow is to save the goal once and let the prompt command read it back.
 
 If you want the prompt to narrow in on a family of experiments, use one of the built-in focus playbooks:
@@ -135,7 +163,7 @@ If you want the prompt to narrow in on a family of experiments, use one of the b
 - `architecture`
 - `attention`
 
-##
+## 8. Record and compare runs
 
 After a run finishes:
 
@@ -161,7 +189,7 @@ Then summarize the current state:
 researchloop report
 ```
 
-##
+## 9. Open the dashboard
 
 Serve a local dashboard for the current repo:
 
@@ -179,7 +207,7 @@ Then open the localhost URL it prints. The dashboard reads the repo's `.research
 
 It does not need accounts or auth because it stays on your machine.
 
-##
+## 10. Test the setup before you trust it
 
 Run the local checks from this repo:
 
@@ -201,7 +229,7 @@ These checks verify that:
 - the website copy matches the product
 - the end-to-end flow works
 
-##
+## 11. Use it in a real ML repo
 
 Once the basics work, move into a real project:
 
@@ -216,7 +244,7 @@ Then give the prompt to your AI agent and let it run the loop.
 
 ResearchLoop is not trying to magically solve the model for you. It gives the agent the operating system for research: goals, baseline, logs, comparison, and continuation.
 
-##
+## 12. Publish to npm
 
 The package is published to the public npm registry at [npmjs.com](https://www.npmjs.com/).
 
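`researchloop compare` takes `--metric NAME` and `--direction lower|higher`. A rough sketch of that kind of selection over a JSON Lines ledger is shown here. The row shape (`id`, `metrics`) and the helper names `parseJsonl` and `bestRun` are assumptions for illustration; the actual `runs.jsonl` schema is not shown in this diff.

```javascript
// Hypothetical ledger rows; the real runs.jsonl schema may differ.
const ledger = [
  '{"id":"baseline","metrics":{"val_loss":0.52}}',
  '{"id":"lr-3e-4","metrics":{"val_loss":0.47}}',
  '{"id":"wd-0.1","metrics":{"val_loss":0.55}}',
].join("\n");

// Parses JSON Lines text, skipping blank lines.
function parseJsonl(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Returns the run with the best value for `metric`, or null if no run reports it.
function bestRun(runs, metric, direction) {
  const scored = runs.filter((run) => typeof run.metrics?.[metric] === "number");
  if (scored.length === 0) return null;
  const better = direction === "higher" ? (a, b) => a > b : (a, b) => a < b;
  return scored.reduce((best, run) =>
    better(run.metrics[metric], best.metrics[metric]) ? run : best
  );
}

const best = bestRun(parseJsonl(ledger), "val_loss", "lower");
console.log(best.id); // lr-3e-4
```

Passing the direction in explicitly keeps one ledger usable for both loss-style metrics (lower is better) and accuracy-style metrics (higher is better).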
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "researchloop",
-  "version": "0.
+  "version": "0.2.0",
   "description": "Install an autonomous AI research harness for Codex, Claude Code, Hermes, and other coding agents.",
   "type": "module",
   "bin": {
@@ -9,6 +9,7 @@
   "files": [
     "bin",
     "templates",
+    "skills",
     "README.md",
     "docs/getting-started.md",
     "CHANGELOG.md"
@@ -21,6 +22,8 @@
     "test:dashboard": "bash ./scripts/test-dashboard.sh",
     "test:setup": "bash ./scripts/test-setup.sh",
     "test:compare": "bash ./scripts/test-compare.sh",
+    "test:run": "bash ./scripts/test-run.sh",
+    "test:scan-papers": "bash ./scripts/test-scan-papers.sh",
     "test:prompts": "bash ./scripts/test-prompts.sh",
     "test:focus-prompts": "bash ./scripts/test-focus-prompts.sh",
     "test:site": "bash ./scripts/test-site.sh"
package/skills/README.md
ADDED
@@ -0,0 +1,31 @@
+# ResearchLoop Skills
+
+This folder ships downloadable agent skills for autonomous AI research.
+
+The package keeps the core product in the CLI, dashboard, prompts, and run ledger.
+These skills are the agent-side memory layer that makes the research loop stick.
+
+## What is in here
+
+- `researchloop-autoresearch/` - the main research skill pack
+- `researchloop-autoresearch/references/` - focused playbooks for common experiment families
+
+## How users use it
+
+Users copy the right file into the skill folder their agent expects.
+
+Typical mapping:
+
+- Codex: copy `researchloop-autoresearch/codex/SKILL.md` into the local Codex skills directory
+- Claude Code: copy `researchloop-autoresearch/claude-code/CLAUDE.md` into the Claude Code instructions or skill location they use
+
+## What the skill pack does
+
+- keeps the goal visible
+- forces baseline-first behavior
+- asks for one small experiment at a time
+- records runs and comparisons
+- prunes weak ideas instead of spiraling
+
+The CLI prints prompts and creates `.researchloop/` state.
+The skills make the agent remember how to behave while doing the work.
package/skills/researchloop-autoresearch/claude-code/CLAUDE.md
ADDED
@@ -0,0 +1,35 @@
+# ResearchLoop Autoresearch
+
+Use this repo as an autonomous AI research loop.
+
+Before changing code, read:
+
+- `.researchloop/goal.md`
+- `.researchloop/plan.md`
+- `.researchloop/AGENTS.md`
+- `.researchloop/scratchpad/THREAD.md`
+- `.researchloop/repo-profile.json`
+
+Then:
+
+1. confirm the baseline
+2. pick one small experiment
+3. change one variable at a time
+4. run the smallest valid check
+5. record the run
+6. compare against the baseline
+7. prune weak branches
+
+Use ResearchLoop to keep the loop durable:
+
+- `researchloop goal`
+- `researchloop inspect`
+- `researchloop idea`
+- `researchloop prompt`
+- `researchloop record`
+- `researchloop compare`
+- `researchloop report`
+
+Never claim improvement without a run.
+Never skip the baseline.
+Never let the goal drift.
package/skills/researchloop-autoresearch/codex/SKILL.md
ADDED
@@ -0,0 +1,50 @@
+---
+name: researchloop-autoresearch
+description: Use when doing autonomous AI research in a machine learning repo with ResearchLoop, especially when choosing experiments, preserving baselines, or logging run results.
+---
+
+# ResearchLoop Autoresearch
+
+You are the research agent inside a repo that uses ResearchLoop.
+
+Before changing code, read:
+
+- `.researchloop/goal.md`
+- `.researchloop/plan.md`
+- `.researchloop/AGENTS.md`
+- `.researchloop/scratchpad/THREAD.md`
+- `.researchloop/repo-profile.json`
+
+Then work in this order:
+
+1. Confirm the baseline.
+2. Propose the smallest informative next experiment.
+3. Change one thing at a time.
+4. Run the smallest valid check.
+5. Record the result.
+6. Compare against the baseline.
+7. Prune weak branches quickly.
+8. Continue until the goal is met or the family is exhausted.
+
+Use the ResearchLoop commands as the control plane:
+
+- `researchloop goal`
+- `researchloop inspect`
+- `researchloop prompt`
+- `researchloop idea`
+- `researchloop record`
+- `researchloop compare`
+- `researchloop report`
+
+Do not claim improvement without a recorded run.
+Do not stack architecture changes before the baseline is stable.
+Do not let the loop drift away from the saved goal.
+
+## When to use playbooks
+
+If the task is clearly one of these families, load the matching reference:
+
+- hyperparameters -> `references/hyperparameters.md`
+- architecture -> `references/architecture.md`
+- attention -> `references/attention.md`
+
package/skills/researchloop-autoresearch/references/architecture.md
ADDED
@@ -0,0 +1,21 @@
+# Architecture Playbook
+
+Use this when tuning model shape or layer structure.
+
+Try one change at a time:
+
+- width
+- depth
+- feedforward size
+- number of heads
+- embedding size
+- normalization placement
+
+Rules:
+
+- do not stack multiple architecture changes in the first pass
+- keep the optimizer and schedule fixed
+- compare against a reproduced baseline
+- re-run the best candidate with a second seed
+
+If the win does not reproduce, drop it.
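The second-seed rule in the architecture playbook can be written as a small predicate. This is one possible reading, assuming a candidate reproduces only when at least two seeds all beat the baseline; `reproduces` is a hypothetical helper, not a package function.

```javascript
// Hypothetical check: keep a candidate only if every seed beats the baseline.
// `direction` follows the goal's convention: "lower" for loss, "higher" for accuracy.
function reproduces(baselineValue, seedValues, direction = "lower") {
  if (seedValues.length < 2) return false; // a single seed is not a reproduction
  return seedValues.every((value) =>
    direction === "higher" ? value > baselineValue : value < baselineValue
  );
}

console.log(reproduces(0.52, [0.47, 0.49])); // true
console.log(reproduces(0.52, [0.47, 0.53])); // false
```

A stricter variant could require the improvement to exceed the seed-to-seed spread of the baseline itself, which needs a repeated baseline run as well.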
package/skills/researchloop-autoresearch/references/attention.md
ADDED
@@ -0,0 +1,21 @@
+# Attention Playbook
+
+Use this when the bottleneck appears to be the attention block itself.
+
+Try one change at a time:
+
+- number of heads
+- head dimension
+- context length
+- causal masking
+- rotary or positional setup
+- attention implementation
+
+Rules:
+
+- keep the rest of the model fixed
+- keep the metric fixed
+- capture throughput and loss together
+- record the exact config diff
+
+If the change only helps once, do not promote it.
package/skills/researchloop-autoresearch/references/hyperparameters.md
ADDED
@@ -0,0 +1,22 @@
+# Hyperparameters Playbook
+
+Use this when the likely next win is a cheap tuning change.
+
+Try one family at a time:
+
+- learning rate
+- warmup
+- optimizer
+- weight decay
+- batch size
+- gradient clipping
+
+Rules:
+
+- keep architecture fixed
+- keep the dataset fixed
+- keep the metric fixed
+- sweep only a few values
+- record every run
+
+Kill the family quickly if the curve is flat.
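One way to make "the curve is flat" concrete is a relative-spread check over the sweep results. `isFlatSweep` and its 1% default tolerance are assumptions chosen for illustration, not values taken from the playbook.

```javascript
// Hypothetical flatness test for a small sweep (for example three learning rates).
// `tolerance` is an arbitrary relative threshold on (max - min) / largest magnitude.
function isFlatSweep(metricValues, tolerance = 0.01) {
  if (metricValues.length < 2) return false; // nothing to compare yet
  const min = Math.min(...metricValues);
  const max = Math.max(...metricValues);
  const scale = Math.max(Math.abs(min), Math.abs(max));
  if (scale === 0) return true; // every value is exactly zero
  return (max - min) / scale < tolerance;
}

console.log(isFlatSweep([0.5, 0.501, 0.499])); // true
console.log(isFlatSweep([0.52, 0.47, 0.61])); // false
```

In practice the tolerance would be set relative to run-to-run noise, which a repeated baseline can estimate.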