researchloop 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -1,5 +1,27 @@
  # Changelog

+ ## 0.2.0
+
+ ResearchLoop becomes a runtime, not just a folder.
+
+ New:
+
+ - `researchloop run` executes a command, streams output to a per-run log, parses a metric (default regex on `metric=N` or `"metric": N`, plus last-line JSON fallback), and auto-appends a row to `runs.jsonl`. No more manual `record`.
+ - `researchloop baseline` is `run` for the baseline command and also updates `goal.md` Current Best and `plan.md` Current State.
+ - `researchloop scan-papers` queries the arXiv API for papers relevant to the goal, writes one markdown note per result to `scratchpad/papers/`, caches responses to `~/.cache/researchloop/arxiv/`, supports `--offline`, `--since YYYY-MM`, `--limit`, `--query`, `--cache-dir`.
+ - `researchloop idea` now reads `scratchpad/papers/` and adds paper-derived ideas alongside the adapter playbook.
+
+ Improvements:
+
+ - Tighter adapter detection: pytorch needs a real `train*.py` script or `torch` in deps; huggingface needs `transformers` in deps. No more false positives from filename substrings.
+ - `candidate_config_files` no longer matches every `.json`/`.yaml`/`.toml` in the repo.
+ - README install command no longer hardcodes a developer machine path.
+ - New tests: `test:run`, `test:scan-papers`. arXiv test uses a recorded XML fixture and never hits the network.
+
+ Cleanup:
+
+ - Removed misleading `projects/researchloop` and `projects/researchloop-cli` symlinks.
+
  ## 0.1.0

  First public ResearchLoop release.
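The metric parsing that the changelog describes for `researchloop run` can be sketched as a standalone function. This is a simplified illustration of the described behavior (regex pass over `metric=N` / `"metric": N`, then a last-line JSON fallback), not the package's exact implementation; the exponent handling of the real default regex is omitted here:

```javascript
// Simplified sketch of the two-stage metric parse described above:
// 1) regex pass over `metric=N` / `"metric": N`, keeping the last match;
// 2) fallback: scan lines from the end for a JSON object holding the metric.
function parseMetric(output, name) {
  const esc = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(`["']?${esc}["']?\\s*[:=]\\s*(-?\\d+(?:\\.\\d+)?)`, "g");
  let last = null;
  for (const m of output.matchAll(re)) last = Number(m[1]);
  if (last !== null && Number.isFinite(last)) return last;
  const lines = output.split("\n").map((l) => l.trim()).filter(Boolean);
  for (let i = lines.length - 1; i >= 0; i -= 1) {
    try {
      const obj = JSON.parse(lines[i]);
      if (obj && typeof obj === "object" && name in obj) return Number(obj[name]);
    } catch {
      // not JSON, keep scanning upward
    }
  }
  return null;
}

console.log(parseMetric("step 1\nval_loss=2.31", "val_loss")); // 2.31
console.log(parseMetric('{"val_loss": 1.9}', "val_loss"));     // 1.9
```

Keeping the last match matters for training logs: the final reported `val_loss` wins over intermediate ones.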
package/README.md CHANGED
@@ -8,7 +8,17 @@ It installs a durable research harness into a machine learning repo so agents li
 
  This repo is both the product and the startup home base.

- ## Install
+ ## Give This Prompt To Your Agent
+
+ Copy this into Codex, Claude Code, Hermes, Cursor, or another coding agent:
+
+ ```text
+ Set up an autonomous AI research environment in this repo using ResearchLoop.
+ Read the ResearchLoop docs and the .researchloop/ files, inspect the repo, establish the baseline, propose a small set of experiments, run the smallest valid change first, record every result, compare runs, and keep the research loop moving.
+ Use the package commands to manage goals, ideas, prompts, runs, comparisons, and reports.
+ ```
+
+ Then install ResearchLoop:

  ```bash
  npm install -g researchloop
@@ -17,7 +27,8 @@ npm install -g researchloop
  Local development from this checkout:

  ```bash
- cd /Users/vukrosic/my-life/researchloop
+ git clone https://github.com/vukrosic/researchloop.git
+ cd researchloop
  npm link
  researchloop --help
  ```
@@ -26,16 +37,18 @@ researchloop --help
 
  ```bash
  researchloop init --agent codex
- researchloop goal "lower validation loss"
+ researchloop goal "lower validation loss" --metric val_loss --direction lower \
+   --baseline "python train.py" --evaluation "python eval.py"
  researchloop inspect
+ researchloop scan-papers --limit 10
  researchloop idea --write
  researchloop prompt --agent codex
- researchloop prompt --agent codex --focus hyperparameters
- researchloop dashboard
- researchloop doctor
- researchloop record --id first-run --status complete --metric val_loss=2.31 --note "First logged experiment"
+ researchloop baseline
+ researchloop run --id lr-3e-4 --command "python train.py --lr 3e-4"
  researchloop compare --metric val_loss --direction lower
  researchloop report
+ researchloop dashboard
+ researchloop doctor
  ```

  Then paste the generated prompt into the coding agent.
@@ -65,6 +78,7 @@ The package does not claim to magically train every model. It gives an agent the
  ```text
  bin/              CLI entrypoint
  templates/        Harness, adapters, and agent prompts
+ skills/           Downloadable agent research skill packs
  docs/site/        Landing page
  docs/research/    Local testing notes and research logs
  docs/competitors/ Competitor and adjacent-project research
@@ -116,17 +130,22 @@ The startup plan is in `docs/startup/`.
  - `researchloop init` creates `.researchloop/` and agent instruction files.
  - `researchloop goal` saves a durable research objective in `.researchloop/goal.md`.
  - `researchloop inspect` writes `.researchloop/repo-profile.json`.
- - `researchloop idea` generates ranked experiment ideas and can write an idea note.
+ - `researchloop scan-papers` fetches relevant arXiv abstracts into `.researchloop/scratchpad/papers/`.
+ - `researchloop idea` generates ranked experiment ideas, including paper-derived ones, and can write an idea note.
  - `researchloop prompt` prints an agent-ready autonomous research prompt, with optional focus playbooks.
- - `researchloop dashboard` starts a local localhost dashboard for experiment tracking.
- - `researchloop doctor` checks basic local tooling.
- - `researchloop record` appends a structured run result to `runs.jsonl`.
+ - `researchloop baseline` runs the baseline command, parses the metric, and locks it into `goal.md` and `plan.md`.
+ - `researchloop run` executes a training or eval command, streams the log, parses the metric, and records the run.
+ - `researchloop record` appends a structured run result to `runs.jsonl` (use it for manual rows).
  - `researchloop compare` ranks runs by a chosen metric.
  - `researchloop report` summarizes the run ledger.
+ - `researchloop dashboard` starts a local dashboard for experiment tracking.
+ - `researchloop doctor` checks basic local tooling.
  - `npm run test:setup` runs the blank-repo and minimal-fixture setup checks.
  - `npm run test:compare` checks comparison output for a few recorded runs.
+ - `npm run test:run` checks `run` and `baseline` against deterministic shell commands.
+ - `npm run test:scan-papers` checks the arXiv scan path against a recorded XML fixture (no network).
  - `npm run test:goal` checks goal saving and prompt handoff.
- - `npm run test:idea` checks idea generation for a blank repo and an llm-research-kit-shaped repo.
+ - `npm run test:idea` checks idea generation for a blank repo, an llm-research-kit-shaped repo, and a paper-augmented repo.
  - `npm run test:dashboard` checks the local dashboard server and API.
  - `npm run test:prompts` checks prompt templates for placeholder drift.
  - `npm run test:focus-prompts` checks the hyperparameter, architecture, and attention playbooks.
@@ -136,6 +155,8 @@ The startup plan is in `docs/startup/`.
 
  ResearchLoop should stay open source at the core. The npm package, prompts, adapters, and run ledger format should be inspectable and forkable.

+ The package also ships optional skill packs under `skills/` so teams can copy the same research rules into Codex, Claude Code, or other agent-specific folders.
+
  Possible paid layers later:

  - hosted dashboard
@@ -1,9 +1,11 @@
  #!/usr/bin/env node
  import fs from "node:fs";
  import http from "node:http";
+ import os from "node:os";
  import path from "node:path";
  import process from "node:process";
- import { execSync } from "node:child_process";
+ import { execSync, spawn } from "node:child_process";
+ import { createHash } from "node:crypto";
  import { fileURLToPath } from "node:url";

  const __filename = fileURLToPath(import.meta.url);
@@ -132,16 +134,37 @@ function walkFiles(cwd, maxDepth = 3) {
    return out;
  }

+ function readSafe(file) {
+   try {
+     return fs.readFileSync(file, "utf8");
+   } catch {
+     return "";
+   }
+ }
+
+ function depsMention(cwd, needle) {
+   const candidates = ["requirements.txt", "pyproject.toml", "setup.py", "uv.lock", "Pipfile"];
+   const needleLower = needle.toLowerCase();
+   for (const name of candidates) {
+     const text = readSafe(path.join(cwd, name)).toLowerCase();
+     if (text.includes(needleLower)) {
+       return true;
+     }
+   }
+   return false;
+ }
+
  function detectRepo(cwd) {
    const files = walkFiles(cwd, 3);
-   const lower = files.map((file) => file.toLowerCase());
-   const has = (pattern) => lower.some((file) => file.includes(pattern));
+   const basenames = files.map((file) => path.basename(file));
+   const trainScriptPattern = /^(train|finetune|pretrain)[\w-]*\.py$/i;
+   const hasTrainScript = basenames.some((name) => trainScriptPattern.test(name));

    const adapters = ["generic"];
-   if (has("train.py") || has("train_") || has("pytorch") || has("torch")) {
+   if (hasTrainScript || depsMention(cwd, "torch")) {
      adapters.push("pytorch");
    }
-   if (has("trainer") || has("transformers") || has("huggingface")) {
+   if (depsMention(cwd, "transformers") || depsMention(cwd, "huggingface_hub")) {
      adapters.push("huggingface");
    }
    if (files.includes("train_llm.py") && files.includes("configs/llm_config.py")) {
@@ -154,9 +177,9 @@ function detectRepo(cwd) {
      git_branch: run("git branch --show-current", cwd) || null,
      git_status_short: run("git status --short", cwd) || null,
      package_files: existsAny(cwd, ["package.json", "pyproject.toml", "requirements.txt", "uv.lock"]),
-     candidate_train_files: files.filter((file) => /(^|\/)(train|finetune|pretrain).*\.py$/i.test(file)).slice(0, 30),
-     candidate_eval_files: files.filter((file) => /(^|\/)(eval|evaluate|benchmark).*\.py$/i.test(file)).slice(0, 30),
-     candidate_config_files: files.filter((file) => /(^|\/|_)(config|cfg)[^/]*\.(py|js|ts|json|yaml|yml|toml)$|\.ya?ml$|\.toml$|\.json$/i.test(file)).slice(0, 40),
+     candidate_train_files: files.filter((file) => /(^|\/)(train|finetune|pretrain)[\w-]*\.py$/i.test(file)).slice(0, 30),
+     candidate_eval_files: files.filter((file) => /(^|\/)(eval|evaluate|benchmark)[\w-]*\.py$/i.test(file)).slice(0, 30),
+     candidate_config_files: files.filter((file) => /(^|\/|_)(config|cfg)[\w-]*\.(py|js|ts|json|yaml|yml|toml)$/i.test(file)).slice(0, 40),
      candidate_log_dirs: existsAny(cwd, ["logs", "runs", "wandb", "mlruns", "checkpoints", "plots"]),
      adapters: [...new Set(adapters)],
    };
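The tightened pytorch detection in the hunk above can be exercised in isolation. The sketch below is a standalone reduction of that logic with hypothetical file names, showing why filename substrings no longer trigger the adapter:

```javascript
// Standalone reduction of the stricter pytorch check: a real train*.py
// script or `torch` mentioned in a dependency file, never a substring match.
const trainScriptPattern = /^(train|finetune|pretrain)[\w-]*\.py$/i;

function wantsPytorch(basenames, depsText) {
  const hasTrainScript = basenames.some((n) => trainScriptPattern.test(n));
  return hasTrainScript || depsText.toLowerCase().includes("torch");
}

// A file like "pytorch_utils.md" used to be a false positive; no longer.
console.log(wantsPytorch(["pytorch_utils.md"], ""));     // false
console.log(wantsPytorch(["train_llm.py"], ""));         // true
console.log(wantsPytorch(["app.py"], "torch==2.3.0\n")); // true
```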
@@ -715,6 +738,45 @@ function renderIdeasMarkdown(profile, goalText, ideas) {
    return lines.join("\n");
  }

+ function readPaperNotes(cwd) {
+   const papersDir = path.join(cwd, ".researchloop", "scratchpad", "papers");
+   if (!fs.existsSync(papersDir)) {
+     return [];
+   }
+   const out = [];
+   for (const entry of fs.readdirSync(papersDir, { withFileTypes: true })) {
+     if (!entry.isFile() || !entry.name.endsWith(".md")) continue;
+     const file = path.join(papersDir, entry.name);
+     const raw = fs.readFileSync(file, "utf8");
+     const titleMatch = raw.match(/^#\s+(.+?)\s*$/m);
+     const idMatch = raw.match(/^arXiv:\s*(.+?)\s*$/m);
+     out.push({
+       title: titleMatch ? titleMatch[1].trim() : entry.name.replace(/\.md$/, ""),
+       arxivId: idMatch ? idMatch[1].trim() : entry.name.replace(/\.md$/, ""),
+       file: path.relative(cwd, file),
+     });
+   }
+   return out;
+ }
+
+ function buildPaperIdeas(papers, goalText, startRank) {
+   const ideas = [];
+   let rank = startRank;
+   for (const paper of papers.slice(0, 5)) {
+     const shortTitle = paper.title.length > 60 ? `${paper.title.slice(0, 57)}...` : paper.title;
+     ideas.push({
+       rank,
+       title: `Read paper: ${shortTitle}`,
+       hypothesis: `arXiv ${paper.arxivId} may suggest a mechanism relevant to ${goalText || "the target metric"}.`,
+       change: `Read ${paper.file}, extract one concrete mechanism, and decide if it can be ported in one experiment.`,
+       killCriterion: "If the mechanism cannot be cleanly ported or has no reproducible result section, log the lesson and skip.",
+       whyNow: "Paper was fetched recently and is cheap to read before launching another sweep.",
+     });
+     rank += 1;
+   }
+   return ideas;
+ }
+
  function cmdIdea() {
    const cwd = targetDir();
    const researchDir = path.join(cwd, ".researchloop");
@@ -722,6 +784,10 @@ function cmdIdea() {
    const goalText = option("--goal", "") || readGoalSummary(path.join(researchDir, "goal.md"));
    const profile = loadRepoProfile(cwd);
    const ideas = buildIdeaList(profile, goalText);
+   const papers = readPaperNotes(cwd);
+   if (papers.length) {
+     ideas.push(...buildPaperIdeas(papers, goalText, ideas.length + 1));
+   }
    const markdown = renderIdeasMarkdown(profile, goalText, ideas);
    process.stdout.write(`${markdown}\n`);

@@ -852,6 +918,422 @@ function cmdRecord() {
    console.log(`Recorded run: ${row.id}`);
  }

+ function defaultMetricRegex(metricName) {
+   const escaped = metricName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+   return new RegExp(`["']?${escaped}["']?\\s*[:=]\\s*["']?(-?\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?)`, "gi");
+ }
+
+ function parseMetricFromOutput(output, metricName, customRegexSource) {
+   const regex = customRegexSource
+     ? new RegExp(customRegexSource, "gi")
+     : defaultMetricRegex(metricName);
+   let last = null;
+   let match;
+   while ((match = regex.exec(output)) !== null) {
+     last = match[1] !== undefined ? match[1] : match[0];
+   }
+   if (last !== null && Number.isFinite(Number(last))) {
+     return Number(last);
+   }
+   const lines = output.split("\n").map((line) => line.trim()).filter(Boolean);
+   for (let idx = lines.length - 1; idx >= 0; idx -= 1) {
+     try {
+       const obj = JSON.parse(lines[idx]);
+       if (obj && typeof obj === "object" && metricName in obj && Number.isFinite(Number(obj[metricName]))) {
+         return Number(obj[metricName]);
+       }
+     } catch {
+       // not JSON, skip
+     }
+   }
+   return null;
+ }
+
+ function spawnCommand(commandText, cwd, timeoutMs, logFile) {
+   return new Promise((resolve) => {
+     const child = spawn(commandText, { cwd, shell: true });
+     const chunks = [];
+     let timedOut = false;
+     const logStream = fs.createWriteStream(logFile);
+     const timer = setTimeout(() => {
+       timedOut = true;
+       try {
+         child.kill("SIGKILL");
+       } catch {
+         // already gone
+       }
+     }, timeoutMs);
+     child.stdout.on("data", (data) => {
+       chunks.push(data);
+       process.stdout.write(data);
+       logStream.write(data);
+     });
+     child.stderr.on("data", (data) => {
+       chunks.push(data);
+       process.stderr.write(data);
+       logStream.write(data);
+     });
+     child.on("error", (err) => {
+       clearTimeout(timer);
+       const message = `\nresearchloop: spawn error: ${err.message}\n`;
+       logStream.end(message);
+       resolve({
+         output: Buffer.concat(chunks).toString("utf8") + message,
+         exitCode: null,
+         timedOut,
+         spawnError: err.message,
+       });
+     });
+     child.on("close", (code) => {
+       clearTimeout(timer);
+       logStream.end();
+       resolve({
+         output: Buffer.concat(chunks).toString("utf8"),
+         exitCode: code,
+         timedOut,
+         spawnError: null,
+       });
+     });
+   });
+ }
+
+ function replaceOrAppendSection(text, heading, body) {
+   const escaped = heading.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+   const pattern = new RegExp(`(^## ${escaped}\\s+)([\\s\\S]*?)(?=\\n## |\\n# |$)`, "mi");
+   if (pattern.test(text)) {
+     return text.replace(pattern, `$1${body}\n`);
+   }
+   const suffix = text.endsWith("\n") ? "" : "\n";
+   return `${text}${suffix}\n## ${heading}\n${body}\n`;
+ }
+
+ function updateGoalCurrentBest(cwd, metricName, value, runId) {
+   const goalFile = path.join(cwd, ".researchloop", "goal.md");
+   if (!fs.existsSync(goalFile)) {
+     return;
+   }
+   const raw = fs.readFileSync(goalFile, "utf8");
+   const body = `${metricName} = ${value} (run ${runId})`;
+   fs.writeFileSync(goalFile, replaceOrAppendSection(raw, "Current Best", body));
+ }
+
+ function updatePlanBaseline(cwd, metricName, value, runId) {
+   const planFile = path.join(cwd, ".researchloop", "plan.md");
+   if (!fs.existsSync(planFile)) {
+     return;
+   }
+   const raw = fs.readFileSync(planFile, "utf8");
+   const body = [
+     `- Baseline: ${metricName} = ${value} (run ${runId})`,
+     "- Best valid result: same as baseline",
+     "- Active family: none",
+     "- Running jobs: none",
+     "- Next action: design first experiment",
+   ].join("\n");
+   fs.writeFileSync(planFile, replaceOrAppendSection(raw, "Current State", body));
+ }
+
+ function readGoalFields(cwd) {
+   const goalFile = path.join(cwd, ".researchloop", "goal.md");
+   const raw = readTextIfExists(goalFile);
+   return {
+     goal: parseMarkdownSection(raw, "Goal") || "",
+     metric: parseMarkdownSection(raw, "Target Metric") || "",
+     direction: parseMarkdownSection(raw, "Direction") || "",
+     baseline: parseMarkdownSection(raw, "Baseline Command") || "",
+     evaluation: parseMarkdownSection(raw, "Evaluation Command") || "",
+   };
+ }
+
+ async function cmdRun(isBaseline) {
+   const cwd = targetDir();
+   const goalFields = readGoalFields(cwd);
+   const explicitCommand = option("--command", null);
+   let cmdText = explicitCommand && typeof explicitCommand === "string" ? explicitCommand : "";
+   if (!cmdText) {
+     cmdText = isBaseline
+       ? goalFields.baseline
+       : (goalFields.evaluation || goalFields.baseline);
+   }
+   if (!cmdText || cmdText.toLowerCase() === "unknown") {
+     console.error("No command to run.");
+     console.error("Set one via:");
+     console.error(" researchloop goal \"<text>\" --baseline \"python train.py\" --evaluation \"python eval.py\"");
+     console.error("Or pass --command directly.");
+     process.exitCode = 1;
+     return;
+   }
+
+   const metricName = String(option("--metric", goalFields.metric || "val_loss")).trim() || "val_loss";
+   const customRegex = option("--regex", null);
+   const regexSource = customRegex && typeof customRegex === "string" ? customRegex : null;
+   const timeoutSec = Number(option("--timeout", 600));
+   const timeoutMs = Number.isFinite(timeoutSec) && timeoutSec > 0 ? timeoutSec * 1000 : 600000;
+
+   const prefix = isBaseline ? "baseline" : "run";
+   const id = String(option("--id", `${prefix}-${new Date().toISOString().replace(/[:.]/g, "-")}`));
+   const runDir = path.join(cwd, ".researchloop", "scratchpad", "runs", id);
+   ensureDir(runDir);
+   const logFile = path.join(runDir, "log.txt");
+
+   console.log(`researchloop ${prefix}`);
+   console.log(`command: ${cmdText}`);
+   console.log(`metric: ${metricName}`);
+   console.log(`timeout: ${timeoutMs / 1000}s`);
+   console.log(`log: ${path.relative(cwd, logFile)}`);
+   console.log("---");
+
+   const startedAt = new Date().toISOString();
+   const result = await spawnCommand(cmdText, cwd, timeoutMs, logFile);
+   const finishedAt = new Date().toISOString();
+
+   let status;
+   if (result.spawnError) {
+     status = "spawn_error";
+   } else if (result.timedOut) {
+     status = "timeout";
+   } else if (result.exitCode !== 0) {
+     status = "failed";
+   } else {
+     status = "complete";
+   }
+
+   const metrics = {};
+   const metricValue = parseMetricFromOutput(result.output, metricName, regexSource);
+   if (metricValue !== null) {
+     metrics[metricName] = metricValue;
+   }
+   if (status === "complete" && metricValue === null) {
+     status = "complete_no_metric";
+   }
+
+   const row = {
+     id,
+     timestamp: finishedAt,
+     started_at: startedAt,
+     status,
+     agent: `researchloop ${prefix}`,
+     command: cmdText,
+     exit_code: result.exitCode,
+     log: path.relative(cwd, logFile),
+     metrics,
+     notes: "",
+   };
+   const ledger = path.join(cwd, ".researchloop", "scratchpad", "runs.jsonl");
+   ensureDir(path.dirname(ledger));
+   fs.appendFileSync(ledger, `${JSON.stringify(row)}\n`);
+
+   const thread = path.join(cwd, ".researchloop", "scratchpad", "THREAD.md");
+   ensureDir(path.dirname(thread));
+   const metricSuffix = metricValue !== null ? ` ${metricName}=${metricValue}` : "";
+   fs.appendFileSync(thread, `- ${finishedAt} ${prefix} ${id} status=${status}${metricSuffix}\n`);
+
+   console.log("---");
+   console.log(`status: ${status}`);
+   console.log(`exit_code: ${result.exitCode}`);
+   if (metricValue !== null) {
+     console.log(`${metricName}: ${metricValue}`);
+   } else {
+     console.log("metric: not parsed");
+   }
+   console.log(`recorded: ${id}`);
+
+   if (isBaseline && metricValue !== null) {
+     updateGoalCurrentBest(cwd, metricName, metricValue, id);
+     updatePlanBaseline(cwd, metricName, metricValue, id);
+     console.log("goal.md Current Best updated.");
+     console.log("plan.md Current State updated.");
+   }
+
+   if (status === "failed" || status === "timeout" || status === "spawn_error") {
+     process.exitCode = 1;
+   }
+ }
+
+ const ARXIV_API_URL = "http://export.arxiv.org/api/query";
+
+ function arxivCacheDir() {
+   return path.join(os.homedir(), ".cache", "researchloop", "arxiv");
+ }
+
+ function arxivCacheKey(query, limit, since) {
+   return createHash("sha1")
+     .update(`${query}|${limit}|${since || ""}`)
+     .digest("hex")
+     .slice(0, 16);
+ }
+
+ async function fetchArxivXml({ query, limit, since, cacheDir, offline }) {
+   const fixture = process.env.RESEARCHLOOP_ARXIV_FIXTURE;
+   if (fixture) {
+     return fs.readFileSync(fixture, "utf8");
+   }
+   ensureDir(cacheDir);
+   const key = arxivCacheKey(query, limit, since);
+   const cacheFile = path.join(cacheDir, `${key}.xml`);
+   if (fs.existsSync(cacheFile)) {
+     return fs.readFileSync(cacheFile, "utf8");
+   }
+   if (offline) {
+     throw new Error(`offline mode: no cache for query "${query}" (key=${key})`);
+   }
+   const params = new URLSearchParams({
+     search_query: query,
+     sortBy: "submittedDate",
+     sortOrder: "descending",
+     max_results: String(limit),
+   });
+   const url = `${ARXIV_API_URL}?${params.toString()}`;
+   const res = await fetch(url, { headers: { "User-Agent": "researchloop/0.2.0" } });
+   if (!res.ok) {
+     throw new Error(`arxiv fetch failed: HTTP ${res.status}`);
+   }
+   const xml = await res.text();
+   fs.writeFileSync(cacheFile, xml);
+   return xml;
+ }
+
+ function decodeXmlEntities(text) {
+   return text
+     .replace(/&lt;/g, "<")
+     .replace(/&gt;/g, ">")
+     .replace(/&quot;/g, '"')
+     .replace(/&apos;/g, "'")
+     .replace(/&#39;/g, "'")
+     .replace(/&amp;/g, "&");
+ }
+
+ function extractXmlTag(block, tag) {
+   const re = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)<\\/${tag}>`, "i");
+   const match = block.match(re);
+   return match ? decodeXmlEntities(match[1]).replace(/\s+/g, " ").trim() : "";
+ }
+
+ function parseArxivEntries(xml) {
+   const entries = [];
+   const entryRe = /<entry>([\s\S]*?)<\/entry>/g;
+   let match;
+   while ((match = entryRe.exec(xml)) !== null) {
+     const block = match[1];
+     const idUrl = extractXmlTag(block, "id");
+     const arxivId = idUrl.replace(/^https?:\/\/arxiv\.org\/abs\//, "");
+     const authorBlocks = block.match(/<author>[\s\S]*?<\/author>/g) || [];
+     const authors = authorBlocks
+       .map((blk) => extractXmlTag(blk, "name"))
+       .filter(Boolean);
+     entries.push({
+       arxivId,
+       idUrl,
+       title: extractXmlTag(block, "title"),
+       summary: extractXmlTag(block, "summary"),
+       published: extractXmlTag(block, "published"),
+       updated: extractXmlTag(block, "updated"),
+       authors,
+     });
+   }
+   return entries;
+ }
+
+ function filterArxivBySince(entries, since) {
+   if (!since) return entries;
+   const sinceDate = new Date(since.length === 7 ? `${since}-01` : since);
+   if (Number.isNaN(sinceDate.getTime())) return entries;
+   return entries.filter((entry) => {
+     const date = new Date(entry.published);
+     return !Number.isNaN(date.getTime()) && date >= sinceDate;
+   });
+ }
+
+ function buildDefaultArxivQuery(goalFields, profile) {
+   const parts = [];
+   if (goalFields.goal) parts.push(goalFields.goal);
+   if (goalFields.metric) parts.push(goalFields.metric);
+   const adapters = (profile && profile.adapters) || [];
+   if (adapters.includes("huggingface")) parts.push("transformer");
+   if (adapters.includes("pytorch")) parts.push("deep learning");
+   const joined = parts.filter(Boolean).join(" ").slice(0, 200).trim();
+   return joined ? `all:${joined}` : "all:deep learning";
+ }
+
+ function renderPaperMarkdown(entry) {
+   const pubDate = entry.published ? entry.published.slice(0, 10) : "";
+   return [
+     `# ${entry.title || entry.arxivId}`,
+     "",
+     `arXiv: ${entry.arxivId}`,
+     `Published: ${pubDate}`,
+     `Authors: ${entry.authors.join(", ")}`,
+     `Link: ${entry.idUrl}`,
+     "",
+     "## Abstract",
+     "",
+     entry.summary,
+     "",
+     "## How to port this",
+     "",
+     "TODO. Fill in when the paper is read.",
+     "",
+   ].join("\n");
+ }
+
+ async function cmdScanPapers() {
+   const cwd = targetDir();
+   const goalFields = readGoalFields(cwd);
+   const profile = loadRepoProfile(cwd);
+   const explicitQuery = option("--query", null);
+   const query = explicitQuery && typeof explicitQuery === "string"
+     ? explicitQuery
+     : buildDefaultArxivQuery(goalFields, profile);
+   const limitRaw = Number(option("--limit", 10));
+   const limit = Number.isFinite(limitRaw) && limitRaw > 0 ? Math.min(50, Math.floor(limitRaw)) : 10;
+   const sinceOpt = option("--since", null);
+   const since = sinceOpt && typeof sinceOpt === "string" ? sinceOpt : null;
+   const offline = hasFlag("--offline");
+   const cacheDirOpt = option("--cache-dir", null);
+   const cacheDir = cacheDirOpt && typeof cacheDirOpt === "string" ? cacheDirOpt : arxivCacheDir();
+
+   console.log("researchloop scan-papers");
+   console.log(`query: ${query}`);
+   console.log(`limit: ${limit}`);
+   if (since) console.log(`since: ${since}`);
+   console.log(`cache: ${cacheDir}`);
+
+   let xml;
+   try {
+     xml = await fetchArxivXml({ query, limit, since, cacheDir, offline });
+   } catch (err) {
+     console.error(`scan-papers failed: ${err.message}`);
+     process.exitCode = 1;
+     return;
+   }
+
+   let entries = parseArxivEntries(xml);
+   entries = filterArxivBySince(entries, since);
+
+   const papersDir = path.join(cwd, ".researchloop", "scratchpad", "papers");
+   ensureDir(papersDir);
+   for (const entry of entries) {
+     const safeId = entry.arxivId.replace(/[/\\]/g, "_");
+     const file = path.join(papersDir, `${safeId}.md`);
+     fs.writeFileSync(file, renderPaperMarkdown(entry));
+   }
+
+   const thread = path.join(cwd, ".researchloop", "scratchpad", "THREAD.md");
+   ensureDir(path.dirname(thread));
+   fs.appendFileSync(
+     thread,
+     `- ${new Date().toISOString()} scan-papers query="${query.slice(0, 100)}" found=${entries.length}\n`
+   );
+
+   console.log("---");
+   console.log(`found: ${entries.length}`);
+   for (const entry of entries) {
+     const title = entry.title.length > 80 ? `${entry.title.slice(0, 77)}...` : entry.title;
+     console.log(`- ${entry.arxivId} ${title}`);
+   }
+   console.log(`papers written to: ${path.relative(cwd, papersDir)}`);
+ }
+
  function cmdHelp() {
    console.log(`Research Loop

@@ -863,6 +1345,9 @@ Usage:
  researchloop prompt [--agent codex|claude-code|hermes|generic] [--goal TEXT] [--focus hyperparameters|architecture|attention]
  researchloop doctor [--dir PATH] [--python PATH]
  researchloop record [--dir PATH] [--id ID] [--status STATUS] [--metric key=value] [--note TEXT]
+ researchloop run [--dir PATH] [--id ID] [--command CMD] [--metric NAME] [--regex PATTERN] [--timeout SECONDS]
+ researchloop baseline [--dir PATH] [--id ID] [--command CMD] [--metric NAME] [--regex PATTERN] [--timeout SECONDS]
+ researchloop scan-papers [--dir PATH] [--query TEXT] [--limit N] [--since YYYY-MM] [--cache-dir PATH] [--offline]
  researchloop compare [--dir PATH] [--metric NAME] [--direction lower|higher]
  researchloop dashboard [--dir PATH] [--host HOST] [--port PORT]
  researchloop report [--dir PATH]
@@ -871,30 +1356,43 @@ Research Loop installs docs, prompts, scratchpads, and experiment ledgers for au
871
1356
  `);
872
1357
  }
873
1358
 
874
- if (hasFlag("--help") || command === "help") {
875
- cmdHelp();
876
- } else if (command === "init") {
877
- cmdInit();
878
- } else if (command === "goal") {
879
- cmdGoal();
880
- } else if (command === "inspect") {
881
- cmdInspect();
882
- } else if (command === "idea") {
883
- cmdIdea();
884
- } else if (command === "prompt") {
885
- cmdPrompt();
886
- } else if (command === "doctor") {
887
- cmdDoctor();
888
- } else if (command === "record") {
889
- cmdRecord();
890
- } else if (command === "compare") {
891
- cmdCompare();
892
- } else if (command === "dashboard") {
893
- cmdDashboard();
894
- } else if (command === "report") {
895
- cmdReport();
896
- } else {
897
- console.error(`Unknown command: ${command}`);
898
- cmdHelp();
899
- process.exitCode = 1;
1359
+ async function main() {
1360
+ if (hasFlag("--help") || command === "help") {
1361
+ cmdHelp();
1362
+ } else if (command === "init") {
1363
+ cmdInit();
1364
+ } else if (command === "goal") {
1365
+ cmdGoal();
1366
+ } else if (command === "inspect") {
1367
+ cmdInspect();
1368
+ } else if (command === "idea") {
1369
+ cmdIdea();
1370
+ } else if (command === "prompt") {
1371
+ cmdPrompt();
1372
+ } else if (command === "doctor") {
1373
+ cmdDoctor();
1374
+ } else if (command === "record") {
1375
+ cmdRecord();
1376
+ } else if (command === "run") {
1377
+ await cmdRun(false);
1378
+ } else if (command === "baseline") {
1379
+ await cmdRun(true);
1380
+ } else if (command === "scan-papers") {
1381
+ await cmdScanPapers();
1382
+ } else if (command === "compare") {
1383
+ cmdCompare();
1384
+ } else if (command === "dashboard") {
1385
+ cmdDashboard();
1386
+ } else if (command === "report") {
1387
+ cmdReport();
1388
+ } else {
1389
+ console.error(`Unknown command: ${command}`);
1390
+ cmdHelp();
1391
+ process.exitCode = 1;
1392
+ }
900
1393
  }
1394
+
1395
+ main().catch((err) => {
1396
+ console.error(err);
1397
+ process.exitCode = 1;
1398
+ });
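The hunk above wraps the dispatcher in an async `main()` so the new `run`, `baseline`, and `scan-papers` commands can be awaited, with a trailing `.catch` that logs the error and sets a nonzero exit code. A minimal standalone sketch of the same pattern, assuming hypothetical command names rather than the real handlers in `bin/index.js`:

```javascript
// Sketch of the async-dispatch-plus-catch pattern from the hunk above.
// Command names here are hypothetical; the real handlers live in bin/index.js.
const handlers = {
  sync: () => "did a sync thing",
  slow: async () => {
    await new Promise((resolve) => setTimeout(resolve, 10));
    return "did an async thing";
  },
};

async function main(command) {
  const handler = handlers[command];
  if (!handler) {
    // Unknown command: report it and flag failure without throwing.
    process.exitCode = 1;
    return `Unknown command: ${command}`;
  }
  // `await` is harmless for sync handlers and required for async ones.
  return await handler();
}

// Top-level catch mirrors the diff: log, set exitCode, never rethrow.
main("slow")
  .then((msg) => console.log(msg))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```

Setting `process.exitCode` instead of calling `process.exit()` lets pending I/O (such as a streaming run log) flush before the process ends.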
@@ -9,7 +9,17 @@ The shortest way to think about it:
  - it creates a durable `.researchloop/` workspace
  - your AI agent uses that workspace to plan, run, compare, and record experiments
 
- ## 1. Install
+ ## 1. Give This Prompt To Your Agent
+
+ Copy this into Codex, Claude Code, Hermes, Cursor, or another coding agent:
+
+ ```text
+ Set up an autonomous AI research environment in this repo using ResearchLoop.
+ Read the ResearchLoop docs and the .researchloop/ files, inspect the repo, establish the baseline, propose a small set of experiments, run the smallest valid change first, record every result, compare runs, and keep the research loop moving.
+ Use the package commands to manage goals, ideas, prompts, runs, comparisons, and reports.
+ ```
+
+ ## 2. Install
 
  From your own machine:
 
@@ -31,7 +41,7 @@ If you want to hand this to an AI agent, the simplest instruction is:
  Install ResearchLoop, initialize the repo, inspect the project, then use the generated prompt to start the research loop.
  ```
 
- ## 2. Initialize a repo
+ ## 3. Initialize a repo
 
  Run this inside a blank folder or inside an existing ML repo:
 
@@ -64,7 +74,7 @@ researchloop init --agent hermes
  researchloop init --agent cursor
  ```
 
- ## 3. Set the research goal
+ ## 4. Set the research goal
 
  Tell ResearchLoop what the agent should optimize:
 
@@ -80,7 +90,7 @@ researchloop goal "lower validation loss" --metric val_loss --direction lower
 
  That saves the objective into `.researchloop/goal.md`, which the agent and the prompt command can read later.
 
- ## 4. Generate experiment ideas
+ ## 5. Generate experiment ideas
 
  ```bash
  researchloop idea --write
@@ -88,7 +98,7 @@ researchloop idea --write
  ```
 
  This prints a ranked list of small experiments for the current repo shape. For `llm-research-kit`, that usually means baseline checks, learning-rate sweeps, and tiny architecture changes. For a generic repo, it starts with finding the baseline and metric plumbing.
 
- ## 5. Inspect the repo
+ ## 6. Inspect the repo
 
  ```bash
  researchloop inspect
@@ -102,7 +112,7 @@ This writes a repo profile into `.researchloop/repo-profile.json` and helps the
  - log folders
  - likely adapters
 
- ## 6. Generate the agent prompt
+ ## 7. Generate the agent prompt
 
  ```bash
  researchloop prompt --agent codex
@@ -127,6 +137,24 @@ That prompt tells the agent to:
  - compare results
  - keep the loop moving
 
+ ## 7b. Use the skill pack
+
+ The npm package also ships a downloadable `skills/` folder.
+
+ It contains the same research loop as agent-local skills:
+
+ - `skills/researchloop-autoresearch/codex/SKILL.md`
+ - `skills/researchloop-autoresearch/claude-code/CLAUDE.md`
+ - `skills/researchloop-autoresearch/references/*.md`
+
+ Use those files when you want the agent itself to carry the research rules, not just the current prompt.
+
+ Typical flow:
+
+ 1. Copy the Codex or Claude Code file into the skill location your agent uses.
+ 2. Keep the `references/` files nearby as optional playbooks.
+ 3. Pair the skill with `.researchloop/goal.md` and the `researchloop prompt` output.
+
  You can still pass `--goal` for a one-off override, but the normal flow is to save the goal once and let the prompt command read it back.
 
  If you want the prompt to narrow in on a family of experiments, use one of the built-in focus playbooks:
@@ -135,7 +163,7 @@ If you want the prompt to narrow in on a family of experiments, use one of the b
  - `architecture`
  - `attention`
 
- ## 7. Record and compare runs
+ ## 8. Record and compare runs
 
  After a run finishes:
 
@@ -161,7 +189,7 @@ Then summarize the current state:
  researchloop report
  ```
 
- ## 8. Open the dashboard
+ ## 9. Open the dashboard
 
  Serve a local dashboard for the current repo:
 
@@ -179,7 +207,7 @@ Then open the localhost URL it prints. The dashboard reads the repo's `.research
 
  It does not need accounts or auth because it stays on your machine.
 
- ## 9. Test the setup before you trust it
+ ## 10. Test the setup before you trust it
 
  Run the local checks from this repo:
 
@@ -201,7 +229,7 @@ These checks verify that:
  - the website copy matches the product
  - the end-to-end flow works
 
- ## 10. Use it in a real ML repo
+ ## 11. Use it in a real ML repo
 
  Once the basics work, move into a real project:
 
@@ -216,7 +244,7 @@ Then give the prompt to your AI agent and let it run the loop.
 
  ResearchLoop is not trying to magically solve the model for you. It gives the agent the operating system for research: goals, baseline, logs, comparison, and continuation.
 
- ## 11. Publish to npm
+ ## 12. Publish to npm
 
  The package is published to the public npm registry at [npmjs.com](https://www.npmjs.com/).
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "researchloop",
- "version": "0.1.0",
+ "version": "0.2.0",
  "description": "Install an autonomous AI research harness for Codex, Claude Code, Hermes, and other coding agents.",
  "type": "module",
  "bin": {
@@ -9,6 +9,7 @@
  "files": [
  "bin",
  "templates",
+ "skills",
  "README.md",
  "docs/getting-started.md",
  "CHANGELOG.md"
@@ -21,6 +22,8 @@
  "test:dashboard": "bash ./scripts/test-dashboard.sh",
  "test:setup": "bash ./scripts/test-setup.sh",
  "test:compare": "bash ./scripts/test-compare.sh",
+ "test:run": "bash ./scripts/test-run.sh",
+ "test:scan-papers": "bash ./scripts/test-scan-papers.sh",
  "test:prompts": "bash ./scripts/test-prompts.sh",
  "test:focus-prompts": "bash ./scripts/test-focus-prompts.sh",
  "test:site": "bash ./scripts/test-site.sh"
@@ -0,0 +1,31 @@
+ # ResearchLoop Skills
+
+ This folder ships downloadable agent skills for autonomous AI research.
+
+ The package keeps the core product in the CLI, dashboard, prompts, and run ledger.
+ These skills are the agent-side memory layer that makes the research loop stick.
+
+ ## What is in here
+
+ - `researchloop-autoresearch/` - the main research skill pack
+ - `researchloop-autoresearch/references/` - focused playbooks for common experiment families
+
+ ## How users use it
+
+ Users copy the right file into the skill folder their agent expects.
+
+ Typical mapping:
+
+ - Codex: copy `researchloop-autoresearch/codex/SKILL.md` into the local Codex skills directory
+ - Claude Code: copy `researchloop-autoresearch/claude-code/CLAUDE.md` into the Claude Code instructions or skill location they use
+
+ ## What the skill pack does
+
+ - keeps the goal visible
+ - forces baseline-first behavior
+ - asks for one small experiment at a time
+ - records runs and comparisons
+ - prunes weak ideas instead of spiraling
+
+ The CLI prints prompts and creates `.researchloop/` state.
+ The skills make the agent remember how to behave while doing the work.
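The copy step described above can be scripted in Node. A hedged sketch: the destination path is an assumption (agents differ in where they load skills from), and a temp tree stands in for the installed package so the example is self-contained.

```javascript
// Hedged sketch: stage the Codex skill file into an agent-local skills
// directory. A real install would copy from node_modules/researchloop/skills;
// here a temp tree stands in so the sketch runs anywhere.
import { mkdtempSync, mkdirSync, writeFileSync, copyFileSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

const work = mkdtempSync(join(tmpdir(), "researchloop-"));

// Stand-in for skills/researchloop-autoresearch/codex/SKILL.md in the package.
const src = join(work, "SKILL.md");
writeFileSync(src, "# ResearchLoop Autoresearch\n");

// Destination is hypothetical: use whatever skills folder your agent reads.
const dest = join(work, ".agent", "skills", "researchloop-autoresearch");
mkdirSync(dest, { recursive: true });
copyFileSync(src, join(dest, "SKILL.md"));

console.log(readFileSync(join(dest, "SKILL.md"), "utf8").trim());
// prints "# ResearchLoop Autoresearch"
```

The `recursive: true` option lets `mkdirSync` create the whole nested path in one call, so the sketch works even when no `.agent/` folder exists yet.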
@@ -0,0 +1,35 @@
+ # ResearchLoop Autoresearch
+
+ Use this repo as an autonomous AI research loop.
+
+ Before changing code, read:
+
+ - `.researchloop/goal.md`
+ - `.researchloop/plan.md`
+ - `.researchloop/AGENTS.md`
+ - `.researchloop/scratchpad/THREAD.md`
+ - `.researchloop/repo-profile.json`
+
+ Then:
+
+ 1. confirm the baseline
+ 2. pick one small experiment
+ 3. change one variable at a time
+ 4. run the smallest valid check
+ 5. record the run
+ 6. compare against the baseline
+ 7. prune weak branches
+
+ Use ResearchLoop to keep the loop durable:
+
+ - `researchloop goal`
+ - `researchloop inspect`
+ - `researchloop idea`
+ - `researchloop prompt`
+ - `researchloop record`
+ - `researchloop compare`
+ - `researchloop report`
+
+ Never claim improvement without a run.
+ Never skip the baseline.
+ Never let the goal drift.
@@ -0,0 +1,50 @@
+ ---
+ name: researchloop-autoresearch
+ description: Use when doing autonomous AI research in a machine learning repo with ResearchLoop, especially when choosing experiments, preserving baselines, or logging run results.
+ ---
+
+ # ResearchLoop Autoresearch
+
+ You are the research agent inside a repo that uses ResearchLoop.
+
+ Before changing code, read:
+
+ - `.researchloop/goal.md`
+ - `.researchloop/plan.md`
+ - `.researchloop/AGENTS.md`
+ - `.researchloop/scratchpad/THREAD.md`
+ - `.researchloop/repo-profile.json`
+
+ Then work in this order:
+
+ 1. Confirm the baseline.
+ 2. Propose the smallest informative next experiment.
+ 3. Change one thing at a time.
+ 4. Run the smallest valid check.
+ 5. Record the result.
+ 6. Compare against the baseline.
+ 7. Prune weak branches quickly.
+ 8. Continue until the goal is met or the family is exhausted.
+
+ Use the ResearchLoop commands as the control plane:
+
+ - `researchloop goal`
+ - `researchloop inspect`
+ - `researchloop prompt`
+ - `researchloop idea`
+ - `researchloop record`
+ - `researchloop compare`
+ - `researchloop report`
+
+ Do not claim improvement without a recorded run.
+ Do not stack architecture changes before the baseline is stable.
+ Do not let the loop drift away from the saved goal.
+
+ ## When to use playbooks
+
+ If the task is clearly one of these families, load the matching reference:
+
+ - hyperparameters -> `references/hyperparameters.md`
+ - architecture -> `references/architecture.md`
+ - attention -> `references/attention.md`
+
@@ -0,0 +1,21 @@
+ # Architecture Playbook
+
+ Use this when tuning model shape or layer structure.
+
+ Try one change at a time:
+
+ - width
+ - depth
+ - feedforward size
+ - number of heads
+ - embedding size
+ - normalization placement
+
+ Rules:
+
+ - do not stack multiple architecture changes in the first pass
+ - keep the optimizer and schedule fixed
+ - compare against a reproduced baseline
+ - re-run the best candidate with a second seed
+
+ If the win does not reproduce, drop it.
@@ -0,0 +1,21 @@
+ # Attention Playbook
+
+ Use this when the bottleneck appears to be the attention block itself.
+
+ Try one change at a time:
+
+ - number of heads
+ - head dimension
+ - context length
+ - causal masking
+ - rotary or positional setup
+ - attention implementation
+
+ Rules:
+
+ - keep the rest of the model fixed
+ - keep the metric fixed
+ - capture throughput and loss together
+ - record the exact config diff
+
+ If the change only helps once, do not promote it.
@@ -0,0 +1,22 @@
+ # Hyperparameters Playbook
+
+ Use this when the likely next win is a cheap tuning change.
+
+ Try one family at a time:
+
+ - learning rate
+ - warmup
+ - optimizer
+ - weight decay
+ - batch size
+ - gradient clipping
+
+ Rules:
+
+ - keep architecture fixed
+ - keep the dataset fixed
+ - keep the metric fixed
+ - sweep only a few values
+ - record every run
+
+ Kill the family quickly if the curve is flat.
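The "sweep only a few values, record every run" rule above can be sketched in a few lines. Everything here is hypothetical: the values, the stand-in metric, and the field names; real results come from runs recorded in `runs.jsonl`.

```javascript
// Hypothetical one-family sweep: a few learning-rate values, a stand-in
// validation loss per run, keep the single best. Real metrics would come
// from recorded runs, not a formula.
const sweep = [1e-4, 3e-4, 1e-3];

// Stand-in metric: pretend each run recorded a val_loss that bottoms out
// near lr = 3e-4.
const runs = sweep.map((lr) => ({ lr, val_loss: Math.abs(Math.log10(lr) + 3.5) }));

// Lower is better for val_loss; reduce to the best run.
const best = runs.reduce((a, b) => (b.val_loss < a.val_loss ? b : a));
console.log(best.lr);
// prints 0.0003
```

Keeping the sweep to three values matches the playbook: if none of them moves the metric, kill the family rather than widening the grid.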