researchloop 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -1,5 +1,27 @@
  # Changelog

+ ## 0.2.0
+
+ ResearchLoop becomes a runtime, not just a folder.
+
+ New:
+
+ - `researchloop run` executes a command, streams output to a per-run log, parses a metric (default regex on `metric=N` or `"metric": N`, plus last-line JSON fallback), and auto-appends a row to `runs.jsonl`. No more manual `record`.
+ - `researchloop baseline` is `run` for the baseline command and also updates `goal.md` Current Best and `plan.md` Current State.
+ - `researchloop scan-papers` queries the arXiv API for papers relevant to the goal, writes one markdown note per result to `scratchpad/papers/`, caches responses to `~/.cache/researchloop/arxiv/`, supports `--offline`, `--since YYYY-MM`, `--limit`, `--query`, `--cache-dir`.
+ - `researchloop idea` now reads `scratchpad/papers/` and adds paper-derived ideas alongside the adapter playbook.
+
+ Improvements:
+
+ - Tighter adapter detection: pytorch needs a real `train*.py` script or `torch` in deps; huggingface needs `transformers` in deps. No more false positives from filename substrings.
+ - `candidate_config_files` no longer matches every `.json`/`.yaml`/`.toml` in the repo.
+ - README install command no longer hardcodes a developer machine path.
+ - New tests: `test:run`, `test:scan-papers`. arXiv test uses a recorded XML fixture and never hits the network.
+
+ Cleanup:
+
+ - Removed misleading `projects/researchloop` and `projects/researchloop-cli` symlinks.
+
  ## 0.1.0

  First public ResearchLoop release.
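The metric parsing that the changelog describes for `researchloop run` can be sketched as a standalone function. This is a simplified illustration of the described behavior (regex pass over `metric=N` / `"metric": N`, then a last-line JSON fallback), not the package's exact implementation; the exponent handling of the real default regex is omitted here:

```javascript
// Simplified sketch of the two-stage metric parse described above:
// 1) regex pass over `metric=N` / `"metric": N`, keeping the last match;
// 2) fallback: scan lines from the end for a JSON object holding the metric.
function parseMetric(output, name) {
  const esc = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(`["']?${esc}["']?\\s*[:=]\\s*(-?\\d+(?:\\.\\d+)?)`, "g");
  let last = null;
  for (const m of output.matchAll(re)) last = Number(m[1]);
  if (last !== null && Number.isFinite(last)) return last;
  const lines = output.split("\n").map((l) => l.trim()).filter(Boolean);
  for (let i = lines.length - 1; i >= 0; i -= 1) {
    try {
      const obj = JSON.parse(lines[i]);
      if (obj && typeof obj === "object" && name in obj) return Number(obj[name]);
    } catch {
      // not JSON, keep scanning upward
    }
  }
  return null;
}

console.log(parseMetric("step 1\nval_loss=2.31", "val_loss")); // 2.31
console.log(parseMetric('{"val_loss": 1.9}', "val_loss"));     // 1.9
```

Keeping the last match matters for training logs: the final reported `val_loss` wins over intermediate ones.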
package/README.md CHANGED
@@ -8,7 +8,17 @@ It installs a durable research harness into a machine learning repo so agents li
 
  This repo is both the product and the startup home base.

- ## Install
+ ## Give This Prompt To Your Agent
+
+ Copy this into Codex, Claude Code, Hermes, Cursor, or another coding agent:
+
+ ```text
+ Set up an autonomous AI research environment in this repo using ResearchLoop.
+ Read the ResearchLoop docs and the .researchloop/ files, inspect the repo, establish the baseline, propose a small set of experiments, run the smallest valid change first, record every result, compare runs, and keep the research loop moving.
+ Use the package commands to manage goals, ideas, prompts, runs, comparisons, and reports.
+ ```
+
+ Then install ResearchLoop:

  ```bash
  npm install -g researchloop
@@ -17,7 +27,8 @@ npm install -g researchloop
  Local development from this checkout:

  ```bash
- cd /Users/vukrosic/my-life/researchloop
+ git clone https://github.com/vukrosic/researchloop.git
+ cd researchloop
  npm link
  researchloop --help
  ```
@@ -26,16 +37,18 @@ researchloop --help
 
  ```bash
  researchloop init --agent codex
- researchloop goal "lower validation loss"
+ researchloop goal "lower validation loss" --metric val_loss --direction lower \
+   --baseline "python train.py" --evaluation "python eval.py"
  researchloop inspect
+ researchloop scan-papers --limit 10
  researchloop idea --write
  researchloop prompt --agent codex
- researchloop prompt --agent codex --focus hyperparameters
- researchloop dashboard
- researchloop doctor
- researchloop record --id first-run --status complete --metric val_loss=2.31 --note "First logged experiment"
+ researchloop baseline
+ researchloop run --id lr-3e-4 --command "python train.py --lr 3e-4"
  researchloop compare --metric val_loss --direction lower
  researchloop report
+ researchloop dashboard
+ researchloop doctor
  ```

  Then paste the generated prompt into the coding agent.
@@ -65,6 +78,7 @@ The package does not claim to magically train every model. It gives an agent the
  ```text
  bin/              CLI entrypoint
  templates/        Harness, adapters, and agent prompts
+ skills/           Downloadable agent research skill packs
  docs/site/        Landing page
  docs/research/    Local testing notes and research logs
  docs/competitors/ Competitor and adjacent-project research
@@ -116,17 +130,22 @@ The startup plan is in `docs/startup/`.
  - `researchloop init` creates `.researchloop/` and agent instruction files.
  - `researchloop goal` saves a durable research objective in `.researchloop/goal.md`.
  - `researchloop inspect` writes `.researchloop/repo-profile.json`.
- - `researchloop idea` generates ranked experiment ideas and can write an idea note.
+ - `researchloop scan-papers` fetches relevant arXiv abstracts into `.researchloop/scratchpad/papers/`.
+ - `researchloop idea` generates ranked experiment ideas, including paper-derived ones, and can write an idea note.
  - `researchloop prompt` prints an agent-ready autonomous research prompt, with optional focus playbooks.
- - `researchloop dashboard` starts a local localhost dashboard for experiment tracking.
- - `researchloop doctor` checks basic local tooling.
- - `researchloop record` appends a structured run result to `runs.jsonl`.
+ - `researchloop baseline` runs the baseline command, parses the metric, and locks it into `goal.md` and `plan.md`.
+ - `researchloop run` executes a training or eval command, streams the log, parses the metric, and records the run.
+ - `researchloop record` appends a structured run result to `runs.jsonl` (use it for manual rows).
  - `researchloop compare` ranks runs by a chosen metric.
  - `researchloop report` summarizes the run ledger.
+ - `researchloop dashboard` starts a local dashboard for experiment tracking.
+ - `researchloop doctor` checks basic local tooling.
  - `npm run test:setup` runs the blank-repo and minimal-fixture setup checks.
  - `npm run test:compare` checks comparison output for a few recorded runs.
+ - `npm run test:run` checks `run` and `baseline` against deterministic shell commands.
+ - `npm run test:scan-papers` checks the arXiv scan path against a recorded XML fixture (no network).
  - `npm run test:goal` checks goal saving and prompt handoff.
- - `npm run test:idea` checks idea generation for a blank repo and an llm-research-kit-shaped repo.
+ - `npm run test:idea` checks idea generation for a blank repo, an llm-research-kit-shaped repo, and a paper-augmented repo.
  - `npm run test:dashboard` checks the local dashboard server and API.
  - `npm run test:prompts` checks prompt templates for placeholder drift.
  - `npm run test:focus-prompts` checks the hyperparameter, architecture, and attention playbooks.
@@ -136,6 +155,8 @@ The startup plan is in `docs/startup/`.
 
  ResearchLoop should stay open source at the core. The npm package, prompts, adapters, and run ledger format should be inspectable and forkable.

+ The package also ships optional skill packs under `skills/` so teams can copy the same research rules into Codex, Claude Code, or other agent-specific folders.
+
  Possible paid layers later:

  - hosted dashboard
@@ -1,9 +1,11 @@
  #!/usr/bin/env node
  import fs from "node:fs";
  import http from "node:http";
+ import os from "node:os";
  import path from "node:path";
  import process from "node:process";
- import { execSync } from "node:child_process";
+ import { execSync, spawn } from "node:child_process";
+ import { createHash } from "node:crypto";
  import { fileURLToPath } from "node:url";

  const __filename = fileURLToPath(import.meta.url);
@@ -132,16 +134,37 @@ function walkFiles(cwd, maxDepth = 3) {
    return out;
  }

+ function readSafe(file) {
+   try {
+     return fs.readFileSync(file, "utf8");
+   } catch {
+     return "";
+   }
+ }
+
+ function depsMention(cwd, needle) {
+   const candidates = ["requirements.txt", "pyproject.toml", "setup.py", "uv.lock", "Pipfile"];
+   const needleLower = needle.toLowerCase();
+   for (const name of candidates) {
+     const text = readSafe(path.join(cwd, name)).toLowerCase();
+     if (text.includes(needleLower)) {
+       return true;
+     }
+   }
+   return false;
+ }
+
  function detectRepo(cwd) {
    const files = walkFiles(cwd, 3);
-   const lower = files.map((file) => file.toLowerCase());
-   const has = (pattern) => lower.some((file) => file.includes(pattern));
+   const basenames = files.map((file) => path.basename(file));
+   const trainScriptPattern = /^(train|finetune|pretrain)[\w-]*\.py$/i;
+   const hasTrainScript = basenames.some((name) => trainScriptPattern.test(name));

    const adapters = ["generic"];
-   if (has("train.py") || has("train_") || has("pytorch") || has("torch")) {
+   if (hasTrainScript || depsMention(cwd, "torch")) {
      adapters.push("pytorch");
    }
-   if (has("trainer") || has("transformers") || has("huggingface")) {
+   if (depsMention(cwd, "transformers") || depsMention(cwd, "huggingface_hub")) {
      adapters.push("huggingface");
    }
    if (files.includes("train_llm.py") && files.includes("configs/llm_config.py")) {
@@ -154,9 +177,9 @@ function detectRepo(cwd) {
      git_branch: run("git branch --show-current", cwd) || null,
      git_status_short: run("git status --short", cwd) || null,
      package_files: existsAny(cwd, ["package.json", "pyproject.toml", "requirements.txt", "uv.lock"]),
-     candidate_train_files: files.filter((file) => /(^|\/)(train|finetune|pretrain).*\.py$/i.test(file)).slice(0, 30),
-     candidate_eval_files: files.filter((file) => /(^|\/)(eval|evaluate|benchmark).*\.py$/i.test(file)).slice(0, 30),
-     candidate_config_files: files.filter((file) => /(^|\/|_)(config|cfg)[^/]*\.(py|js|ts|json|yaml|yml|toml)$|\.ya?ml$|\.toml$|\.json$/i.test(file)).slice(0, 40),
+     candidate_train_files: files.filter((file) => /(^|\/)(train|finetune|pretrain)[\w-]*\.py$/i.test(file)).slice(0, 30),
+     candidate_eval_files: files.filter((file) => /(^|\/)(eval|evaluate|benchmark)[\w-]*\.py$/i.test(file)).slice(0, 30),
+     candidate_config_files: files.filter((file) => /(^|\/|_)(config|cfg)[\w-]*\.(py|js|ts|json|yaml|yml|toml)$/i.test(file)).slice(0, 40),
      candidate_log_dirs: existsAny(cwd, ["logs", "runs", "wandb", "mlruns", "checkpoints", "plots"]),
      adapters: [...new Set(adapters)],
    };
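The tightened pytorch detection in the hunk above can be exercised in isolation. The sketch below is a standalone reduction of that logic with hypothetical file names, showing why filename substrings no longer trigger the adapter:

```javascript
// Standalone reduction of the stricter pytorch check: a real train*.py
// script or `torch` mentioned in a dependency file, never a substring match.
const trainScriptPattern = /^(train|finetune|pretrain)[\w-]*\.py$/i;

function wantsPytorch(basenames, depsText) {
  const hasTrainScript = basenames.some((n) => trainScriptPattern.test(n));
  return hasTrainScript || depsText.toLowerCase().includes("torch");
}

// A file like "pytorch_utils.md" used to be a false positive; no longer.
console.log(wantsPytorch(["pytorch_utils.md"], ""));     // false
console.log(wantsPytorch(["train_llm.py"], ""));         // true
console.log(wantsPytorch(["app.py"], "torch==2.3.0\n")); // true
```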
@@ -715,6 +738,45 @@ function renderIdeasMarkdown(profile, goalText, ideas) {
    return lines.join("\n");
  }

+ function readPaperNotes(cwd) {
+   const papersDir = path.join(cwd, ".researchloop", "scratchpad", "papers");
+   if (!fs.existsSync(papersDir)) {
+     return [];
+   }
+   const out = [];
+   for (const entry of fs.readdirSync(papersDir, { withFileTypes: true })) {
+     if (!entry.isFile() || !entry.name.endsWith(".md")) continue;
+     const file = path.join(papersDir, entry.name);
+     const raw = fs.readFileSync(file, "utf8");
+     const titleMatch = raw.match(/^#\s+(.+?)\s*$/m);
+     const idMatch = raw.match(/^arXiv:\s*(.+?)\s*$/m);
+     out.push({
+       title: titleMatch ? titleMatch[1].trim() : entry.name.replace(/\.md$/, ""),
+       arxivId: idMatch ? idMatch[1].trim() : entry.name.replace(/\.md$/, ""),
+       file: path.relative(cwd, file),
+     });
+   }
+   return out;
+ }
+
+ function buildPaperIdeas(papers, goalText, startRank) {
+   const ideas = [];
+   let rank = startRank;
+   for (const paper of papers.slice(0, 5)) {
+     const shortTitle = paper.title.length > 60 ? `${paper.title.slice(0, 57)}...` : paper.title;
+     ideas.push({
+       rank,
+       title: `Read paper: ${shortTitle}`,
+       hypothesis: `arXiv ${paper.arxivId} may suggest a mechanism relevant to ${goalText || "the target metric"}.`,
+       change: `Read ${paper.file}, extract one concrete mechanism, and decide if it can be ported in one experiment.`,
+       killCriterion: "If the mechanism cannot be cleanly ported or has no reproducible result section, log the lesson and skip.",
+       whyNow: "Paper was fetched recently and is cheap to read before launching another sweep.",
+     });
+     rank += 1;
+   }
+   return ideas;
+ }
+
  function cmdIdea() {
    const cwd = targetDir();
    const researchDir = path.join(cwd, ".researchloop");
@@ -722,6 +784,10 @@ function cmdIdea() {
    const goalText = option("--goal", "") || readGoalSummary(path.join(researchDir, "goal.md"));
    const profile = loadRepoProfile(cwd);
    const ideas = buildIdeaList(profile, goalText);
+   const papers = readPaperNotes(cwd);
+   if (papers.length) {
+     ideas.push(...buildPaperIdeas(papers, goalText, ideas.length + 1));
+   }
    const markdown = renderIdeasMarkdown(profile, goalText, ideas);
    process.stdout.write(`${markdown}\n`);

@@ -852,6 +918,422 @@ function cmdRecord() {
    console.log(`Recorded run: ${row.id}`);
  }

+ function defaultMetricRegex(metricName) {
+   const escaped = metricName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+   return new RegExp(`["']?${escaped}["']?\\s*[:=]\\s*["']?(-?\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?)`, "gi");
+ }
+
+ function parseMetricFromOutput(output, metricName, customRegexSource) {
+   const regex = customRegexSource
+     ? new RegExp(customRegexSource, "gi")
+     : defaultMetricRegex(metricName);
+   let last = null;
+   let match;
+   while ((match = regex.exec(output)) !== null) {
+     last = match[1] !== undefined ? match[1] : match[0];
+   }
+   if (last !== null && Number.isFinite(Number(last))) {
+     return Number(last);
+   }
+   const lines = output.split("\n").map((line) => line.trim()).filter(Boolean);
+   for (let idx = lines.length - 1; idx >= 0; idx -= 1) {
+     try {
+       const obj = JSON.parse(lines[idx]);
+       if (obj && typeof obj === "object" && metricName in obj && Number.isFinite(Number(obj[metricName]))) {
+         return Number(obj[metricName]);
+       }
+     } catch {
+       // not JSON, skip
+     }
+   }
+   return null;
+ }
+
+ function spawnCommand(commandText, cwd, timeoutMs, logFile) {
+   return new Promise((resolve) => {
+     const child = spawn(commandText, { cwd, shell: true });
+     const chunks = [];
+     let timedOut = false;
+     const logStream = fs.createWriteStream(logFile);
+     const timer = setTimeout(() => {
+       timedOut = true;
+       try {
+         child.kill("SIGKILL");
+       } catch {
+         // already gone
+       }
+     }, timeoutMs);
+     child.stdout.on("data", (data) => {
+       chunks.push(data);
+       process.stdout.write(data);
+       logStream.write(data);
+     });
+     child.stderr.on("data", (data) => {
+       chunks.push(data);
+       process.stderr.write(data);
+       logStream.write(data);
+     });
+     child.on("error", (err) => {
+       clearTimeout(timer);
+       const message = `\nresearchloop: spawn error: ${err.message}\n`;
+       logStream.end(message);
+       resolve({
+         output: Buffer.concat(chunks).toString("utf8") + message,
+         exitCode: null,
+         timedOut,
+         spawnError: err.message,
+       });
+     });
+     child.on("close", (code) => {
+       clearTimeout(timer);
+       logStream.end();
+       resolve({
+         output: Buffer.concat(chunks).toString("utf8"),
+         exitCode: code,
+         timedOut,
+         spawnError: null,
+       });
+     });
+   });
+ }
+
+ function replaceOrAppendSection(text, heading, body) {
+   const escaped = heading.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+   const pattern = new RegExp(`(^## ${escaped}\\s+)([\\s\\S]*?)(?=\\n## |\\n# |$)`, "mi");
+   if (pattern.test(text)) {
+     return text.replace(pattern, `$1${body}\n`);
+   }
+   const suffix = text.endsWith("\n") ? "" : "\n";
+   return `${text}${suffix}\n## ${heading}\n${body}\n`;
+ }
+
+ function updateGoalCurrentBest(cwd, metricName, value, runId) {
+   const goalFile = path.join(cwd, ".researchloop", "goal.md");
+   if (!fs.existsSync(goalFile)) {
+     return;
+   }
+   const raw = fs.readFileSync(goalFile, "utf8");
+   const body = `${metricName} = ${value} (run ${runId})`;
+   fs.writeFileSync(goalFile, replaceOrAppendSection(raw, "Current Best", body));
+ }
+
+ function updatePlanBaseline(cwd, metricName, value, runId) {
+   const planFile = path.join(cwd, ".researchloop", "plan.md");
+   if (!fs.existsSync(planFile)) {
+     return;
+   }
+   const raw = fs.readFileSync(planFile, "utf8");
+   const body = [
+     `- Baseline: ${metricName} = ${value} (run ${runId})`,
+     "- Best valid result: same as baseline",
+     "- Active family: none",
+     "- Running jobs: none",
+     "- Next action: design first experiment",
+   ].join("\n");
+   fs.writeFileSync(planFile, replaceOrAppendSection(raw, "Current State", body));
+ }
+
+ function readGoalFields(cwd) {
+   const goalFile = path.join(cwd, ".researchloop", "goal.md");
+   const raw = readTextIfExists(goalFile);
+   return {
+     goal: parseMarkdownSection(raw, "Goal") || "",
+     metric: parseMarkdownSection(raw, "Target Metric") || "",
+     direction: parseMarkdownSection(raw, "Direction") || "",
+     baseline: parseMarkdownSection(raw, "Baseline Command") || "",
+     evaluation: parseMarkdownSection(raw, "Evaluation Command") || "",
+   };
+ }
+
+ async function cmdRun(isBaseline) {
+   const cwd = targetDir();
+   const goalFields = readGoalFields(cwd);
+   const explicitCommand = option("--command", null);
+   let cmdText = explicitCommand && typeof explicitCommand === "string" ? explicitCommand : "";
+   if (!cmdText) {
+     cmdText = isBaseline
+       ? goalFields.baseline
+       : (goalFields.evaluation || goalFields.baseline);
+   }
+   if (!cmdText || cmdText.toLowerCase() === "unknown") {
+     console.error("No command to run.");
+     console.error("Set one via:");
+     console.error(" researchloop goal \"<text>\" --baseline \"python train.py\" --evaluation \"python eval.py\"");
+     console.error("Or pass --command directly.");
+     process.exitCode = 1;
+     return;
+   }
+
+   const metricName = String(option("--metric", goalFields.metric || "val_loss")).trim() || "val_loss";
+   const customRegex = option("--regex", null);
+   const regexSource = customRegex && typeof customRegex === "string" ? customRegex : null;
+   const timeoutSec = Number(option("--timeout", 600));
+   const timeoutMs = Number.isFinite(timeoutSec) && timeoutSec > 0 ? timeoutSec * 1000 : 600000;
+
+   const prefix = isBaseline ? "baseline" : "run";
+   const id = String(option("--id", `${prefix}-${new Date().toISOString().replace(/[:.]/g, "-")}`));
+   const runDir = path.join(cwd, ".researchloop", "scratchpad", "runs", id);
+   ensureDir(runDir);
+   const logFile = path.join(runDir, "log.txt");
+
+   console.log(`researchloop ${prefix}`);
+   console.log(`command: ${cmdText}`);
+   console.log(`metric: ${metricName}`);
+   console.log(`timeout: ${timeoutMs / 1000}s`);
+   console.log(`log: ${path.relative(cwd, logFile)}`);
+   console.log("---");
+
+   const startedAt = new Date().toISOString();
+   const result = await spawnCommand(cmdText, cwd, timeoutMs, logFile);
+   const finishedAt = new Date().toISOString();
+
+   let status;
+   if (result.spawnError) {
+     status = "spawn_error";
+   } else if (result.timedOut) {
+     status = "timeout";
+   } else if (result.exitCode !== 0) {
+     status = "failed";
+   } else {
+     status = "complete";
+   }
+
+   const metrics = {};
+   const metricValue = parseMetricFromOutput(result.output, metricName, regexSource);
+   if (metricValue !== null) {
+     metrics[metricName] = metricValue;
+   }
+   if (status === "complete" && metricValue === null) {
+     status = "complete_no_metric";
+   }
+
+   const row = {
+     id,
+     timestamp: finishedAt,
+     started_at: startedAt,
+     status,
+     agent: `researchloop ${prefix}`,
+     command: cmdText,
+     exit_code: result.exitCode,
+     log: path.relative(cwd, logFile),
+     metrics,
+     notes: "",
+   };
+   const ledger = path.join(cwd, ".researchloop", "scratchpad", "runs.jsonl");
+   ensureDir(path.dirname(ledger));
+   fs.appendFileSync(ledger, `${JSON.stringify(row)}\n`);
+
+   const thread = path.join(cwd, ".researchloop", "scratchpad", "THREAD.md");
+   ensureDir(path.dirname(thread));
+   const metricSuffix = metricValue !== null ? ` ${metricName}=${metricValue}` : "";
+   fs.appendFileSync(thread, `- ${finishedAt} ${prefix} ${id} status=${status}${metricSuffix}\n`);
+
+   console.log("---");
+   console.log(`status: ${status}`);
+   console.log(`exit_code: ${result.exitCode}`);
+   if (metricValue !== null) {
+     console.log(`${metricName}: ${metricValue}`);
+   } else {
+     console.log("metric: not parsed");
+   }
+   console.log(`recorded: ${id}`);
+
+   if (isBaseline && metricValue !== null) {
+     updateGoalCurrentBest(cwd, metricName, metricValue, id);
+     updatePlanBaseline(cwd, metricName, metricValue, id);
+     console.log("goal.md Current Best updated.");
+     console.log("plan.md Current State updated.");
+   }
+
+   if (status === "failed" || status === "timeout" || status === "spawn_error") {
+     process.exitCode = 1;
+   }
+ }
+
+ const ARXIV_API_URL = "http://export.arxiv.org/api/query";
+
+ function arxivCacheDir() {
+   return path.join(os.homedir(), ".cache", "researchloop", "arxiv");
+ }
+
+ function arxivCacheKey(query, limit, since) {
+   return createHash("sha1")
+     .update(`${query}|${limit}|${since || ""}`)
+     .digest("hex")
+     .slice(0, 16);
+ }
+
+ async function fetchArxivXml({ query, limit, since, cacheDir, offline }) {
+   const fixture = process.env.RESEARCHLOOP_ARXIV_FIXTURE;
+   if (fixture) {
+     return fs.readFileSync(fixture, "utf8");
+   }
+   ensureDir(cacheDir);
+   const key = arxivCacheKey(query, limit, since);
+   const cacheFile = path.join(cacheDir, `${key}.xml`);
+   if (fs.existsSync(cacheFile)) {
+     return fs.readFileSync(cacheFile, "utf8");
+   }
+   if (offline) {
+     throw new Error(`offline mode: no cache for query "${query}" (key=${key})`);
+   }
+   const params = new URLSearchParams({
+     search_query: query,
+     sortBy: "submittedDate",
+     sortOrder: "descending",
+     max_results: String(limit),
+   });
+   const url = `${ARXIV_API_URL}?${params.toString()}`;
+   const res = await fetch(url, { headers: { "User-Agent": "researchloop/0.2.0" } });
+   if (!res.ok) {
+     throw new Error(`arxiv fetch failed: HTTP ${res.status}`);
+   }
+   const xml = await res.text();
+   fs.writeFileSync(cacheFile, xml);
+   return xml;
+ }
+
+ function decodeXmlEntities(text) {
+   return text
+     .replace(/&lt;/g, "<")
+     .replace(/&gt;/g, ">")
+     .replace(/&quot;/g, '"')
+     .replace(/&apos;/g, "'")
+     .replace(/&#39;/g, "'")
+     .replace(/&amp;/g, "&");
+ }
+
+ function extractXmlTag(block, tag) {
+   const re = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)<\\/${tag}>`, "i");
+   const match = block.match(re);
+   return match ? decodeXmlEntities(match[1]).replace(/\s+/g, " ").trim() : "";
+ }
+
+ function parseArxivEntries(xml) {
+   const entries = [];
+   const entryRe = /<entry>([\s\S]*?)<\/entry>/g;
+   let match;
+   while ((match = entryRe.exec(xml)) !== null) {
+     const block = match[1];
+     const idUrl = extractXmlTag(block, "id");
+     const arxivId = idUrl.replace(/^https?:\/\/arxiv\.org\/abs\//, "");
+     const authorBlocks = block.match(/<author>[\s\S]*?<\/author>/g) || [];
+     const authors = authorBlocks
+       .map((blk) => extractXmlTag(blk, "name"))
+       .filter(Boolean);
+     entries.push({
+       arxivId,
+       idUrl,
+       title: extractXmlTag(block, "title"),
+       summary: extractXmlTag(block, "summary"),
+       published: extractXmlTag(block, "published"),
+       updated: extractXmlTag(block, "updated"),
+       authors,
+     });
+   }
+   return entries;
+ }
+
+ function filterArxivBySince(entries, since) {
+   if (!since) return entries;
+   const sinceDate = new Date(since.length === 7 ? `${since}-01` : since);
+   if (Number.isNaN(sinceDate.getTime())) return entries;
+   return entries.filter((entry) => {
+     const date = new Date(entry.published);
+     return !Number.isNaN(date.getTime()) && date >= sinceDate;
+   });
+ }
+
+ function buildDefaultArxivQuery(goalFields, profile) {
+   const parts = [];
+   if (goalFields.goal) parts.push(goalFields.goal);
+   if (goalFields.metric) parts.push(goalFields.metric);
+   const adapters = (profile && profile.adapters) || [];
+   if (adapters.includes("huggingface")) parts.push("transformer");
+   if (adapters.includes("pytorch")) parts.push("deep learning");
+   const joined = parts.filter(Boolean).join(" ").slice(0, 200).trim();
+   return joined ? `all:${joined}` : "all:deep learning";
+ }
+
+ function renderPaperMarkdown(entry) {
+   const pubDate = entry.published ? entry.published.slice(0, 10) : "";
+   return [
+     `# ${entry.title || entry.arxivId}`,
+     "",
+     `arXiv: ${entry.arxivId}`,
+     `Published: ${pubDate}`,
+     `Authors: ${entry.authors.join(", ")}`,
+     `Link: ${entry.idUrl}`,
+     "",
+     "## Abstract",
+     "",
+     entry.summary,
+     "",
+     "## How to port this",
+     "",
+     "TODO. Fill in when the paper is read.",
+     "",
+   ].join("\n");
+ }
+
+ async function cmdScanPapers() {
+   const cwd = targetDir();
+   const goalFields = readGoalFields(cwd);
+   const profile = loadRepoProfile(cwd);
+   const explicitQuery = option("--query", null);
+   const query = explicitQuery && typeof explicitQuery === "string"
+     ? explicitQuery
+     : buildDefaultArxivQuery(goalFields, profile);
+   const limitRaw = Number(option("--limit", 10));
+   const limit = Number.isFinite(limitRaw) && limitRaw > 0 ? Math.min(50, Math.floor(limitRaw)) : 10;
+   const sinceOpt = option("--since", null);
+   const since = sinceOpt && typeof sinceOpt === "string" ? sinceOpt : null;
+   const offline = hasFlag("--offline");
+   const cacheDirOpt = option("--cache-dir", null);
+   const cacheDir = cacheDirOpt && typeof cacheDirOpt === "string" ? cacheDirOpt : arxivCacheDir();
+
+   console.log("researchloop scan-papers");
+   console.log(`query: ${query}`);
+   console.log(`limit: ${limit}`);
+   if (since) console.log(`since: ${since}`);
+   console.log(`cache: ${cacheDir}`);
+
+   let xml;
+   try {
+     xml = await fetchArxivXml({ query, limit, since, cacheDir, offline });
+   } catch (err) {
+     console.error(`scan-papers failed: ${err.message}`);
+     process.exitCode = 1;
+     return;
+   }
+
+   let entries = parseArxivEntries(xml);
+   entries = filterArxivBySince(entries, since);
+
+   const papersDir = path.join(cwd, ".researchloop", "scratchpad", "papers");
+   ensureDir(papersDir);
+   for (const entry of entries) {
+     const safeId = entry.arxivId.replace(/[/\\]/g, "_");
+     const file = path.join(papersDir, `${safeId}.md`);
+     fs.writeFileSync(file, renderPaperMarkdown(entry));
+   }
+
+   const thread = path.join(cwd, ".researchloop", "scratchpad", "THREAD.md");
+   ensureDir(path.dirname(thread));
+   fs.appendFileSync(
+     thread,
+     `- ${new Date().toISOString()} scan-papers query="${query.slice(0, 100)}" found=${entries.length}\n`
+   );
+
+   console.log("---");
+   console.log(`found: ${entries.length}`);
+   for (const entry of entries) {
+     const title = entry.title.length > 80 ? `${entry.title.slice(0, 77)}...` : entry.title;
+     console.log(`- ${entry.arxivId} ${title}`);
+   }
+   console.log(`papers written to: ${path.relative(cwd, papersDir)}`);
+ }
+
  function cmdHelp() {
    console.log(`Research Loop

@@ -863,6 +1345,9 @@ Usage:
  researchloop prompt [--agent codex|claude-code|hermes|generic] [--goal TEXT] [--focus hyperparameters|architecture|attention]
  researchloop doctor [--dir PATH] [--python PATH]
  researchloop record [--dir PATH] [--id ID] [--status STATUS] [--metric key=value] [--note TEXT]
+ researchloop run [--dir PATH] [--id ID] [--command CMD] [--metric NAME] [--regex PATTERN] [--timeout SECONDS]
+ researchloop baseline [--dir PATH] [--id ID] [--command CMD] [--metric NAME] [--regex PATTERN] [--timeout SECONDS]
+ researchloop scan-papers [--dir PATH] [--query TEXT] [--limit N] [--since YYYY-MM] [--cache-dir PATH] [--offline]
  researchloop compare [--dir PATH] [--metric NAME] [--direction lower|higher]
  researchloop dashboard [--dir PATH] [--host HOST] [--port PORT]
  researchloop report [--dir PATH]
@@ -871,30 +1356,43 @@ Research Loop installs docs, prompts, scratchpads, and experiment ledgers for au
871
1356
  `);
872
1357
  }
873
1358
 
874
- if (hasFlag("--help") || command === "help") {
875
- cmdHelp();
876
- } else if (command === "init") {
877
- cmdInit();
878
- } else if (command === "goal") {
879
- cmdGoal();
880
- } else if (command === "inspect") {
881
- cmdInspect();
882
- } else if (command === "idea") {
883
- cmdIdea();
884
- } else if (command === "prompt") {
885
- cmdPrompt();
886
- } else if (command === "doctor") {
887
- cmdDoctor();
888
- } else if (command === "record") {
889
- cmdRecord();
890
- } else if (command === "compare") {
891
- cmdCompare();
892
- } else if (command === "dashboard") {
893
- cmdDashboard();
894
- } else if (command === "report") {
895
- cmdReport();
896
- } else {
897
- console.error(`Unknown command: ${command}`);
898
- cmdHelp();
899
- process.exitCode = 1;
1359
+ async function main() {
1360
+ if (hasFlag("--help") || command === "help") {
1361
+ cmdHelp();
1362
+ } else if (command === "init") {
1363
+ cmdInit();
1364
+ } else if (command === "goal") {
1365
+ cmdGoal();
1366
+ } else if (command === "inspect") {
1367
+ cmdInspect();
1368
+ } else if (command === "idea") {
1369
+ cmdIdea();
1370
+ } else if (command === "prompt") {
1371
+ cmdPrompt();
1372
+ } else if (command === "doctor") {
1373
+ cmdDoctor();
1374
+ } else if (command === "record") {
1375
+ cmdRecord();
1376
+ } else if (command === "run") {
1377
+ await cmdRun(false);
1378
+ } else if (command === "baseline") {
1379
+ await cmdRun(true);
1380
+ } else if (command === "scan-papers") {
1381
+ await cmdScanPapers();
1382
+ } else if (command === "compare") {
1383
+ cmdCompare();
1384
+ } else if (command === "dashboard") {
1385
+ cmdDashboard();
1386
+ } else if (command === "report") {
1387
+ cmdReport();
1388
+ } else {
1389
+ console.error(`Unknown command: ${command}`);
1390
+ cmdHelp();
1391
+ process.exitCode = 1;
1392
+ }
900
1393
  }
1394
+
1395
+ main().catch((err) => {
1396
+ console.error(err);
1397
+ process.exitCode = 1;
1398
+ });
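The hunk above wraps the dispatcher in an async `main()` so the new `run`, `baseline`, and `scan-papers` commands can be awaited, with a trailing `.catch` that logs the error and sets a nonzero exit code. A minimal standalone sketch of the same pattern, assuming hypothetical command names rather than the real handlers in `bin/index.js`:

```javascript
// Sketch of the async-dispatch-plus-catch pattern from the hunk above.
// Command names here are hypothetical; the real handlers live in bin/index.js.
const handlers = {
  sync: () => "did a sync thing",
  slow: async () => {
    await new Promise((resolve) => setTimeout(resolve, 10));
    return "did an async thing";
  },
};

async function main(command) {
  const handler = handlers[command];
  if (!handler) {
    // Unknown command: report it and flag failure without throwing.
    process.exitCode = 1;
    return `Unknown command: ${command}`;
  }
  // `await` is harmless for sync handlers and required for async ones.
  return await handler();
}

// Top-level catch mirrors the diff: log, set exitCode, never rethrow.
main("slow")
  .then((msg) => console.log(msg))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```

Setting `process.exitCode` instead of calling `process.exit()` lets pending I/O (such as a streaming run log) flush before the process ends.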
@@ -9,7 +9,17 @@ The shortest way to think about it:
  - it creates a durable `.researchloop/` workspace
  - your AI agent uses that workspace to plan, run, compare, and record experiments
 
- ## 1. Install
+ ## 1. Give This Prompt To Your Agent
+
+ Copy this into Codex, Claude Code, Hermes, Cursor, or another coding agent:
+
+ ```text
+ Set up an autonomous AI research environment in this repo using ResearchLoop.
+ Read the ResearchLoop docs and the .researchloop/ files, inspect the repo, establish the baseline, propose a small set of experiments, run the smallest valid change first, record every result, compare runs, and keep the research loop moving.
+ Use the package commands to manage goals, ideas, prompts, runs, comparisons, and reports.
+ ```
+
+ ## 2. Install
 
  From your own machine:
 
@@ -31,7 +41,7 @@ If you want to hand this to an AI agent, the simplest instruction is:
  Install ResearchLoop, initialize the repo, inspect the project, then use the generated prompt to start the research loop.
  ```
 
- ## 2. Initialize a repo
+ ## 3. Initialize a repo
 
  Run this inside a blank folder or inside an existing ML repo:
 
@@ -64,7 +74,7 @@ researchloop init --agent hermes
  researchloop init --agent cursor
  ```
 
- ## 3. Set the research goal
+ ## 4. Set the research goal
 
  Tell ResearchLoop what the agent should optimize:
 
@@ -80,7 +90,7 @@ researchloop goal "lower validation loss" --metric val_loss --direction lower
 
  That saves the objective into `.researchloop/goal.md`, which the agent and the prompt command can read later.
 
- ## 4. Generate experiment ideas
+ ## 5. Generate experiment ideas
 
  ```bash
  researchloop idea --write
@@ -88,7 +98,7 @@ researchloop idea --write
  ```
 
  This prints a ranked list of small experiments for the current repo shape. For `llm-research-kit`, that usually means baseline checks, learning-rate sweeps, and tiny architecture changes. For a generic repo, it starts with finding the baseline and metric plumbing.
 
- ## 5. Inspect the repo
+ ## 6. Inspect the repo
 
  ```bash
  researchloop inspect
@@ -102,7 +112,7 @@ This writes a repo profile into `.researchloop/repo-profile.json` and helps the
  - log folders
  - likely adapters
 
- ## 6. Generate the agent prompt
+ ## 7. Generate the agent prompt
 
  ```bash
  researchloop prompt --agent codex
@@ -127,6 +137,24 @@ That prompt tells the agent to:
  - compare results
  - keep the loop moving
 
+ ## 7b. Use the skill pack
+
+ The npm package also ships a downloadable `skills/` folder.
+
+ It contains the same research loop as agent-local skills:
+
+ - `skills/researchloop-autoresearch/codex/SKILL.md`
+ - `skills/researchloop-autoresearch/claude-code/CLAUDE.md`
+ - `skills/researchloop-autoresearch/references/*.md`
+
+ Use those files when you want the agent itself to carry the research rules, not just the current prompt.
+
+ Typical flow:
+
+ 1. Copy the Codex or Claude Code file into the skill location your agent uses.
+ 2. Keep the `references/` files nearby as optional playbooks.
+ 3. Pair the skill with `.researchloop/goal.md` and the `researchloop prompt` output.
+
  You can still pass `--goal` for a one-off override, but the normal flow is to save the goal once and let the prompt command read it back.
 
  If you want the prompt to narrow in on a family of experiments, use one of the built-in focus playbooks:
@@ -135,7 +163,7 @@ If you want the prompt to narrow in on a family of experiments, use one of the b
  - `architecture`
  - `attention`
 
- ## 7. Record and compare runs
+ ## 8. Record and compare runs
 
  After a run finishes:
 
@@ -161,7 +189,7 @@ Then summarize the current state:
  researchloop report
  ```
 
- ## 8. Open the dashboard
+ ## 9. Open the dashboard
 
  Serve a local dashboard for the current repo:
 
@@ -179,7 +207,7 @@ Then open the localhost URL it prints. The dashboard reads the repo's `.research
 
  It does not need accounts or auth because it stays on your machine.
 
- ## 9. Test the setup before you trust it
+ ## 10. Test the setup before you trust it
 
  Run the local checks from this repo:
 
@@ -201,7 +229,7 @@ These checks verify that:
  - the website copy matches the product
  - the end-to-end flow works
 
- ## 10. Use it in a real ML repo
+ ## 11. Use it in a real ML repo
 
  Once the basics work, move into a real project:
 
@@ -216,7 +244,7 @@ Then give the prompt to your AI agent and let it run the loop.
 
  ResearchLoop is not trying to magically solve the model for you. It gives the agent the operating system for research: goals, baseline, logs, comparison, and continuation.
 
- ## 11. Publish to npm
+ ## 12. Publish to npm
 
  The package is published to the public npm registry at [npmjs.com](https://www.npmjs.com/).
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "researchloop",
- "version": "0.1.0",
+ "version": "0.2.0",
  "description": "Install an autonomous AI research harness for Codex, Claude Code, Hermes, and other coding agents.",
  "type": "module",
  "bin": {
@@ -9,6 +9,7 @@
  "files": [
  "bin",
  "templates",
+ "skills",
  "README.md",
  "docs/getting-started.md",
  "CHANGELOG.md"
@@ -21,6 +22,8 @@
  "test:dashboard": "bash ./scripts/test-dashboard.sh",
  "test:setup": "bash ./scripts/test-setup.sh",
  "test:compare": "bash ./scripts/test-compare.sh",
+ "test:run": "bash ./scripts/test-run.sh",
+ "test:scan-papers": "bash ./scripts/test-scan-papers.sh",
  "test:prompts": "bash ./scripts/test-prompts.sh",
  "test:focus-prompts": "bash ./scripts/test-focus-prompts.sh",
  "test:site": "bash ./scripts/test-site.sh"
@@ -0,0 +1,31 @@
+ # ResearchLoop Skills
+
+ This folder ships downloadable agent skills for autonomous AI research.
+
+ The package keeps the core product in the CLI, dashboard, prompts, and run ledger.
+ These skills are the agent-side memory layer that makes the research loop stick.
+
+ ## What is in here
+
+ - `researchloop-autoresearch/` - the main research skill pack
+ - `researchloop-autoresearch/references/` - focused playbooks for common experiment families
+
+ ## How users use it
+
+ Users copy the right file into the skill folder their agent expects.
+
+ Typical mapping:
+
+ - Codex: copy `researchloop-autoresearch/codex/SKILL.md` into the local Codex skills directory
+ - Claude Code: copy `researchloop-autoresearch/claude-code/CLAUDE.md` into the Claude Code instructions or skill location they use
+
+ ## What the skill pack does
+
+ - keeps the goal visible
+ - forces baseline-first behavior
+ - asks for one small experiment at a time
+ - records runs and comparisons
+ - prunes weak ideas instead of spiraling
+
+ The CLI prints prompts and creates `.researchloop/` state.
+ The skills make the agent remember how to behave while doing the work.
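The copy step described above can be scripted in Node. A hedged sketch: the destination path is an assumption (agents differ in where they load skills from), and a temp tree stands in for the installed package so the example is self-contained.

```javascript
// Hedged sketch: stage the Codex skill file into an agent-local skills
// directory. A real install would copy from node_modules/researchloop/skills;
// here a temp tree stands in so the sketch runs anywhere.
import { mkdtempSync, mkdirSync, writeFileSync, copyFileSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

const work = mkdtempSync(join(tmpdir(), "researchloop-"));

// Stand-in for skills/researchloop-autoresearch/codex/SKILL.md in the package.
const src = join(work, "SKILL.md");
writeFileSync(src, "# ResearchLoop Autoresearch\n");

// Destination is hypothetical: use whatever skills folder your agent reads.
const dest = join(work, ".agent", "skills", "researchloop-autoresearch");
mkdirSync(dest, { recursive: true });
copyFileSync(src, join(dest, "SKILL.md"));

console.log(readFileSync(join(dest, "SKILL.md"), "utf8").trim());
// prints "# ResearchLoop Autoresearch"
```

The `recursive: true` option lets `mkdirSync` create the whole nested path in one call, so the sketch works even when no `.agent/` folder exists yet.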
@@ -0,0 +1,35 @@
+ # ResearchLoop Autoresearch
+
+ Use this repo as an autonomous AI research loop.
+
+ Before changing code, read:
+
+ - `.researchloop/goal.md`
+ - `.researchloop/plan.md`
+ - `.researchloop/AGENTS.md`
+ - `.researchloop/scratchpad/THREAD.md`
+ - `.researchloop/repo-profile.json`
+
+ Then:
+
+ 1. confirm the baseline
+ 2. pick one small experiment
+ 3. change one variable at a time
+ 4. run the smallest valid check
+ 5. record the run
+ 6. compare against the baseline
+ 7. prune weak branches
+
+ Use ResearchLoop to keep the loop durable:
+
+ - `researchloop goal`
+ - `researchloop inspect`
+ - `researchloop idea`
+ - `researchloop prompt`
+ - `researchloop record`
+ - `researchloop compare`
+ - `researchloop report`
+
+ Never claim improvement without a run.
+ Never skip the baseline.
+ Never let the goal drift.
@@ -0,0 +1,50 @@
+ ---
+ name: researchloop-autoresearch
+ description: Use when doing autonomous AI research in a machine learning repo with ResearchLoop, especially when choosing experiments, preserving baselines, or logging run results.
+ ---
+
+ # ResearchLoop Autoresearch
+
+ You are the research agent inside a repo that uses ResearchLoop.
+
+ Before changing code, read:
+
+ - `.researchloop/goal.md`
+ - `.researchloop/plan.md`
+ - `.researchloop/AGENTS.md`
+ - `.researchloop/scratchpad/THREAD.md`
+ - `.researchloop/repo-profile.json`
+
+ Then work in this order:
+
+ 1. Confirm the baseline.
+ 2. Propose the smallest informative next experiment.
+ 3. Change one thing at a time.
+ 4. Run the smallest valid check.
+ 5. Record the result.
+ 6. Compare against the baseline.
+ 7. Prune weak branches quickly.
+ 8. Continue until the goal is met or the family is exhausted.
+
+ Use the ResearchLoop commands as the control plane:
+
+ - `researchloop goal`
+ - `researchloop inspect`
+ - `researchloop prompt`
+ - `researchloop idea`
+ - `researchloop record`
+ - `researchloop compare`
+ - `researchloop report`
+
+ Do not claim improvement without a recorded run.
+ Do not stack architecture changes before the baseline is stable.
+ Do not let the loop drift away from the saved goal.
+
+ ## When to use playbooks
+
+ If the task is clearly one of these families, load the matching reference:
+
+ - hyperparameters -> `references/hyperparameters.md`
+ - architecture -> `references/architecture.md`
+ - attention -> `references/attention.md`
+
@@ -0,0 +1,21 @@
+ # Architecture Playbook
+
+ Use this when tuning model shape or layer structure.
+
+ Try one change at a time:
+
+ - width
+ - depth
+ - feedforward size
+ - number of heads
+ - embedding size
+ - normalization placement
+
+ Rules:
+
+ - do not stack multiple architecture changes in the first pass
+ - keep the optimizer and schedule fixed
+ - compare against a reproduced baseline
+ - re-run the best candidate with a second seed
+
+ If the win does not reproduce, drop it.
@@ -0,0 +1,21 @@
+ # Attention Playbook
+
+ Use this when the bottleneck appears to be the attention block itself.
+
+ Try one change at a time:
+
+ - number of heads
+ - head dimension
+ - context length
+ - causal masking
+ - rotary or positional setup
+ - attention implementation
+
+ Rules:
+
+ - keep the rest of the model fixed
+ - keep the metric fixed
+ - capture throughput and loss together
+ - record the exact config diff
+
+ If the change only helps once, do not promote it.
@@ -0,0 +1,22 @@
+ # Hyperparameters Playbook
+
+ Use this when the likely next win is a cheap tuning change.
+
+ Try one family at a time:
+
+ - learning rate
+ - warmup
+ - optimizer
+ - weight decay
+ - batch size
+ - gradient clipping
+
+ Rules:
+
+ - keep architecture fixed
+ - keep the dataset fixed
+ - keep the metric fixed
+ - sweep only a few values
+ - record every run
+
+ Kill the family quickly if the curve is flat.
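The "sweep only a few values, record every run" rule above can be sketched in a few lines. Everything here is hypothetical: the values, the stand-in metric, and the field names; real results come from runs recorded in `runs.jsonl`.

```javascript
// Hypothetical one-family sweep: a few learning-rate values, a stand-in
// validation loss per run, keep the single best. Real metrics would come
// from recorded runs, not a formula.
const sweep = [1e-4, 3e-4, 1e-3];

// Stand-in metric: pretend each run recorded a val_loss that bottoms out
// near lr = 3e-4.
const runs = sweep.map((lr) => ({ lr, val_loss: Math.abs(Math.log10(lr) + 3.5) }));

// Lower is better for val_loss; reduce to the best run.
const best = runs.reduce((a, b) => (b.val_loss < a.val_loss ? b : a));
console.log(best.lr);
// prints 0.0003
```

Keeping the sweep to three values matches the playbook: if none of them moves the metric, kill the family rather than widening the grid.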