researchloop 0.1.0

@@ -0,0 +1,283 @@
+ # ResearchLoop Getting Started
+
+ ResearchLoop is an open source npm package that helps an AI agent run a disciplined research loop inside a machine learning repo.
+
+ The shortest way to think about it:
+
+ - you install the CLI
+ - you point it at a repo
+ - it creates a durable `.researchloop/` workspace
+ - your AI agent uses that workspace to plan, run, compare, and record experiments
+
+ ## 1. Install
+
+ From your own machine:
+
+ ```bash
+ npm install -g researchloop
+ ```
+
+ For local development from a clone of this repo:
+
+ ```bash
+ cd /path/to/researchloop
+ npm link
+ researchloop --help
+ ```
+
+ If you want to hand this to an AI agent, the simplest instruction is:
+
+ ```text
+ Install ResearchLoop, initialize the repo, inspect the project, then use the generated prompt to start the research loop.
+ ```
+
+ ## 2. Initialize a repo
+
+ Run this inside a blank folder or inside an existing ML repo:
+
+ ```bash
+ researchloop init --agent codex
+ ```
+
+ That creates:
+
+ ```text
+ .researchloop/
+   AGENTS.md
+   goal.md
+   plan.md
+   repo-profile.json
+   scratchpad/
+     THREAD.md
+     runs.jsonl
+     ideas/
+     papers/
+     variants/
+     sweeps/
+ ```
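Because `init` only scaffolds files, a quick check that the tree above actually exists can catch a partial setup. The sketch below is in JavaScript, the package's own language (plain Node, no dependencies). It takes a list of relative paths you have already collected, with directories marked by a trailing `/`, and reports which expected entries are missing. The path list is transcribed from the tree above, and `missingWorkspacePaths` is a hypothetical helper, not part of ResearchLoop.

```javascript
// Expected workspace layout, transcribed from the tree created by `researchloop init`.
const EXPECTED_PATHS = [
  ".researchloop/AGENTS.md",
  ".researchloop/goal.md",
  ".researchloop/plan.md",
  ".researchloop/repo-profile.json",
  ".researchloop/scratchpad/THREAD.md",
  ".researchloop/scratchpad/runs.jsonl",
  ".researchloop/scratchpad/ideas/",
  ".researchloop/scratchpad/papers/",
  ".researchloop/scratchpad/variants/",
  ".researchloop/scratchpad/sweeps/",
];

// Hypothetical helper: return the expected paths that are absent from `existing`.
// `existing` is a list of relative paths; directories must end with "/".
function missingWorkspacePaths(existing) {
  const present = new Set(existing);
  return EXPECTED_PATHS.filter((path) => !present.has(path));
}

const found = [".researchloop/AGENTS.md", ".researchloop/goal.md", ".researchloop/plan.md"];
console.log(missingWorkspacePaths(found).length); // 7
```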
+
+ To generate the agent-specific file for a different tool, pass its name to `--agent`:
+
+ ```bash
+ researchloop init --agent claude-code
+ researchloop init --agent hermes
+ researchloop init --agent cursor
+ ```
+
+ ## 3. Set the research goal
+
+ Tell ResearchLoop what the agent should optimize:
+
+ ```bash
+ researchloop goal "lower validation loss"
+ ```
+
+ To record the metric name and optimization direction explicitly:
+
+ ```bash
+ researchloop goal "lower validation loss" --metric val_loss --direction lower
+ ```
+
+ That saves the objective into `.researchloop/goal.md`, which the agent and the prompt command can read later.
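The shipped `goal.md` template stores the goal as `- Key: value` bullets (for example `- Direction: lower is better`). Assuming the saved file keeps that layout, which this guide does not guarantee for the text `researchloop goal` writes, a script or agent could read the fields back like this. `parseGoalBullets` is a hypothetical helper.

```javascript
// Hypothetical helper: parse "- Key: value" bullet lines into a plain object.
// Assumption: the file follows the bullet layout of the shipped goal.md template.
function parseGoalBullets(markdown) {
  const fields = {};
  for (const line of markdown.split(/\r?\n/)) {
    // Key is everything before the first colon; value is the trimmed remainder.
    const match = /^\s*-\s+([^:]+):\s*(.+?)\s*$/.exec(line);
    if (match) fields[match[1].trim()] = match[2];
  }
  return fields;
}

const sample = [
  "# Research Goal",
  "",
  "- Target metric: validation loss",
  "- Direction: lower is better",
  "- Baseline command: unknown",
].join("\n");

const goal = parseGoalBullets(sample);
console.log(goal["Direction"]); // prints: lower is better
```

An agent can use the parsed fields to check whether `Baseline command` is still `unknown` before it starts optimizing, since the agent rules require a baseline first.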
+
+ ## 4. Generate experiment ideas
+
+ ```bash
+ researchloop idea --write
+ ```
+
+ This prints a ranked list of small experiments for the current repo shape. For an `llm-research-kit` repo, that usually means baseline checks, learning-rate sweeps, and tiny architecture changes. For a generic repo, the list starts with locating the baseline command and the metric plumbing.
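The ranking logic behind `researchloop idea` is not described in this guide. As an illustration only, this sketch orders candidates by the rule in the agent instructions, "run the cheapest useful experiment first". The `costMinutes` and `expectedGain` fields are assumptions an agent would estimate, not fields ResearchLoop defines.

```javascript
// Illustrative ranking: cheapest useful experiment first, ties broken by larger expected gain.
// `costMinutes` and `expectedGain` are hypothetical estimates, not ResearchLoop fields.
function rankIdeas(ideas) {
  return ideas
    .filter((idea) => idea.expectedGain > 0) // drop ideas with no expected upside
    .sort((a, b) => a.costMinutes - b.costMinutes || b.expectedGain - a.expectedGain);
}

const ranked = rankIdeas([
  { name: "lr-sweep", costMinutes: 30, expectedGain: 2 },
  { name: "baseline-check", costMinutes: 5, expectedGain: 1 },
  { name: "deeper-mlp", costMinutes: 120, expectedGain: 3 },
  { name: "rename-vars", costMinutes: 1, expectedGain: 0 },
]);
console.log(ranked.map((idea) => idea.name)); // [ 'baseline-check', 'lr-sweep', 'deeper-mlp' ]
```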
+
+ ## 5. Inspect the repo
+
+ ```bash
+ researchloop inspect
+ ```
+
+ This writes a repo profile into `.researchloop/repo-profile.json` that helps the agent locate:
+
+ - possible training files
+ - possible eval files
+ - config files
+ - log folders
+ - likely adapters (generic, PyTorch, Hugging Face Trainer, or LLM Research Kit)
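The profile's exact contents are not specified here. To show the kind of classification involved, this sketch buckets file paths with simple filename patterns. The patterns are guesses, not ResearchLoop's real heuristics, and the sample paths come from the LLM Research Kit adapter notes.

```javascript
// Illustrative filename patterns; these are guesses, not ResearchLoop's heuristics.
const CATEGORY_PATTERNS = {
  training: /(^|\/)train[^/]*\.py$/,
  eval: /(^|\/)(eval|evaluate|evaluation)[^/]*\.py$/,
  config: /(^|\/)configs?\/|\.(ya?ml|toml)$/,
  logs: /(^|\/)(logs|checkpoints|plots)\//,
};

// Put each path into every category whose pattern matches it.
function classifyPaths(paths) {
  const profile = { training: [], eval: [], config: [], logs: [] };
  for (const path of paths) {
    for (const [category, pattern] of Object.entries(CATEGORY_PATTERNS)) {
      if (pattern.test(path)) profile[category].push(path);
    }
  }
  return profile;
}

const profile = classifyPaths([
  "train_llm.py",
  "training/evaluation.py",
  "configs/llm_config.py",
  "plots/metrics_001.json",
  "README.md",
]);
console.log(profile.training); // [ 'train_llm.py' ]
```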
+
+ ## 6. Generate the agent prompt
+
+ ```bash
+ researchloop prompt --agent codex
+ ```
+
+ Paste the output into your AI agent.
+
+ You can also attach a focused playbook:
+
+ ```bash
+ researchloop prompt --agent codex --focus hyperparameters
+ researchloop prompt --agent codex --focus architecture
+ researchloop prompt --agent codex --focus attention
+ ```
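If you script prompt generation, validating the focus name up front avoids a confusing failure later. A minimal sketch, assuming only the three playbook names listed in this guide; `buildPromptArgs` is hypothetical. The resulting array can be passed to Node's `child_process.execFileSync("researchloop", args)`.

```javascript
// The three focus playbooks named in this guide.
const FOCUS_PLAYBOOKS = new Set(["hyperparameters", "architecture", "attention"]);

// Hypothetical helper: build `researchloop prompt` arguments, rejecting unknown focus names.
function buildPromptArgs({ agent = "codex", focus } = {}) {
  const args = ["prompt", "--agent", agent];
  if (focus !== undefined) {
    if (!FOCUS_PLAYBOOKS.has(focus)) {
      throw new Error(`unknown focus "${focus}"; expected one of ${[...FOCUS_PLAYBOOKS].join(", ")}`);
    }
    args.push("--focus", focus);
  }
  return args;
}

console.log(buildPromptArgs({ focus: "attention" }).join(" "));
// prompt --agent codex --focus attention
```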
+
+ That prompt tells the agent to:
+
+ - read the `.researchloop/` files
+ - establish a baseline
+ - propose small experiments
+ - record runs
+ - compare results
+ - keep the loop moving
+
+ You can still pass `--goal` for a one-off override, but the normal flow is to save the goal once with `researchloop goal` and let the prompt command read it back.
+
+ The built-in focus playbooks are:
+
+ - `hyperparameters`
+ - `architecture`
+ - `attention`
+
+ ## 7. Record and compare runs
+
+ After a run finishes:
+
+ ```bash
+ researchloop record --id first-run --status complete --metric val_loss=2.31 --note "first logged experiment"
+ ```
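When a training script finishes, it can assemble the same command from its metrics. This sketch mirrors the flags shown above; whether `record` accepts several `--metric` flags in one call is an assumption to confirm with `researchloop --help`. `buildRecordArgs` is hypothetical.

```javascript
// Hypothetical helper: turn a finished run's metrics into `researchloop record` arguments.
// Flags mirror the documented example; repeating --metric is an assumption.
function buildRecordArgs({ id, status = "complete", metrics = {}, note }) {
  const args = ["record", "--id", id, "--status", status];
  for (const [name, value] of Object.entries(metrics)) {
    // Refuse NaN/Infinity so a diverged run is never logged as a silent bad number.
    if (!Number.isFinite(value)) throw new Error(`metric ${name} is not a finite number`);
    args.push("--metric", `${name}=${value}`);
  }
  if (note) args.push("--note", note);
  return args;
}

const args = buildRecordArgs({
  id: "first-run",
  metrics: { val_loss: 2.31 },
  note: "first logged experiment",
});
console.log(args.join(" "));
// record --id first-run --status complete --metric val_loss=2.31 --note first logged experiment
```

Passing the array to `child_process.execFileSync("researchloop", args)` avoids shell quoting for the note, since each element is handed to the CLI as one argument.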
+
+ To compare runs:
+
+ ```bash
+ researchloop compare --metric val_loss --direction lower
+ ```
+
+ For metrics where higher is better:
+
+ ```bash
+ researchloop compare --metric accuracy --direction higher
+ ```
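`compare` has to turn `--direction lower` or `--direction higher` into a sort order. A minimal sketch of that ranking, assuming each run is an object with an `id` and a `metrics` map; the real `runs.jsonl` schema is not documented here, and each JSONL line would first be parsed with `JSON.parse`. `rankRuns` is hypothetical.

```javascript
// Hypothetical helper: rank runs by one metric, best first.
// Assumed run shape: { id, metrics: { [name]: number } }.
function rankRuns(runs, metric, direction = "lower") {
  if (direction !== "lower" && direction !== "higher") {
    throw new Error(`direction must be "lower" or "higher", got "${direction}"`);
  }
  const sign = direction === "lower" ? 1 : -1;
  return runs
    .filter((run) => Number.isFinite(run.metrics?.[metric])) // skip runs missing the metric
    .sort((a, b) => sign * (a.metrics[metric] - b.metrics[metric]));
}

const runs = [
  { id: "baseline", metrics: { val_loss: 2.41 } },
  { id: "lr-3e-4", metrics: { val_loss: 2.31 } },
  { id: "crashed", metrics: {} },
  { id: "lr-1e-3", metrics: { val_loss: 2.55 } },
];
console.log(rankRuns(runs, "val_loss", "lower").map((run) => run.id));
// [ 'lr-3e-4', 'baseline', 'lr-1e-3' ]
```

Runs that never reported the metric, such as a crash, are dropped instead of sorting as `undefined`, and `filter` returns a new array, so the input ledger order is left untouched.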
+
+ Then summarize the current state:
+
+ ```bash
+ researchloop report
+ ```
+
+ ## 8. Open the dashboard
+
+ Serve a local dashboard for the current repo:
+
+ ```bash
+ researchloop dashboard
+ ```
+
+ Then open the localhost URL it prints. The dashboard reads the repo's `.researchloop/` files and shows:
+
+ - the saved goal
+ - the run ledger
+ - the best run so far
+ - the latest run
+ - a small trend chart for the main metric
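How the dashboard builds its trend chart is not specified. One common way to plot progress is the best value seen so far after each run; this sketch computes that series for either direction and is not the dashboard's actual code.

```javascript
// Best value seen so far after each run, in ledger order (illustrative, not dashboard code).
function bestSoFar(values, direction = "lower") {
  const better = direction === "lower" ? Math.min : Math.max;
  const series = [];
  let best;
  for (const value of values) {
    best = best === undefined ? value : better(best, value);
    series.push(best);
  }
  return series;
}

console.log(bestSoFar([2.41, 2.55, 2.31, 2.36], "lower")); // [ 2.41, 2.41, 2.31, 2.31 ]
```

The last input value is the latest run; the last output value is the best run so far.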
+
+ It needs no accounts or authentication because it runs only on your machine.
+
+ ## 9. Test the setup before you trust it
+
+ Run the local checks from a clone of this repo (the `scripts/` folder is not included in the published npm package):
+
+ ```bash
+ npm run smoke
+ npm run test:compare
+ npm run test:setup
+ npm run test:prompts
+ npm run test:site
+ npm run smoke:e2e
+ ```
+
+ These checks verify that:
+
+ - the CLI starts
+ - the setup flow works in a blank folder
+ - `compare` ranks runs
+ - prompt templates are clean
+ - the website copy matches the product
+ - the end-to-end flow works
+
+ ## 10. Use it in a real ML repo
+
+ Once the basics work, move into a real project:
+
+ ```bash
+ cd /path/to/your/ml-repo
+ researchloop init --agent codex
+ researchloop inspect
+ researchloop prompt --agent codex --goal "improve validation loss"
+ ```
+
+ Then give the prompt to your AI agent and let it run the loop.
+
+ ResearchLoop does not solve the modeling problem for you. It gives the agent an operating system for research: a goal, a baseline, logs, run comparison, and a way to continue where the last session stopped.
+
+ ## 11. Publish to npm
+
+ The package is published to the public npm registry at [npmjs.com](https://www.npmjs.com/).
+
+ Before publishing:
+
+ ```bash
+ npm login
+ npm whoami
+ npm pack --dry-run
+ ```
+
+ Make sure the package name is available to you and that the dry-run file list contains only what should ship.
+
+ For this repo, the package name in `package.json` is currently `researchloop`. If that name is unclaimed on npm, or already owned by your account, publish with:
+
+ ```bash
+ npm publish
+ ```
+
+ If you later switch to a scoped package like `@yourname/researchloop`, note that scoped packages are private by default, so publish with:
+
+ ```bash
+ npm publish --access public
+ ```
+
+ Common release flow:
+
+ ```bash
+ npm version patch
+ git push --follow-tags
+ npm publish
+ ```
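`npm version patch` increments the last number of the version in `package.json` and, inside a git repository, also creates a commit and a `v`-prefixed tag that `git push --follow-tags` then pushes. The version arithmetic for plain `x.y.z` versions is small enough to sketch; pre-release versions are deliberately out of scope, and `bumpVersion` is a hypothetical helper, not npm's implementation.

```javascript
// What `npm version patch|minor|major` does to a plain x.y.z version string (sketch).
// Pre-release tags such as 1.0.0-beta.1 are intentionally rejected here.
function bumpVersion(version, level) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) throw new Error(`not a plain x.y.z version: ${version}`);
  let [major, minor, patch] = match.slice(1).map(Number);
  if (level === "major") [major, minor, patch] = [major + 1, 0, 0];
  else if (level === "minor") [minor, patch] = [minor + 1, 0];
  else if (level === "patch") patch += 1;
  else throw new Error(`unknown level: ${level}`);
  return `${major}.${minor}.${patch}`;
}

console.log(bumpVersion("0.1.0", "patch")); // 0.1.1
console.log(bumpVersion("0.1.0", "minor")); // 0.2.0
```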
+
+ Typical release checklist:
+
+ 1. run the local tests
+ 2. check `npm pack --dry-run`
+ 3. bump the version
+ 4. publish to npm
+ 5. update the website and README if the usage changed
+
+ ## 12. Where users install it from
+
+ Users install it from the npm registry with:
+
+ ```bash
+ npm install -g researchloop
+ ```
+
+ To install it locally for a single repo instead:
+
+ ```bash
+ npm install researchloop
+ ```
+
+ Then run the CLI from inside that repo through `npx`, for example `npx researchloop init --agent codex`.
+
+ ## 13. The one-line handoff to an AI agent
+
+ If you want the shortest possible instruction for Codex, Claude Code, Hermes, or a similar agent, give it this:
+
+ ```text
+ Use ResearchLoop: run init, inspect the repo, read .researchloop/AGENTS.md and goal.md, establish the baseline, then run small experiments, record results, compare runs, and keep the research loop moving.
+ ```
package/package.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "name": "researchloop",
+   "version": "0.1.0",
+   "description": "Install an autonomous AI research harness for Codex, Claude Code, Hermes, and other coding agents.",
+   "type": "module",
+   "bin": {
+     "researchloop": "./bin/researchloop.js"
+   },
+   "files": [
+     "bin",
+     "templates",
+     "README.md",
+     "docs/getting-started.md",
+     "CHANGELOG.md"
+   ],
+   "scripts": {
+     "smoke": "node ./bin/researchloop.js --help",
+     "smoke:e2e": "bash ./scripts/smoke-e2e.sh",
+     "test:goal": "bash ./scripts/test-goal.sh",
+     "test:idea": "bash ./scripts/test-idea.sh",
+     "test:dashboard": "bash ./scripts/test-dashboard.sh",
+     "test:setup": "bash ./scripts/test-setup.sh",
+     "test:compare": "bash ./scripts/test-compare.sh",
+     "test:prompts": "bash ./scripts/test-prompts.sh",
+     "test:focus-prompts": "bash ./scripts/test-focus-prompts.sh",
+     "test:site": "bash ./scripts/test-site.sh"
+   },
+   "keywords": [
+     "ai-research",
+     "codex",
+     "claude-code",
+     "agent",
+     "experiments",
+     "ablation"
+   ],
+   "license": "MIT"
+ }
@@ -0,0 +1,18 @@
+ # Generic Adapter
+
+ Use this when the repository type is unknown.
+
+ Inspection targets:
+ - README and docs
+ - package manager files
+ - train, eval, benchmark, and test scripts
+ - config files
+ - logs and checkpoint folders
+
+ Required fields:
+ - baseline command
+ - evaluation command
+ - primary metric
+ - target direction
+ - allowed changes
+ - forbidden changes
@@ -0,0 +1,15 @@
+ # Hugging Face Trainer Adapter
+
+ Useful files and entry points:
+ - `training_args`
+ - `Trainer`
+ - `accelerate`
+ - `config.json`
+ - dataset loading scripts
+ - metric callbacks
+
+ Default first experiments:
+ 1. Confirm a tiny run works.
+ 2. Identify the evaluation metric and save path.
+ 3. Sweep learning rate and schedule before touching model architecture.
+ 4. Preserve the dataset and split unless the goal allows data changes.
@@ -0,0 +1,20 @@
+ # LLM Research Kit Adapter
+
+ Useful files:
+ - `train_llm.py`
+ - `configs/llm_config.py`
+ - `configs/dataset_config.py`
+ - `training/trainer.py`
+ - `training/evaluation.py`
+ - `optimizers/`
+ - `plots/metrics_*.json`
+
+ MacBook mode:
+ - Use MPS or CPU for smoke tests.
+ - Keep `torch.compile` disabled outside CUDA.
+ - Prefer tiny configs for local proof-of-life.
+
+ CUDA mode:
+ - Enable compile and mixed precision when supported.
+ - Use repeated seeds before claiming wins.
+ - Run pruning before promoting stacked changes.
@@ -0,0 +1,26 @@
+ # PyTorch Adapter
+
+ Useful files:
+ - `train.py`
+ - `eval.py`
+ - `configs/`
+ - `requirements.txt`
+ - `pyproject.toml`
+ - `checkpoints/`
+ - `logs/`
+
+ Default first experiments:
+ 1. Run a one-batch smoke test.
+ 2. Establish baseline metric parsing.
+ 3. Try one optimizer or schedule ablation.
+ 4. Reproduce any apparent win.
+
+ Common knobs:
+ - optimizer
+ - learning rate
+ - weight decay
+ - schedule
+ - warmup
+ - precision
+ - initialization
+ - gradient clipping
@@ -0,0 +1,47 @@
+ # Research Loop Agent Rules
+
+ You are an autonomous research engineer working in this repository.
+
+ ## Mission
+
+ Improve the target metric through small, documented experiments.
+ Read `.researchloop/goal.md`, `.researchloop/plan.md`, and `.researchloop/scratchpad/THREAD.md` before making changes.
+
+ ## Hard Rules
+
+ 1. Do not claim a result unless you ran the command or can point to an existing log.
+ 2. Establish or identify a baseline before optimizing.
+ 3. Keep each experiment small enough to isolate the causal change.
+ 4. Log every meaningful action to `.researchloop/scratchpad/THREAD.md`.
+ 5. Add every run to `.researchloop/scratchpad/runs.jsonl`.
+ 6. Write idea notes before coding non-trivial experiments.
+ 7. Define a kill criterion before launching a sweep.
+ 8. Reproduce promising wins before treating them as real.
+ 9. Run pruning or leave-one-out checks before promoting a stacked recipe.
+ 10. Preserve user work and avoid unrelated refactors.
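Rule 8 leaves the bar for a "real" win to the agent (the LLM Research Kit adapter similarly asks for repeated seeds before claiming wins). One simple, explicitly assumed criterion: the mean improvement across seeds must exceed the larger sample standard deviation of the two groups. The sketch below implements that check; the threshold is an illustration, not a ResearchLoop rule, and `isReproducibleWin` is hypothetical.

```javascript
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Sample standard deviation (n - 1 denominator); needs at least two values.
function stdDev(xs) {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1));
}

// Illustrative criterion: improvement of the mean must exceed the larger seed spread.
function isReproducibleWin(baseline, candidate, direction = "lower") {
  if (baseline.length < 2 || candidate.length < 2) return false; // need repeated seeds
  const gap = direction === "lower"
    ? mean(baseline) - mean(candidate)
    : mean(candidate) - mean(baseline);
  return gap > Math.max(stdDev(baseline), stdDev(candidate));
}

console.log(isReproducibleWin([2.41, 2.43, 2.40], [2.31, 2.33, 2.32])); // true
console.log(isReproducibleWin([2.41, 2.43, 2.40], [2.31, 2.52, 2.44])); // false
```

Writing the chosen criterion into the idea note before launching, in the same spirit as rule 7's kill criterion, keeps the bar from moving after the results are in.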
+
+ ## Autonomy
+
+ If the next step is clear, take it. If two paths are reasonable, choose one, log why, and continue.
+ Do not stop just because the current idea failed. A failed idea should produce a note, a lesson, and the next candidate.
+
+ ## Scratchpad
+
+ - `THREAD.md`: append-only chronological mission log.
+ - `runs.jsonl`: structured run ledger.
+ - `ideas/`: one file per idea, with mechanism, prior art, ablation plan, and kill criterion.
+ - `papers/`: paper notes with exact recipe details and a "how to port this" section.
+ - `variants/`: generated code/config variants.
+ - `sweeps/`: grouped sweep notes and outputs.
+ - `picklist.md`: prioritized candidates and ruled-out families.
+ - `audits.md`: benchmark-rule checks, portability checks, and claim audits.
+
+ ## Experiment Loop
+
+ 1. Inspect the repo and find the baseline command.
+ 2. Define the allowed and forbidden change surfaces.
+ 3. Propose 3-7 ranked experiments.
+ 4. Run the cheapest useful experiment first.
+ 5. Parse and record metrics.
+ 6. Decide: reproduce, refine, prune, or pivot.
+ 7. Keep `plan.md` current and `THREAD.md` chronological.
@@ -0,0 +1,22 @@
+ # Research Goal
+
+ ## Goal
+
+ Describe the metric to improve and the constraints.
+
+ Example:
+
+ - Target metric: validation loss
+ - Direction: lower is better
+ - Baseline command: unknown
+ - Evaluation command: unknown
+ - Allowed changes: optimizer, schedules, initialization, hyperparameters
+ - Forbidden changes: data, architecture, batch size, benchmark definition
+
+ ## Current Best
+
+ Unknown.
+
+ ## Notes
+
+ Use `researchloop inspect` to generate a repo profile, then ask an agent to fill in the missing benchmark details.
@@ -0,0 +1,22 @@
+ # Research Plan
+
+ This is mutable state. Keep it short and current.
+ The chronological log belongs in `scratchpad/THREAD.md`.
+
+ ## Current State
+
+ - Baseline: unknown
+ - Best valid result: unknown
+ - Active family: none
+ - Running jobs: none
+ - Next action: inspect repo and establish baseline
+
+ ## Picklist
+
+ 1. Establish baseline.
+ 2. Identify metric extraction.
+ 3. Run one smoke experiment.
+
+ ## Ruled Out
+
+ None yet.
@@ -0,0 +1,13 @@
+ # Research Thread
+
+ Append one entry per meaningful event.
+
+ Format:
+
+ ```text
+ YYYY-MM-DD HH:MM - event
+ - What changed:
+ - Evidence:
+ - Decision:
+ - Next:
+ ```
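A small helper keeps entries in this exact layout. The sketch uses local time and is hypothetical, not a ResearchLoop API; an agent could append its output (plus a blank line) to `THREAD.md` with Node's `fs.appendFileSync`.

```javascript
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Hypothetical helper: format one THREAD.md entry as "YYYY-MM-DD HH:MM - event" plus four fields.
function formatThreadEntry(date, event, { changed, evidence, decision, next }) {
  const stamp = `${date.getFullYear()}-${pad2(date.getMonth() + 1)}-${pad2(date.getDate())} ` +
    `${pad2(date.getHours())}:${pad2(date.getMinutes())}`;
  return [
    `${stamp} - ${event}`,
    `- What changed: ${changed}`,
    `- Evidence: ${evidence}`,
    `- Decision: ${decision}`,
    `- Next: ${next}`,
  ].join("\n");
}

// Month is zero-based in the Date constructor, so 0 means January.
const entry = formatThreadEntry(new Date(2025, 0, 7, 9, 5), "lr sweep finished", {
  changed: "learning rate 1e-3 -> 3e-4",
  evidence: "run lr-3e-4 recorded in runs.jsonl",
  decision: "reproduce with two more seeds",
  next: "launch seeds 1 and 2",
});
console.log(entry.split("\n")[0]); // 2025-01-07 09:05 - lr sweep finished
```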
@@ -0,0 +1,14 @@
+ # Audits
+
+ Use this file for benchmark-rule checks, portability checks, and claim audits.
+
+ ## Benchmark Rules
+
+ - Baseline command: unknown
+ - Evaluation command: unknown
+ - Allowed changes: unknown
+ - Forbidden changes: unknown
+
+ ## Claims
+
+ No claims yet.
@@ -0,0 +1,15 @@
+ # Picklist
+
+ Promote ideas here only after they have a note, a minimal test plan, and a kill criterion.
+
+ ## Active
+
+ - Establish baseline.
+
+ ## Backlog
+
+ - Add candidates after repo inspection.
+
+ ## Ruled Out
+
+ - None yet.