researchloop 0.1.0
- package/CHANGELOG.md +28 -0
- package/LICENSE +21 -0
- package/README.md +146 -0
- package/bin/researchloop.js +900 -0
- package/docs/getting-started.md +283 -0
- package/package.json +37 -0
- package/templates/adapters/generic.md +18 -0
- package/templates/adapters/huggingface.md +15 -0
- package/templates/adapters/llm-research-kit.md +20 -0
- package/templates/adapters/pytorch.md +26 -0
- package/templates/base/AGENTS.md +47 -0
- package/templates/base/goal.md +22 -0
- package/templates/base/plan.md +22 -0
- package/templates/base/scratchpad/THREAD.md +13 -0
- package/templates/base/scratchpad/audits.md +14 -0
- package/templates/base/scratchpad/ideas/.gitkeep +1 -0
- package/templates/base/scratchpad/papers/.gitkeep +1 -0
- package/templates/base/scratchpad/picklist.md +15 -0
- package/templates/base/scratchpad/runs.jsonl +1 -0
- package/templates/base/scratchpad/sweeps/.gitkeep +1 -0
- package/templates/base/scratchpad/variants/.gitkeep +1 -0
- package/templates/dashboard/index.html +627 -0
- package/templates/prompts/claude-code.md +30 -0
- package/templates/prompts/codex.md +29 -0
- package/templates/prompts/focus/architecture.md +30 -0
- package/templates/prompts/focus/attention.md +27 -0
- package/templates/prompts/focus/hyperparameters.md +32 -0
- package/templates/prompts/generic.md +8 -0
- package/templates/prompts/hermes.md +26 -0
package/templates/prompts/focus/architecture.md
@@ -0,0 +1,30 @@
+# Architecture Optimization Playbook
+
+Your goal is to explore small architecture changes without losing the baseline.
+
+Use this order:
+
+1. Lock the current baseline and metric.
+2. Make one architecture change at a time.
+3. Prefer the smallest change that can plausibly matter.
+4. Keep data, evaluation, and training loop fixed.
+5. Log the exact config diff and the result.
+
+Useful architecture knobs:
+
+- `d_model`
+- `n_heads`
+- `n_layers`
+- `d_ff`
+- `n_kv_heads`
+- `activation_variant`
+- `activation_slope`
+
+Suggested first experiments:
+
+- widen `d_model` slightly
+- reduce or increase `n_layers` by one step
+- change `n_kv_heads` while keeping `d_model` and `n_heads` consistent
+- compare `squared_relu` against `relu` or `leaky_squared_relu`
+
+Do not change architecture and optimizer in the same first-pass experiment unless the goal explicitly requires it.
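The last suggested experiment compares activation variants. A minimal scalar sketch of plausible definitions, assuming `leaky_squared_relu` means a squared ReLU with a small negative-side slope driven by an `activation_slope`-style knob (the package's actual implementations may differ):

```python
def relu(x):
    # Standard ReLU: zero out negative inputs.
    return max(x, 0.0)

def squared_relu(x):
    # ReLU followed by squaring (the Primer-style squared ReLU).
    return max(x, 0.0) ** 2

def leaky_squared_relu(x, slope=0.01):
    # Assumed form: squared ReLU with a small linear leak for x < 0,
    # where `slope` plays the role of `activation_slope`.
    return x * x if x > 0 else slope * x

print([squared_relu(v) for v in (-2.0, 0.0, 3.0)])  # [0.0, 0.0, 9.0]
```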
package/templates/prompts/focus/attention.md
@@ -0,0 +1,27 @@
+# Attention Optimization Playbook
+
+Your goal is to test attention-related changes while keeping the rest of the loop stable.
+
+Use this order:
+
+1. Freeze the baseline model and metric.
+2. Change one attention-related setting at a time.
+3. Keep data, optimizer, and evaluation fixed.
+4. Measure the effect quickly.
+5. Re-run any promising result before calling it a win.
+
+Useful attention knobs:
+
+- `n_heads`
+- `n_kv_heads`
+- RoPE sequence length
+- attention dropout
+- `compile_model`, only when the backend supports it
+
+Suggested first experiments:
+
+- compare fewer KV heads vs. more KV heads
+- adjust attention dropout if overfitting appears
+- keep RoPE length fixed unless the dataset truly requires a change
+
+If the repo is on a MacBook, stay on the backend that actually works locally and do not force CUDA-only paths.
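The KV-head experiment above only makes sense if the head counts stay consistent. A small sanity-check sketch, assuming the usual grouped-query-attention divisibility constraints (the knob names mirror the playbook; this is not a confirmed package API):

```python
def check_head_config(d_model, n_heads, n_kv_heads):
    """Validate divisibility constraints before a grouped-query attention run."""
    if d_model % n_heads != 0:
        raise ValueError(f"d_model={d_model} is not divisible by n_heads={n_heads}")
    if n_heads % n_kv_heads != 0:
        raise ValueError(f"n_heads={n_heads} is not divisible by n_kv_heads={n_kv_heads}")
    # Return the per-head dimension so callers can log it with the run.
    return d_model // n_heads

print(check_head_config(512, 8, 2))  # 64
```

Running this before each sweep point turns a silent shape mismatch into an immediate, loggable failure.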
package/templates/prompts/focus/hyperparameters.md
@@ -0,0 +1,32 @@
+# Hyperparameter Optimization Playbook
+
+Your goal is to test the cheapest high-signal hyperparameter changes first.
+
+Use this order:
+
+1. Establish the current baseline exactly as it exists.
+2. Sweep learning rate before changing architecture.
+3. Hold data, sequence length, batch size, and evaluation constant.
+4. Change one knob at a time.
+5. Record every run in `.researchloop/scratchpad/runs.jsonl`.
+
+Useful hyperparameters to try:
+
+- `muon_lr`
+- `adamw_lr`
+- `warmup_ratio`
+- `weight_decay`
+- `dropout`
+- `grad_clip`
+- `activation_variant`
+- `activation_slope`
+
+For each candidate:
+
+- state the hypothesis
+- define the kill criterion
+- run the smallest proof that can fail or improve
+- compare against baseline
+- only keep wins that are reproducible
+
+Do not stack multiple changes in the first pass.
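Step 5 records every run in `.researchloop/scratchpad/runs.jsonl`. A minimal append-only logger sketch; the field names here are assumptions, since the template's actual JSONL schema is not shown in this diff:

```python
import json
import time
from pathlib import Path

def log_run(config_diff, metric, path=".researchloop/scratchpad/runs.jsonl"):
    # One JSON object per line, appended, so earlier runs are never rewritten.
    record = {"ts": time.time(), "config_diff": config_diff, "metric": metric}
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    with p.open("a") as f:
        f.write(json.dumps(record) + "\n")

# Log only the knobs that differ from the baseline, plus the observed metric.
log_run({"muon_lr": 0.02}, {"val_loss": 2.41}, path="runs.jsonl")
```

Keeping the record down to the config *diff* and the metric makes it cheap to grep the file for every run that touched a given knob.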
package/templates/prompts/generic.md
@@ -0,0 +1,8 @@
+You are an autonomous research engineer in this repository.
+
+Read `.researchloop/AGENTS.md`, `.researchloop/goal.md`, `.researchloop/plan.md`, and `.researchloop/scratchpad/THREAD.md`.
+
+Goal:
+{{GOAL}}
+
+Design and run small experiments. Track commands, metrics, code changes, and decisions. Do not claim results without evidence. Keep the research loop moving.
package/templates/prompts/hermes.md
@@ -0,0 +1,26 @@
+You are Hermes acting as an autonomous AI research orchestrator in this repository.
+
+First read:
+- `.researchloop/AGENTS.md`
+- `.researchloop/goal.md`
+- `.researchloop/plan.md`
+- `.researchloop/scratchpad/THREAD.md`
+- `.researchloop/repo-profile.json` if present
+
+Goal:
+{{GOAL}}
+
+Coordinate the research loop:
+- Inspect the repo and summarize the experiment surface.
+- Choose the smallest high-signal experiment.
+- Delegate or execute implementation and validation.
+- Store durable state in `.researchloop/`.
+- Keep raw logs out of the main context; summarize and link paths.
+- Maintain a picklist of active, backlog, and ruled-out ideas.
+- Preserve reproducibility: command, config, metric, hardware, git diff.
+
+Return with:
+- Research state
+- Experiment queue
+- Evidence gathered
+- Next action
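The reproducibility bullet in the Hermes prompt names five fields: command, config, metric, hardware, git diff. A hedged sketch of capturing them in one record (the field names and helper are illustrative, not the package's API):

```python
import platform
import subprocess

def evidence_record(command, config, metric):
    # Capture the five reproducibility fields named in the prompt.
    # `git diff` output is empty when the tree is clean; swallow OSError
    # so the record still forms on machines without git installed.
    try:
        diff = subprocess.run(
            ["git", "diff"], capture_output=True, text=True
        ).stdout
    except OSError:
        diff = ""
    return {
        "command": command,
        "config": config,
        "metric": metric,
        "hardware": platform.platform(),
        "git_diff": diff,
    }
```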