researchloop 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +28 -0
- package/LICENSE +21 -0
- package/README.md +146 -0
- package/bin/researchloop.js +900 -0
- package/docs/getting-started.md +283 -0
- package/package.json +37 -0
- package/templates/adapters/generic.md +18 -0
- package/templates/adapters/huggingface.md +15 -0
- package/templates/adapters/llm-research-kit.md +20 -0
- package/templates/adapters/pytorch.md +26 -0
- package/templates/base/AGENTS.md +47 -0
- package/templates/base/goal.md +22 -0
- package/templates/base/plan.md +22 -0
- package/templates/base/scratchpad/THREAD.md +13 -0
- package/templates/base/scratchpad/audits.md +14 -0
- package/templates/base/scratchpad/ideas/.gitkeep +1 -0
- package/templates/base/scratchpad/papers/.gitkeep +1 -0
- package/templates/base/scratchpad/picklist.md +15 -0
- package/templates/base/scratchpad/runs.jsonl +1 -0
- package/templates/base/scratchpad/sweeps/.gitkeep +1 -0
- package/templates/base/scratchpad/variants/.gitkeep +1 -0
- package/templates/dashboard/index.html +627 -0
- package/templates/prompts/claude-code.md +30 -0
- package/templates/prompts/codex.md +29 -0
- package/templates/prompts/focus/architecture.md +30 -0
- package/templates/prompts/focus/attention.md +27 -0
- package/templates/prompts/focus/hyperparameters.md +32 -0
- package/templates/prompts/generic.md +8 -0
- package/templates/prompts/hermes.md +26 -0

package/docs/getting-started.md
ADDED

@@ -0,0 +1,283 @@

# ResearchLoop Getting Started

ResearchLoop is an open source npm package that helps an AI agent run a disciplined research loop inside a machine learning repo.

The shortest way to think about it:

- you install the CLI
- you point it at a repo
- it creates a durable `.researchloop/` workspace
- your AI agent uses that workspace to plan, run, compare, and record experiments

## 1. Install

From your own machine:

```bash
npm install -g researchloop
```

For local development from a checkout of this repo:

```bash
cd /path/to/researchloop
npm link
researchloop --help
```

If you want to hand this to an AI agent, the simplest instruction is:

```text
Install ResearchLoop, initialize the repo, inspect the project, then use the generated prompt to start the research loop.
```

## 2. Initialize a repo

Run this inside a blank folder or inside an existing ML repo:

```bash
researchloop init --agent codex
```

That creates:

```text
.researchloop/
  AGENTS.md
  goal.md
  plan.md
  repo-profile.json
  scratchpad/
    THREAD.md
    runs.jsonl
    ideas/
    papers/
    variants/
    sweeps/
```

If you want the agent-specific file for another tool, use:

```bash
researchloop init --agent claude-code
researchloop init --agent hermes
researchloop init --agent cursor
```

## 3. Set the research goal

Tell ResearchLoop what the agent should optimize:

```bash
researchloop goal "lower validation loss"
```

You can also add structure:

```bash
researchloop goal "lower validation loss" --metric val_loss --direction lower
```

That saves the objective into `.researchloop/goal.md`, which the agent and the prompt command can read later.
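For illustration, the JavaScript sketch below shows one way the values passed to `researchloop goal` could be rendered into the bullet format used by the `goal.md` template (`- Target metric: ...`, `- Direction: ... is better`). The `renderGoal` function and its exact output are assumptions, not the CLI's actual implementation.

```javascript
// Hypothetical sketch: render the values passed to `researchloop goal`
// as markdown shaped like the goal.md template. Not the real CLI code.
function renderGoal({ goal, metric = null, direction = null }) {
  // Only the two directions used in this guide are accepted.
  if (direction !== null && direction !== "lower" && direction !== "higher") {
    throw new Error(`direction must be "lower" or "higher", got ${direction}`);
  }
  const lines = ["# Research Goal", "", "## Goal", "", goal, ""];
  if (metric !== null) lines.push(`- Target metric: ${metric}`);
  if (direction !== null) lines.push(`- Direction: ${direction} is better`);
  return lines.join("\n") + "\n";
}

console.log(renderGoal({ goal: "lower validation loss", metric: "val_loss", direction: "lower" }));
```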

## 4. Generate experiment ideas

```bash
researchloop idea --write
```

This prints a ranked list of small experiments for the current repo shape. For `llm-research-kit`, that usually means baseline checks, learning-rate sweeps, and tiny architecture changes. For a generic repo, it starts with finding the baseline and metric plumbing.

## 5. Inspect the repo

```bash
researchloop inspect
```

This writes a repo profile into `.researchloop/repo-profile.json` and helps the agent understand:

- possible training files
- possible eval files
- config files
- log folders
- likely adapters
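A profile like this can be built largely from what the repo's file paths look like. As a rough sketch of that idea, the JavaScript below sorts a list of paths into the same buckets with filename patterns and makes a weak adapter guess. The `RULES` patterns, `profileRepo`, and `guessAdapter` are hypothetical; `researchloop inspect` may use entirely different heuristics and output fields.

```javascript
// Hypothetical filename heuristics for profiling an ML repo. These
// patterns are illustrative assumptions, not the rules that
// `researchloop inspect` actually applies.
const RULES = {
  training: /(^|\/)train[^/]*\.py$/,
  eval: /(^|\/)eval[^/]*\.py$/,
  config: /(^|\/)configs?\/|\.(ya?ml|toml)$|(^|\/)config\.json$/,
  logs: /(^|\/)(logs?|checkpoints|plots)\//,
};

function profileRepo(paths) {
  const profile = { training: [], eval: [], config: [], logs: [] };
  for (const p of paths) {
    for (const [bucket, re] of Object.entries(RULES)) {
      if (re.test(p)) profile[bucket].push(p);
    }
  }
  // Report the folders that hold logs rather than every individual log file.
  profile.logs = [...new Set(profile.logs.map((p) => p.slice(0, p.lastIndexOf("/") + 1)))];
  profile.adapter = guessAdapter(paths);
  return profile;
}

function guessAdapter(paths) {
  const has = (name) => paths.some((p) => p === name || p.endsWith("/" + name));
  if (has("train_llm.py") && has("llm_config.py")) return "llm-research-kit";
  if (paths.some((p) => RULES.training.test(p))) return "pytorch"; // weak guess
  return "generic";
}

const example = profileRepo([
  "train_llm.py",
  "configs/llm_config.py",
  "training/trainer.py",
  "training/evaluation.py",
  "plots/metrics_001.json",
  "README.md",
]);
console.log(JSON.stringify(example, null, 2));
```

Keeping the patterns in one table makes it easy to add a bucket later without touching the loop.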

## 6. Generate the agent prompt

```bash
researchloop prompt --agent codex
```

Paste the output into your AI agent.

You can also attach a focused playbook:

```bash
researchloop prompt --agent codex --focus hyperparameters
researchloop prompt --agent codex --focus architecture
researchloop prompt --agent codex --focus attention
```

That prompt tells the agent to:

- read the `.researchloop/` files
- establish a baseline
- propose small experiments
- record runs
- compare results
- keep the loop moving

You can still pass `--goal` for a one-off override, but the normal flow is to save the goal once and let the prompt command read it back.

If you want the prompt to narrow in on a family of experiments, use one of the built-in focus playbooks:

- `hyperparameters`
- `architecture`
- `attention`

## 7. Record and compare runs

After a run finishes:

```bash
researchloop record --id first-run --status complete --metric val_loss=2.31 --note "first logged experiment"
```
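Each `record` call ends up as a line in `.researchloop/scratchpad/runs.jsonl`. A minimal sketch of that conversion, assuming a record shape with `id`, `status`, `metrics`, `note`, and `recorded_at` fields (the real ledger schema may differ):

```javascript
// Hypothetical sketch: turn `researchloop record` flags into one line of
// .researchloop/scratchpad/runs.jsonl. The field names are assumptions.
function parseMetric(flag) {
  const eq = flag.indexOf("=");
  if (eq <= 0) throw new Error(`expected name=value, got "${flag}"`);
  const raw = flag.slice(eq + 1).trim();
  const value = raw === "" ? NaN : Number(raw);
  if (!Number.isFinite(value)) throw new Error(`metric value is not a number: "${flag}"`);
  return [flag.slice(0, eq), value];
}

function makeRunLine({ id, status, metrics = [], note = "" }, now = new Date()) {
  const record = {
    id,
    status,
    metrics: Object.fromEntries(metrics.map(parseMetric)),
    note,
    recorded_at: now.toISOString(),
  };
  // JSONL means exactly one JSON object per line; JSON.stringify escapes
  // any newline inside `note`, so the record cannot span two lines.
  return JSON.stringify(record) + "\n";
}

const line = makeRunLine(
  { id: "first-run", status: "complete", metrics: ["val_loss=2.31"], note: "first logged experiment" },
  new Date("2024-01-01T00:00:00Z")
);
process.stdout.write(line);
```

To persist it, the line would be appended to the ledger, for example with Node's `fs.appendFileSync(path, line)`; appending rather than rewriting keeps earlier runs intact.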

To compare runs:

```bash
researchloop compare --metric val_loss --direction lower
```

For metrics where higher is better:

```bash
researchloop compare --metric accuracy --direction higher
```
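Ranking is a sort whose order flips with `--direction`. The sketch below shows the idea on an in-memory ledger, assuming each JSONL record carries a `metrics` object; runs without the requested metric (for example a crashed run) are left out rather than ranked. `rankRuns` and the record shape are illustrative assumptions.

```javascript
// Hypothetical sketch of what `compare` has to do: parse the JSONL
// ledger, keep runs that report the metric, and sort best-first.
function rankRuns(jsonlText, metric, direction = "lower") {
  if (direction !== "lower" && direction !== "higher") {
    throw new Error(`direction must be "lower" or "higher", got ${direction}`);
  }
  const runs = jsonlText
    .split("\n")
    .filter((l) => l.trim() !== "") // skip blank lines
    .map((l) => JSON.parse(l))
    .filter((run) => run.metrics && typeof run.metrics[metric] === "number");
  // Ascending for "lower", descending for "higher".
  const sign = direction === "lower" ? 1 : -1;
  return runs
    .sort((a, b) => sign * (a.metrics[metric] - b.metrics[metric]))
    .map((run, i) => ({ rank: i + 1, id: run.id, [metric]: run.metrics[metric] }));
}

const ledger = [
  '{"id":"baseline","status":"complete","metrics":{"val_loss":2.41}}',
  '{"id":"lr-3e-4","status":"complete","metrics":{"val_loss":2.31}}',
  '{"id":"crashed","status":"failed","metrics":{}}',
  '{"id":"lr-1e-3","status":"complete","metrics":{"val_loss":2.37}}',
].join("\n");

console.table(rankRuns(ledger, "val_loss", "lower"));
```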

Then summarize the current state:

```bash
researchloop report
```

## 8. Open the dashboard

Serve a local dashboard for the current repo:

```bash
researchloop dashboard
```

Then open the localhost URL it prints. The dashboard reads the repo's `.researchloop/` files and shows:

- the saved goal
- the run ledger
- the best run so far
- the latest run
- a small trend chart for the main metric

It does not need accounts or auth because it stays on your machine.
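The dashboard's headline numbers can all be derived from the run ledger. A minimal sketch, assuming ledger order is chronological and each run has `id` and `metrics` fields; `summarizeLedger` is a hypothetical name, and the real dashboard may compute these differently:

```javascript
// Hypothetical sketch: derive best run, latest run, and a trend series
// for one metric from parsed runs.jsonl records.
function summarizeLedger(runs, metric, direction = "lower") {
  const scored = runs.filter((r) => r.metrics && typeof r.metrics[metric] === "number");
  if (scored.length === 0) return { best: null, latest: null, trend: [] };
  const isBetter = (v, ref) => (direction === "lower" ? v < ref : v > ref);
  // Strict comparison keeps the earlier run on ties.
  const best = scored.reduce((acc, r) => (isBetter(r.metrics[metric], acc.metrics[metric]) ? r : acc));
  // The ledger is append-only, so the last scored entry is the latest run.
  const latest = scored[scored.length - 1];
  let bestSoFar = null;
  const trend = scored.map((r) => {
    const value = r.metrics[metric];
    if (bestSoFar === null || isBetter(value, bestSoFar)) bestSoFar = value;
    return { id: r.id, value, bestSoFar };
  });
  return { best: best.id, latest: latest.id, trend };
}

const demo = summarizeLedger(
  [
    { id: "baseline", metrics: { val_loss: 2.41 } },
    { id: "lr-3e-4", metrics: { val_loss: 2.31 } },
    { id: "lr-1e-3", metrics: { val_loss: 2.37 } },
  ],
  "val_loss"
);
console.log(demo.best, demo.latest); // lr-3e-4 lr-1e-3
```

Tracking `bestSoFar` next to each raw `value` lets a small chart show progress without hiding noisy individual runs.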

## 9. Test the setup before you trust it

Run the local checks from a checkout of this repo (the `scripts/` folder they call is not included in the published npm package):

```bash
npm run smoke
npm run test:compare
npm run test:setup
npm run test:prompts
npm run test:site
npm run smoke:e2e
```

These checks verify that:

- the CLI starts
- the setup flow works in a blank folder
- `compare` ranks runs
- prompt templates are clean
- the website copy matches the product
- the end-to-end flow works

## 10. Use it in a real ML repo

Once the basics work, move into a real project:

```bash
cd /path/to/your/ml-repo
researchloop init --agent codex
researchloop inspect
researchloop prompt --agent codex --goal "improve validation loss"
```

Then give the prompt to your AI agent and let it run the loop.

ResearchLoop is not trying to magically solve the model for you. It gives the agent the operating system for research: goals, baseline, logs, comparison, and continuation.

## 11. Publish to npm

The package is published to the public npm registry at [npmjs.com](https://www.npmjs.com/).

Before publishing:

```bash
npm login
npm whoami
npm pack --dry-run
```

Make sure the package name is available and the file list from `npm pack --dry-run` looks right.

For this repo, the package name is currently `researchloop` in `package.json`. If that name is still unclaimed on the npm registry (or already belongs to your account), publish with:

```bash
npm publish
```

If you later switch to a scoped package like `@yourname/researchloop`, publish it with `--access public`, because scoped packages default to restricted access:

```bash
npm publish --access public
```

Common release flow:

```bash
npm version patch
git push --follow-tags
npm publish
```
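`npm version patch` increments the last component of `MAJOR.MINOR.PATCH` (so `0.1.0` becomes `0.1.1`), updates `package.json`, and, in a git repo, creates a version commit and tag that `git push --follow-tags` then pushes. The sketch below reproduces only the version arithmetic, for plain versions without pre-release suffixes; `bumpVersion` is an illustrative helper, not part of npm.

```javascript
// Minimal sketch of the arithmetic behind `npm version patch|minor|major`
// for plain MAJOR.MINOR.PATCH versions. Pre-release and build suffixes
// (e.g. 1.0.0-beta.1), which the real command also handles, are rejected.
function bumpVersion(version, level) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!m) throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${version}`);
  let [major, minor, patch] = m.slice(1).map(Number);
  if (level === "major") {
    major += 1;
    minor = 0;
    patch = 0;
  } else if (level === "minor") {
    minor += 1;
    patch = 0;
  } else if (level === "patch") {
    patch += 1;
  } else {
    throw new Error(`unknown level: ${level}`);
  }
  return `${major}.${minor}.${patch}`;
}

console.log(bumpVersion("0.1.0", "patch")); // 0.1.1
console.log(bumpVersion("0.1.0", "minor")); // 0.2.0
```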

Typical release checklist:

1. run the local tests
2. check `npm pack --dry-run`
3. bump the version
4. publish to npm
5. update the website and README if the usage changed

## 12. Where users install it from

Users install it from the npm registry with:

```bash
npm install -g researchloop
```

If they prefer local use inside one repo:

```bash
npm install researchloop
```

Then they run the CLI from that repo through `npx`, for example `npx researchloop --help`.

## 13. The one-line handoff to an AI agent

If you want the shortest possible instruction for Codex, Claude Code, Hermes, or a similar agent, give it this:

```text
Use ResearchLoop: run init, inspect the repo, read .researchloop/AGENTS.md and goal.md, establish the baseline, then run small experiments, record results, compare runs, and keep the research loop moving.
```

package/package.json
ADDED

@@ -0,0 +1,37 @@

{
  "name": "researchloop",
  "version": "0.1.0",
  "description": "Install an autonomous AI research harness for Codex, Claude Code, Hermes, and other coding agents.",
  "type": "module",
  "bin": {
    "researchloop": "./bin/researchloop.js"
  },
  "files": [
    "bin",
    "templates",
    "README.md",
    "docs/getting-started.md",
    "CHANGELOG.md"
  ],
  "scripts": {
    "smoke": "node ./bin/researchloop.js --help",
    "smoke:e2e": "bash ./scripts/smoke-e2e.sh",
    "test:goal": "bash ./scripts/test-goal.sh",
    "test:idea": "bash ./scripts/test-idea.sh",
    "test:dashboard": "bash ./scripts/test-dashboard.sh",
    "test:setup": "bash ./scripts/test-setup.sh",
    "test:compare": "bash ./scripts/test-compare.sh",
    "test:prompts": "bash ./scripts/test-prompts.sh",
    "test:focus-prompts": "bash ./scripts/test-focus-prompts.sh",
    "test:site": "bash ./scripts/test-site.sh"
  },
  "keywords": [
    "ai-research",
    "codex",
    "claude-code",
    "agent",
    "experiments",
    "ablation"
  ],
  "license": "MIT"
}

package/templates/adapters/generic.md
ADDED

@@ -0,0 +1,18 @@

# Generic Adapter

Use this when the repository type is unknown.

Inspection targets:
- README and docs
- package manager files
- train, eval, benchmark, and test scripts
- config files
- logs and checkpoint folders

Required fields:
- baseline command
- evaluation command
- primary metric
- target direction
- allowed changes
- forbidden changes
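An agent (or a wrapper script) can check which of these required fields are still unfilled before starting experiments. The sketch below assumes the fields are stored as camelCase keys on a plain object and treats empty values and the placeholder `unknown` (used in the `goal.md` and `audits.md` templates) as missing; the key names and `missingFields` are hypothetical.

```javascript
// Hypothetical sketch: report which required adapter fields are unfilled.
// Key names are assumptions; "unknown" is treated as a placeholder.
const REQUIRED_FIELDS = [
  "baselineCommand",
  "evaluationCommand",
  "primaryMetric",
  "targetDirection",
  "allowedChanges",
  "forbiddenChanges",
];

function missingFields(profile) {
  return REQUIRED_FIELDS.filter((key) => {
    const value = profile[key];
    if (value === undefined || value === null) return true;
    if (Array.isArray(value)) return value.length === 0;
    const text = String(value).trim();
    return text === "" || text.toLowerCase() === "unknown";
  });
}

console.log(
  missingFields({
    baselineCommand: "python train.py",
    evaluationCommand: "unknown",
    primaryMetric: "val_loss",
    targetDirection: "lower",
    allowedChanges: ["optimizer", "schedule"],
  })
); // lists evaluationCommand and forbiddenChanges
```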

package/templates/adapters/huggingface.md
ADDED

@@ -0,0 +1,15 @@

# Hugging Face Trainer Adapter

Useful files:
- `training_args`
- `Trainer`
- `accelerate`
- `config.json`
- dataset loading scripts
- metric callbacks

Default first experiments:
1. Confirm a tiny run works.
2. Identify evaluation metric and save path.
3. Sweep learning rate and schedule before model architecture.
4. Preserve dataset and split unless the goal allows data changes.
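Learning-rate sweeps are usually spaced logarithmically, because the useful range spans orders of magnitude. A minimal JavaScript sketch of such a grid; the bounds and point count in the example are illustrative assumptions, to be centered on the repo's current default learning rate.

```javascript
// Minimal sketch: evenly spaced points in log10 space, so each point is a
// constant factor above the previous one. Bounds here are illustrative.
function logSpace(low, high, count) {
  if (!(low > 0 && high > low) || count < 2) {
    throw new Error("need 0 < low < high and count >= 2");
  }
  const start = Math.log10(low);
  const step = (Math.log10(high) - start) / (count - 1);
  // toPrecision(3) rounds away floating-point noise for readable configs.
  return Array.from({ length: count }, (_, i) => Number((10 ** (start + i * step)).toPrecision(3)));
}

const grid = logSpace(1e-5, 1e-3, 5);
console.log(grid); // about 1e-5, 3.16e-5, 1e-4, 3.16e-4, 1e-3
```

With five points over two decades, each step multiplies the learning rate by about 3.16 (the square root of 10).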

package/templates/adapters/llm-research-kit.md
ADDED

@@ -0,0 +1,20 @@

# LLM Research Kit Adapter

Useful files:
- `train_llm.py`
- `configs/llm_config.py`
- `configs/dataset_config.py`
- `training/trainer.py`
- `training/evaluation.py`
- `optimizers/`
- `plots/metrics_*.json`

MacBook mode:
- Use MPS or CPU for smoke tests.
- Keep `torch.compile` disabled outside CUDA.
- Prefer tiny configs for local proof-of-life.

CUDA mode:
- Enable compile and mixed precision when supported.
- Use repeated seeds before claiming wins.
- Run pruning before promoting stacked changes.
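One way to apply "use repeated seeds before claiming wins" is to compare the mean improvement across seeds against the seed-to-seed noise. The sketch below uses the standard error of the difference in means and a 2x threshold; that threshold is a rough heuristic assumption rather than a formal significance test, and `isSeedRobustWin` is a hypothetical helper. The example numbers are made up.

```javascript
// Hypothetical sketch: call a change a win only when its mean improvement
// across seeds exceeds k standard errors of the difference in means.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function sampleVariance(xs) {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
}

function isSeedRobustWin(baseline, candidate, direction = "lower", k = 2) {
  if (baseline.length < 2 || candidate.length < 2) {
    throw new Error("need at least two seeds per arm");
  }
  // Positive improvement means the candidate is better in the goal's direction.
  const diff = mean(candidate) - mean(baseline);
  const improvement = direction === "lower" ? -diff : diff;
  const stdErr = Math.sqrt(
    sampleVariance(baseline) / baseline.length + sampleVariance(candidate) / candidate.length
  );
  return { improvement, stdErr, win: improvement > k * stdErr };
}

// Three seeds per arm, val_loss (lower is better); illustrative numbers only.
console.log(isSeedRobustWin([2.41, 2.43, 2.4], [2.33, 2.35, 2.34]));
```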

package/templates/adapters/pytorch.md
ADDED

@@ -0,0 +1,26 @@

# PyTorch Adapter

Useful files:
- `train.py`
- `eval.py`
- `configs/`
- `requirements.txt`
- `pyproject.toml`
- `checkpoints/`
- `logs/`

Default first experiments:
1. Run a one-batch smoke test.
2. Establish baseline metric parsing.
3. Try one optimizer or schedule ablation.
4. Reproduce any apparent win.

Common knobs:
- optimizer
- learning rate
- weight decay
- schedule
- warmup
- precision
- initialization
- gradient clipping

package/templates/base/AGENTS.md
ADDED

@@ -0,0 +1,47 @@

# Research Loop Agent Rules

You are an autonomous research engineer working in this repository.

## Mission

Improve the target metric through small, documented experiments.
Read `.researchloop/goal.md`, `.researchloop/plan.md`, and `.researchloop/scratchpad/THREAD.md` before making changes.

## Hard Rules

1. Do not claim a result unless you ran the command or can point to an existing log.
2. Establish or identify a baseline before optimizing.
3. Keep each experiment small enough to isolate the causal change.
4. Log every meaningful action to `.researchloop/scratchpad/THREAD.md`.
5. Add every run to `.researchloop/scratchpad/runs.jsonl`.
6. Write idea notes before coding non-trivial experiments.
7. Define a kill criterion before launching a sweep.
8. Reproduce promising wins before treating them as real.
9. Run pruning or leave-one-out checks before promoting a stacked recipe.
10. Preserve user work and avoid unrelated refactors.
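Rule 9's leave-one-out check can be made mechanical: for a stacked recipe of N changes, run N variants that each drop one change, then flag changes whose removal does not make the metric worse. A minimal sketch, assuming a lower-is-better metric and made-up change names; `leaveOneOut` and `pruneCandidates` are illustrative helpers.

```javascript
// Hypothetical sketch of leave-one-out pruning for a stacked recipe.
// Each variant drops exactly one change from the full recipe.
function leaveOneOut(changes) {
  return changes.map((dropped, i) => ({
    id: `loo-minus-${dropped}`,
    dropped,
    changes: changes.filter((_, j) => j !== i),
  }));
}

// looScores maps each dropped change to the metric of the variant without it.
// A change is a pruning candidate if removing it did not hurt (lower is better).
function pruneCandidates(fullScore, looScores, tolerance = 0) {
  return Object.entries(looScores)
    .filter(([, score]) => score <= fullScore + tolerance)
    .map(([dropped]) => dropped);
}

const recipe = ["warmup-500", "cosine-schedule", "wd-0.1"];
for (const variant of leaveOneOut(recipe)) {
  console.log(variant.id, "->", variant.changes.join(" + "));
}
console.log(pruneCandidates(2.3, { "warmup-500": 2.36, "cosine-schedule": 2.3, "wd-0.1": 2.29 }));
```

A nonzero `tolerance` lets near-ties count as prunable, which errs toward simpler recipes.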

## Autonomy

If the next step is clear, take it. If two paths are reasonable, choose one, log why, and continue.
Do not stop only because the current idea failed. A failed idea should produce a note, a lesson, and the next candidate.

## Scratchpad

- `THREAD.md`: append-only chronological mission log.
- `runs.jsonl`: structured run ledger.
- `ideas/`: one file per idea, with mechanism, prior art, ablation plan, and kill criterion.
- `papers/`: paper notes with exact recipe details and a "how to port this" section.
- `variants/`: generated code/config variants.
- `sweeps/`: grouped sweep notes and outputs.
- `picklist.md`: prioritized candidates and ruled-out families.
- `audits.md`: benchmark-rule checks, portability checks, and claim audits.

## Experiment Loop

1. Inspect the repo and find the baseline command.
2. Define the allowed and forbidden change surfaces.
3. Propose 3-7 ranked experiments.
4. Run the cheapest useful experiment first.
5. Parse and record metrics.
6. Decide: reproduce, refine, prune, or pivot.
7. Keep `plan.md` current and `THREAD.md` chronological.

package/templates/base/goal.md
ADDED

@@ -0,0 +1,22 @@

# Research Goal

## Goal

Describe the metric to improve and the constraints.

Example:

- Target metric: validation loss
- Direction: lower is better
- Baseline command: unknown
- Evaluation command: unknown
- Allowed changes: optimizer, schedules, initialization, hyperparameters
- Forbidden changes: data, architecture, batch size, benchmark definition

## Current Best

Unknown.

## Notes

Use `researchloop inspect` to generate a repo profile, then ask an agent to fill in the missing benchmark details.

package/templates/base/plan.md
ADDED

@@ -0,0 +1,22 @@

# Research Plan

This is mutable state. Keep it short and current.
The chronological log belongs in `scratchpad/THREAD.md`.

## Current State

- Baseline: unknown
- Best valid result: unknown
- Active family: none
- Running jobs: none
- Next action: inspect repo and establish baseline

## Picklist

1. Establish baseline.
2. Identify metric extraction.
3. Run one smoke experiment.

## Ruled Out

None yet.

package/templates/base/scratchpad/audits.md
ADDED

@@ -0,0 +1,14 @@

# Audits

Use this file for benchmark-rule checks, portability checks, and claim audits.

## Benchmark Rules

- Baseline command: unknown
- Evaluation command: unknown
- Allowed changes: unknown
- Forbidden changes: unknown

## Claims

No claims yet.

@@ -0,0 +1 @@

@@ -0,0 +1 @@

@@ -0,0 +1 @@

@@ -0,0 +1 @@

@@ -0,0 +1 @@