@metaharness/weight-eft 0.1.0 → 0.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +38 -14
- package/package.json +26 -10
package/README.md
CHANGED
@@ -1,23 +1,47 @@
 # @metaharness/weight-eft
 
-**
-evolution (*freeze the model, evolve the harness*) to **gradient / weight**
-self-learning on the **open cheap tier**.
+> **Make cheap open-source LLMs solve more coding tasks on their own.** Fine-tune them (LoRA) on your AI agent's *past successful runs*, so your pipeline calls expensive frontier models (GPT, Claude) **less often** — and your cost-per-fix drops.
 
-
+[](https://www.npmjs.com/package/@metaharness/weight-eft)
+[](./LICENSE)
+[](https://nodejs.org)
 
-
+```bash
+npm i @metaharness/weight-eft
+```
+
+## What is this? (plain language)
+
+If you run an **AI coding agent**, you probably use a **model cascade**: a cheap
+model (GLM / Qwen / DeepSeek) tries first, and only the hard problems
+**escalate** to an expensive frontier model (GPT / Claude). Every escalation
+costs real money.
+
+**`weight-eft` makes the cheap model smarter** by fine-tuning it with **LoRA** on
+the trajectories your agent *already solved* — turning your run history into
+training data. The cheap model then resolves more issues by itself, so you
+**escalate less and pay less per solved task.**
+
+It's a self-improving loop: **your agent's wins become the next model's training set.**
+
+- **Input:** your agent's run archive (successful + failed trajectories).
+- **Output:** portable LoRA training data — **SFT + DPO** in standard formats
+(OpenAI chat JSONL / TRL / axolotl / unsloth) **+ a GPU training plan**.
+- **Goal:** lower **cost-per-resolved**, not a leaderboard score.
+
+## Why it exists (the honest, bounded thesis)
 
-
-
-
-
-
+We attack the **cost axis, not the capability ceiling.** A small (7-14B) local
+fine-tune **will not** out-reason a frontier model on the hardest problems —
+that's a model-capability ceiling (measured: clean-eval ~37.3%, ADR-198 / §53).
+The win is **fewer escalations** (lower cost), and the tooling keeps the
+telemetry honest about exactly that: the eval metric is
+**escalation-rate-reduction + cost/resolved**, *never* "we beat the frontier."
 
-
-
-
-
+Under the hood this is the gradient/weight counterpart to Darwin's gradient-free
+policy evolution (*freeze the model, evolve the harness*) — here we **also**
+evolve the cheap model's *weights*, on the open tier, from the harness's own
+archive.
 
 ## The data recipe (on/off-policy)
 
|
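The README's **Output** bullet names SFT and DPO records "in standard formats". As a rough illustration of what such records can look like, here is a minimal sketch that is not the package's API or schema: the `Trajectory` shape, the system prompt, and the pairing rule are all hypothetical. Resolved runs become lines in OpenAI's chat fine-tuning format (`{"messages": [...]}`), and a resolved and a failed attempt on the same task become one TRL-style `prompt` / `chosen` / `rejected` preference line.

```typescript
// Hypothetical trajectory shape; the package's real archive schema may differ.
interface Trajectory {
  taskId: string;
  prompt: string;    // issue text / task description shown to the agent
  response: string;  // the agent's final answer, e.g. a patch
  resolved: boolean; // did the task's tests pass?
}

// OpenAI chat fine-tuning format: one {"messages": [...]} object per line.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// TRL "standard" preference format for DPO.
interface PreferencePair {
  prompt: string;
  chosen: string;
  rejected: string;
}

// Hypothetical system prompt; not taken from the package.
const SYSTEM_PROMPT = "You are a coding agent. Reply with a patch.";

function toSftJsonl(runs: Trajectory[]): string {
  return runs
    .filter((r) => r.resolved) // SFT imitates successful runs only
    .map((r) => {
      const messages: ChatMessage[] = [
        { role: "system", content: SYSTEM_PROMPT },
        { role: "user", content: r.prompt },
        { role: "assistant", content: r.response },
      ];
      return JSON.stringify({ messages });
    })
    .join("\n");
}

function toDpoJsonl(runs: Trajectory[]): string {
  // Group attempts by task so a win and a loss share the same prompt.
  const byTask = new Map<string, Trajectory[]>();
  for (const r of runs) {
    const group = byTask.get(r.taskId) ?? [];
    group.push(r);
    byTask.set(r.taskId, group);
  }
  const lines: string[] = [];
  for (const group of byTask.values()) {
    const win = group.find((r) => r.resolved);
    const loss = group.find((r) => !r.resolved);
    if (win && loss) {
      const pair: PreferencePair = {
        prompt: win.prompt,
        chosen: win.response,
        rejected: loss.response,
      };
      lines.push(JSON.stringify(pair));
    }
  }
  return lines.join("\n");
}

// Illustrative inputs, not real run data.
const runs: Trajectory[] = [
  { taskId: "t1", prompt: "Fix off-by-one in range()", response: "patch-A", resolved: true },
  { taskId: "t1", prompt: "Fix off-by-one in range()", response: "patch-B", resolved: false },
  { taskId: "t2", prompt: "Handle empty input", response: "patch-C", resolved: false },
];

console.log(toSftJsonl(runs)); // one SFT line: t1's resolved attempt
console.log(toDpoJsonl(runs)); // one DPO line: t1, patch-A preferred over patch-B
```

Pairing only tasks that have both a resolved and a failed attempt keeps `chosen` and `rejected` conditioned on the same prompt, which is what DPO's pairwise objective assumes; tasks with only failures (like `t2`) contribute nothing to either file in this sketch.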
package/package.json
CHANGED
@@ -1,7 +1,7 @@
 {
   "name": "@metaharness/weight-eft",
-  "version": "0.1.
-  "description": "
+  "version": "0.1.1",
+  "description": "Fine-tune cheap open-source LLMs (GLM, Qwen, DeepSeek) on your AI coding agent's successful runs with LoRA (SFT + DPO) so your model cascade escalates to expensive frontier models (GPT, Claude) less often — cutting cost-per-resolved. Turns run history into portable training data (OpenAI/TRL/axolotl JSONL) with a built-in contamination guard and reward-hacking filter.",
   "type": "module",
   "main": "./dist/index.js",
   "types": "./dist/index.d.ts",
@@ -32,20 +32,36 @@
     "llm",
     "lora",
     "fine-tuning",
-    "
-    "weight-eft",
+    "peft",
     "sft",
     "dpo",
-    "
-    "
+    "rlhf-alternative",
+    "model-distillation",
+    "knowledge-distillation",
+    "ai-agents",
+    "coding-agent",
+    "agentic",
+    "llm-agent",
     "swe-bench",
-    "
-    "
-    "
+    "llm-routing",
+    "model-cascade",
+    "cost-optimization",
+    "llm-cost",
+    "openrouter",
+    "qwen",
+    "deepseek",
+    "training-data",
+    "jsonl",
+    "trl",
+    "axolotl",
+    "unsloth",
+    "self-improving",
+    "weight-eft",
+    "metaharness"
   ],
   "author": "rUv <ruv@ruv.net>",
   "license": "MIT",
-  "homepage": "https://github.com/ruvnet/agent-harness-generator",
+  "homepage": "https://github.com/ruvnet/agent-harness-generator/tree/main/packages/weight-eft#readme",
   "repository": {
     "type": "git",
     "url": "https://github.com/ruvnet/agent-harness-generator",
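Both files state the same target: the README's eval metric is *escalation-rate-reduction + cost/resolved*, and the new package description promises "cutting cost-per-resolved". Both are plain arithmetic over cascade runs. A minimal sketch, assuming a hypothetical per-task `CascadeResult` record (these names are illustrative and not part of the package's API), compares a baseline cheap model with a fine-tuned one on the same task set:

```typescript
// Hypothetical per-task outcome of one cascade run; field names are illustrative.
interface CascadeResult {
  taskId: string;
  escalated: boolean; // cheap model handed the task to the frontier tier
  resolved: boolean;  // task ultimately solved, by either tier
  costUsd: number;    // total spend on this task across both tiers
}

interface CascadeMetrics {
  escalationRate: number;  // escalated tasks / all tasks
  costPerResolved: number; // total spend / resolved tasks
}

function summarize(results: CascadeResult[]): CascadeMetrics {
  const total = results.length;
  const escalated = results.filter((r) => r.escalated).length;
  const resolved = results.filter((r) => r.resolved).length;
  const spend = results.reduce((sum, r) => sum + r.costUsd, 0);
  return {
    escalationRate: total === 0 ? 0 : escalated / total,
    // With nothing resolved, spend buys no fixes: report Infinity rather than 0.
    costPerResolved: resolved === 0 ? Infinity : spend / resolved,
  };
}

// Fractional drop from baseline to candidate: 0.25 means 25% lower.
function relativeReduction(baseline: number, candidate: number): number {
  return baseline === 0 ? 0 : (baseline - candidate) / baseline;
}

// Same four tasks before and after fine-tuning (illustrative inputs, not measurements).
const baseline: CascadeResult[] = [
  { taskId: "t1", escalated: true, resolved: true, costUsd: 0.5 },
  { taskId: "t2", escalated: true, resolved: true, costUsd: 0.5 },
  { taskId: "t3", escalated: false, resolved: true, costUsd: 0.02 },
  { taskId: "t4", escalated: true, resolved: false, costUsd: 0.5 },
];
const candidate: CascadeResult[] = [
  { taskId: "t1", escalated: false, resolved: true, costUsd: 0.02 },
  { taskId: "t2", escalated: true, resolved: true, costUsd: 0.5 },
  { taskId: "t3", escalated: false, resolved: true, costUsd: 0.02 },
  { taskId: "t4", escalated: true, resolved: false, costUsd: 0.5 },
];

const before = summarize(baseline);
const after = summarize(candidate);
console.log(
  "escalation-rate reduction:",
  relativeReduction(before.escalationRate, after.escalationRate).toFixed(2),
);
console.log(
  "cost/resolved before -> after:",
  before.costPerResolved.toFixed(3),
  "->",
  after.costPerResolved.toFixed(3),
);
```

In this toy run the resolved count stays at three while one task stops escalating, so cost/resolved falls without any capability claim. That matches the README's framing of the win as fewer escalations rather than beating the frontier tier.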