
@metaharness/weight-eft 0.1.0 → 0.1.1

This diff shows the contents of publicly released package versions as they appear in their public registries. It is provided for informational purposes only.

Files changed (2)

package/README.md +38 -14
package/package.json +26 -10

package/README.md CHANGED

@@ -1,23 +1,47 @@
 # @metaharness/weight-eft
-**Evolutionary fine-tuning** — the bridge from Darwin's gradient-FREE policy
-evolution (*freeze the model, evolve the harness*) to **gradient / weight**
-self-learning on the **open cheap tier**.
+> **Make cheap open-source LLMs solve more coding tasks on their own.** Fine-tune them (LoRA) on your AI agent's *past successful runs*, so your pipeline calls expensive frontier models (GPT, Claude) **less often** — and your cost-per-fix drops.
-## Thesis (honest, bounded)
+[![npm version](https://img.shields.io/npm/v/@metaharness/weight-eft.svg)](https://www.npmjs.com/package/@metaharness/weight-eft)
+[![license: MIT](https://img.shields.io/npm/l/@metaharness/weight-eft.svg)](./LICENSE)
+[![node](https://img.shields.io/node/v/@metaharness/weight-eft.svg)](https://nodejs.org)
-We attack the **cost-Pareto axis, not the frontier ceiling.**
+```bash
+npm i @metaharness/weight-eft
+```
+## What is this? (plain language)
+If you run an **AI coding agent**, you probably use a **model cascade**: a cheap
+model (GLM / Qwen / DeepSeek) tries first, and only the hard problems
+**escalate** to an expensive frontier model (GPT / Claude). Every escalation
+costs real money.
+**`weight-eft` makes the cheap model smarter** by fine-tuning it with **LoRA** on
+the trajectories your agent *already solved* — turning your run history into
+training data. The cheap model then resolves more issues by itself, so you
+**escalate less and pay less per solved task.**
+It's a self-improving loop: **your agent's wins become the next model's training set.**
+- **Input:** your agent's run archive (successful + failed trajectories).
+- **Output:** portable LoRA training data — **SFT + DPO** in standard formats
+  (OpenAI chat JSONL / TRL / axolotl / unsloth) **+ a GPU training plan**.
+- **Goal:** lower **cost-per-resolved**, not a leaderboard score.
+## Why it exists (the honest, bounded thesis)
-The metaharness cascade runs a cheap open model first (GLM / Qwen / DeepSeek)
-and **escalates to a frontier model** (Opus / GPT) only on the hard tail. Each
-escalation costs ~$0.50. `weight-eft` **distills the harness's archival
-success into the cheap tier via LoRA** so the cheap model resolves more issues
-on its own → **the cascade escalates less often** → **$/resolved drops.**
+We attack the **cost axis, not the capability ceiling.** A small (7-14B) local
+fine-tune **will not** out-reason a frontier model on the hardest problems —
+that's a model-capability ceiling (measured: clean-eval ~37.3%, ADR-198 / §53).
+The win is **fewer escalations** (lower cost), and the tooling keeps the
+telemetry honest about exactly that: the eval metric is
+**escalation-rate-reduction + cost/resolved**, *never* "we beat the frontier."
-A 7-14B local-GPU tune **will not crack the hard tail** — that's a frontier
-reasoning ceiling (clean-eval ~37.3%, ADR-177 §53). The win is **fewer
-escalations**, and the telemetry stays honest about that. The eval metric is
-**escalation-rate-reduction + cost/resolved**, *not* hard-tail cracking.
+Under the hood this is the gradient/weight counterpart to Darwin's gradient-free
+policy evolution (*freeze the model, evolve the harness*) — here we **also**
+evolve the cheap model's *weights*, on the open tier, from the harness's own
+archive.
 ## The data recipe (on/off-policy)
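
Note on the README hunk above (not part of the diff): the eval metric named in the new text, escalation-rate reduction plus cost per resolved task, can be sketched roughly as follows. This is a minimal TypeScript illustration under stated assumptions, not the package's API. `RunRecord`, `cascadeStats`, and `escalationRateReduction` are hypothetical names, and the $0.50 default is the approximate per-escalation cost quoted in the 0.1.0 README.

```typescript
// Hypothetical record for one agent run; field names are assumptions,
// not the package's actual archive schema.
interface RunRecord {
  resolved: boolean;     // did the cascade end up solving the task?
  escalated: boolean;    // did the run fall through to the frontier model?
  cheapCostUsd: number;  // spend on the cheap open model for this run
}

interface CascadeStats {
  escalationRate: number;   // escalated runs / all runs
  costPerResolved: number;  // total spend / resolved tasks
}

// Aggregate a batch of runs into the two numbers the README treats as the metric.
// escalationCostUsd defaults to 0.5, the rough figure from the 0.1.0 README.
function cascadeStats(runs: RunRecord[], escalationCostUsd = 0.5): CascadeStats {
  if (runs.length === 0) {
    throw new Error("cascadeStats needs at least one run");
  }
  let escalations = 0;
  let resolved = 0;
  let totalCostUsd = 0;
  for (const run of runs) {
    totalCostUsd += run.cheapCostUsd;
    if (run.escalated) {
      escalations += 1;
      totalCostUsd += escalationCostUsd;
    }
    if (run.resolved) {
      resolved += 1;
    }
  }
  return {
    escalationRate: escalations / runs.length,
    // With nothing resolved, cost per resolved task is unbounded.
    costPerResolved: resolved > 0 ? totalCostUsd / resolved : Infinity,
  };
}

// Absolute drop in escalation rate between a baseline and a fine-tuned cheap model.
function escalationRateReduction(before: CascadeStats, after: CascadeStats): number {
  return before.escalationRate - after.escalationRate;
}
```

Comparing `cascadeStats` on the same held-out tasks before and after the LoRA fine-tune gives both halves of the metric. The comparison is only meaningful if those tasks are disjoint from the training trajectories, which is the contamination guard the package description refers to.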

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "@metaharness/weight-eft",
-  "version": "0.1.0",
-  "description": "Evolutionary fine-tuning — distill the harness's archival success into the open cheap tier (GLM/Qwen) via LoRA so the cost-cascade escalates to a frontier model less often. SFT-distills ALL gold-resolved trajectories; on-policy DPO on GLM-vs-GLM pairs only. Attacks the cost-Pareto axis (fewer escalations), NOT the frontier reasoning ceiling. Strict train/eval instance-ID disjointness (the contamination guard).",
+  "version": "0.1.1",
+  "description": "Fine-tune cheap open-source LLMs (GLM, Qwen, DeepSeek) on your AI coding agent's successful runs with LoRA (SFT + DPO) so your model cascade escalates to expensive frontier models (GPT, Claude) less often — cutting cost-per-resolved. Turns run history into portable training data (OpenAI/TRL/axolotl JSONL) with a built-in contamination guard and reward-hacking filter.",
   "type": "module",
   "main": "./dist/index.js",
   "types": "./dist/index.d.ts",
@@ -32,20 +32,36 @@
     "llm",
     "lora",
     "fine-tuning",
-    "evolutionary-fine-tuning",
-    "weight-eft",
+    "peft",
     "sft",
     "dpo",
-    "distillation",
-    "cost-optimization",
+    "rlhf-alternative",
+    "model-distillation",
+    "knowledge-distillation",
+    "ai-agents",
+    "coding-agent",
+    "agentic",
+    "llm-agent",
     "swe-bench",
-    "metaharness",
-    "darwin-mode",
-    "contamination-guard"
+    "llm-routing",
+    "model-cascade",
+    "cost-optimization",
+    "llm-cost",
+    "openrouter",
+    "qwen",
+    "deepseek",
+    "training-data",
+    "jsonl",
+    "trl",
+    "axolotl",
+    "unsloth",
+    "self-improving",
+    "weight-eft",
+    "metaharness"
   ],
   "author": "rUv <ruv@ruv.net>",
   "license": "MIT",
-  "homepage": "https://github.com/ruvnet/agent-harness-generator",
+  "homepage": "https://github.com/ruvnet/agent-harness-generator/tree/main/packages/weight-eft#readme",
   "repository": {
     "type": "git",
     "url": "https://github.com/ruvnet/agent-harness-generator",