@metaharness/darwin 0.2.5 → 0.2.6

package/CHANGELOG.md CHANGED
@@ -2,6 +2,10 @@
 
  All notable changes to this package. Dates UTC.
 
+ ## 0.2.6 — 2026-06-19
+
+ - **3-tier hybrid = 175/300 = 58.3%** [52.7,63.8] on full SWE-bench Lite (ADR-154), VERIFIED (55/55 sage-added reproduced). v4-pro(88)->sonnet Scholar(+33)->opus Sage(+54). 7.6x the 7.7% baseline; conservative lower bound (Sage partial). Blended ~$0.74/instance.
+
  ## 0.2.5 — 2026-06-19
 
  - New ceiling (ADR-152): **v4-pro + Scholar hybrid = 121/300 = 40.3%** [34.9,46.0] on full SWE-bench Lite — 5.2x the 7.7% baseline. Two levers stack: stronger cheap base (v4-pro, 88/300) + frontier-tail escalation (sonnet-4 recovers 33/212). Blended ~$0.39/instance.
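Reviewer note: the escalation arithmetic quoted in the two changelog entries can be cross-checked with a short script. This is a minimal sketch, not code from the package. The `Tier` type, the function names, and the constants are hypothetical, and the only inputs are the counts quoted in the changelog (88, +33, +54 out of 300, against a 7.7% baseline). The "unresolved pool" it prints is simply 300 minus what earlier tiers resolved, which is not necessarily the number of instances a tier actually attempted (the changelog notes the Sage run was partial).

```typescript
// Hypothetical cross-check of the changelog's tier arithmetic.
// All names here are illustrative; only the numbers come from the changelog.
interface Tier {
  name: string;
  newlyResolved: number; // instances this tier adds on top of earlier tiers
}

const TOTAL_INSTANCES = 300; // full SWE-bench Lite, per the changelog
const BASELINE_PERCENT = 7.7; // open-loop baseline, per the changelog

const tiers: Tier[] = [
  { name: "v4-pro", newlyResolved: 88 },
  { name: "sonnet Scholar", newlyResolved: 33 },
  { name: "opus Sage", newlyResolved: 54 },
];

// Running total of resolved instances after each tier.
function cumulativeResolved(ts: Tier[]): number[] {
  let running = 0;
  return ts.map((t) => (running += t.newlyResolved));
}

// Resolve rate expressed as a multiple of the baseline percentage.
function multiplierOverBaseline(resolved: number): number {
  return ((resolved / TOTAL_INSTANCES) * 100) / BASELINE_PERCENT;
}

const totals = cumulativeResolved(tiers);
tiers.forEach((t, i) => {
  const resolvedBefore = i === 0 ? 0 : totals[i - 1];
  const pool = TOTAL_INSTANCES - resolvedBefore; // unresolved before this tier
  const pct = ((totals[i] / TOTAL_INSTANCES) * 100).toFixed(1);
  const mult = multiplierOverBaseline(totals[i]).toFixed(1);
  console.log(
    `${t.name}: +${t.newlyResolved} of ${pool} unresolved -> ` +
      `${totals[i]}/${TOTAL_INSTANCES} = ${pct}% (${mult}x baseline)`
  );
});
```

The running totals come out to 88, 121, and 175, and the pool before the second tier is 212, which agrees with the "33/212" figure in the 0.2.5 entry.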
package/README.md CHANGED
@@ -286,6 +286,7 @@ context + symbol-aware localization + search/replace patch, `deepseek-chat`, ~$0
  | **+ closed-loop repair (test-feedback, ≤3)** | 46/300 = **15.3%** | **[11.7, 19.8]** | 149 |
  | **+ swap base → deepseek-v4-pro (cheap)** | 88/300 = **29.3%** | **[24.5, 34.7]** | 151 |
  | **+ v4-pro + Scholar hybrid** | 121/300 = **40.3%** | **[34.9, 46.0]** | 152 |
+ | **+ Sage (opus-4.8) — 3-tier** | 175/300 = **58.3%** | **[52.7, 63.8]** | 154 |
  | **+ Barbarian&Scholar hybrid (cheap+frontier tail)** | 100/300 = **33.3%** | **[28.2, 38.8]** | 148 |
 
  The closed-loop repair loop **~doubles** the resolve-rate (7.7% → 15.3%) on the *same cheap model*
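Reviewer note: the bracketed ranges in the README table rows above look like 95% confidence intervals on the resolve rate, but the diff does not say how they were computed. The sketch below assumes a Wilson score interval with z = 1.96; that choice is an assumption, and `wilsonInterval` is a hypothetical helper, not part of the package. Under that assumption the computed bounds appear to agree with the published ones to one decimal place.

```typescript
// Wilson score interval for a binomial proportion.
// ASSUMPTION: the README does not name its interval method; Wilson with
// z = 1.96 (about 95% coverage) is used here only as a plausible candidate.
function wilsonInterval(
  successes: number,
  trials: number,
  z: number = 1.96
): [number, number] {
  const p = successes / trials;
  const z2 = z * z;
  const denom = 1 + z2 / trials;
  const center = (p + z2 / (2 * trials)) / denom;
  const halfWidth =
    (z * Math.sqrt((p * (1 - p)) / trials + z2 / (4 * trials * trials))) / denom;
  return [center - halfWidth, center + halfWidth];
}

// Resolved counts copied from the table rows in the README hunk.
const tableRows: Array<[string, number]> = [
  ["closed-loop repair", 46],
  ["swap base -> deepseek-v4-pro", 88],
  ["v4-pro + Scholar hybrid", 121],
  ["Sage 3-tier", 175],
  ["Barbarian&Scholar hybrid", 100],
];

for (const [label, resolved] of tableRows) {
  const [lo, hi] = wilsonInterval(resolved, 300);
  const rate = ((resolved / 300) * 100).toFixed(1);
  console.log(
    `${label}: ${resolved}/300 = ${rate}% ` +
      `[${(lo * 100).toFixed(1)}, ${(hi * 100).toFixed(1)}]`
  );
}
```

A Wilson interval is a common choice for proportions on a few hundred trials because it stays inside [0, 1] and behaves better than the plain normal approximation near the extremes.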
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "@metaharness/darwin",
- "version": "0.2.5",
- "description": "An LLM supercharger and cost optimizer: freeze the model class, evolve the harness. Measured on full SWE-bench Lite (300, official swebench Docker): 7.7% open-loop -> 15.3% +repair -> 29.3% (deepseek-v4-pro base) -> 40.3% v4-pro+frontier-tail hybrid, ~$0.01-$0.39/instance (vs $1-20 for frontier agents). The harness, not the model, is the lever. Dependency-free (Node built-ins).",
+ "version": "0.2.6",
+ "description": "An LLM supercharger and cost optimizer: freeze the model class, evolve the harness. Measured on full SWE-bench Lite (300, official swebench Docker, verified): 7.7% open-loop -> 15.3% +repair -> 29.3% (v4-pro base) -> 40.3% 2-tier -> 58.3% 3-tier cheap->frontier escalation, ~$0.01-$0.74/instance (vs $1-20 for frontier agents). The harness, not the model, is the lever. Dependency-free (Node built-ins).",
  "type": "module",
  "main": "./dist/index.js",
  "types": "./dist/index.d.ts",