agent.libx.js 0.93.33 → 0.93.35

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +59 -9
  2. package/package.json +2 -2
  3. package/cli/cli.ts +0 -2362
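The per-file `+N -M` counts above can be cross-checked against the `@@ -a,b +c,d @@` hunk headers that follow. In the unified diff format, the old count equals context plus removed lines, and the new count equals context plus added lines. The sketch below is a reading aid under that assumption. `tallyHunk` and `matchesHeader` are hypothetical helper names, not part of agent.libx.js, and the sample is the first `package.json` hunk from this diff, rewritten with standard one-character prefixes.

```typescript
// Hypothetical helpers for reading a diff like this one; not part of agent.libx.js.
type HunkTally = { context: number; added: number; removed: number };

// Count line kinds in a hunk body using standard unified-diff prefixes.
function tallyHunk(body: string[]): HunkTally {
  const tally: HunkTally = { context: 0, added: 0, removed: 0 };
  for (const line of body) {
    if (line.startsWith("\\")) continue; // "\ No newline at end of file" marker
    if (line.startsWith("+")) tally.added++;
    else if (line.startsWith("-")) tally.removed++;
    else tally.context++; // " " prefix, or an empty context line
  }
  return tally;
}

// Check a tally against a header such as "@@ -1,6 +1,6 @@".
// A missing count in the header means 1, per the unified diff format.
function matchesHeader(header: string, t: HunkTally): boolean {
  const m = /^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@/.exec(header);
  if (!m) return false;
  const oldCount = m[1] === undefined ? 1 : Number(m[1]);
  const newCount = m[2] === undefined ? 1 : Number(m[2]);
  return oldCount === t.context + t.removed && newCount === t.context + t.added;
}

// First package.json hunk from this diff.
const header = "@@ -1,6 +1,6 @@";
const body = [
  " {",
  ' "name": "agent.libx.js",',
  '- "version": "0.93.33",',
  '+ "version": "0.93.35",',
  ' "description": "Edge-native AI agent runtime — drives a virtual filesystem via any LLM (ai.libx.js). Same bytes run in node, browser, or edge.",',
  ' "type": "module",',
  ' "main": "./dist/index.js",',
];

const tally = tallyHunk(body);
console.log(tally, matchesHeader(header, tally));
```

Summing `added` and `removed` over every hunk of a file should reproduce that file's `+N -M` entry in the list above.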
package/README.md CHANGED
@@ -1,6 +1,21 @@
  # agent.libx.js
 
- **A coding agent on par with Claude Code — that also runs where Claude Code can't.**
+ [![npm](https://img.shields.io/npm/v/agent.libx.js?color=cb3837&logo=npm)](https://www.npmjs.com/package/agent.libx.js)
+ [![publish](https://github.com/Livshitz/agent.libx.js/actions/workflows/publish.yml/badge.svg)](https://github.com/Livshitz/agent.libx.js/actions/workflows/publish.yml)
+ [![license](https://img.shields.io/npm/l/agent.libx.js)](./LICENSE)
+ ![runtime](https://img.shields.io/badge/runtime-Bun-black?logo=bun)
+ ![edge-ready](https://img.shields.io/badge/runs-node%20%C2%B7%20browser%20%C2%B7%20edge-brightgreen)
+
+ **A coding agent that matches Claude Code on correctness — then beats it on cost, tokens, and tool-efficiency, and runs where Claude Code can't (sandbox, browser/edge, database).**
+
+ <!-- DEMO GIF: record a ~15s session (asciinema rec demo.cast → agg demo.cast docs/demo.gif, or a terminal-screen capture) and replace this comment with:
+ <p align="center"><img src="docs/demo.gif" alt="agentx fixing a bug" width="720"></p> -->
+
+ ```console
+ $ agentx "There's a bug in math.js — add() subtracts instead of adds. Fix it."
+ ⚙ Edit math.js a - b → a + b
+ ✓ Fixed: `a - b` → `a + b` in add().
+ ```
 
  By default it's a full-strength terminal coding agent: real disk, real shell, and the same Read/Edit/Grep/permissions/streaming-DX surface you'd expect from Claude Code. The difference is its two host couplings are swappable seams:
 
@@ -11,16 +26,41 @@ So the *same* agent loop also runs **sandboxed** (in-memory VFS, real disk untou
 
  Claude Code is the floor; running isolated, on the edge, or hybrid is the ceiling.
 
+ ## How it stacks up vs Claude Code
+
+ **Correctness parity — efficiency, cost, and reach are the lead.** Hard 7-task coding suite, Sonnet, *denoised* (each task ×3, no lucky run promotes; `SUITE=hard bun compare/run.ts`):
+
+ | | agent.libx.js | Claude Code |
+ |---|---|---|
+ | Correctness | 7/7 | 7/7 — **parity** |
+ | Tool-calls | **16** | 28 — **−43%** |
+ | Tokens | **69k** | 171k — **2.5× fewer** |
+ | Wall-time | **~100s** | 133s — **~25% faster** |
+
+ **Cost** (9-task hard suite, USD-metered, vs CC-on-Opus): **$0.49** single-tier Sonnet (**5.4× cheaper**) · **$0.82** three-tier voice/duplex (**3.3× cheaper**) vs CC-Opus **$2.67** — at quality parity (16/18 vs 17/18 passes).
+
+ Plus things Claude Code simply doesn't do:
+
+ - **Runs where CC can't** — the *same* agent loop runs on real disk, an in-memory **sandbox**, the **browser/edge** (no Node, no `/bin/sh`), or a **database-backed** workspace. Swap the filesystem, not the agent.
+ - **Keyless web search, built in** — `WebSearch` works in any deployment with no API key (DuckDuckGo; auto-upgrades to Tavily if you set one). CC's search is Anthropic-server-bound.
+ - **Context-safe by default** — a 1 MB `Grep`/`Read`/MCP result is auto-paginated and can't blow the window; buried detail is recovered via a cheap context-isolated `Ask` peek — **~5.3× cheaper and more accurate** than re-fetching, in a head-to-head.
+ - **It improves its own efficiency** — an autonomous evolution loop cut its own tool-use **~50% (32 → 15** on the core suite, denoised), self-discovered, not hand-tuned — the same lever behind the efficiency lead above.
+
+ *Honest scope:* the win is **efficiency / cost / reach**, not a claim of smarter reasoning — correctness is parity. All figures are denoised and reproducible (see [Eval & compare](#eval--compare)); full boards in [`mind/09-outperform.md`](./mind/09-outperform.md).
+
  ## Quickstart
 
+ Point it at your project — no clone needed (requires [Bun](https://bun.sh)):
+
  ```bash
- bun install # links wcli (file:), ai.libx.js + libx.js (bun link)
- bun test # 34 unit/integration tests (no API key needed)
- ANTHROPIC_API_KEY=… bun examples/run-sonnet.ts # drive a real model
- bun eval/run.ts # quantitative eval scorecard (real model)
+ export ANTHROPIC_API_KEY=… # or OPENAI_API_KEY / GOOGLE_API_KEY / GROQ_API_KEY
+ bunx agent.libx.js "find and fix the failing test" # run once in the current directory
+ bunx agent.libx.js # …or open the interactive REPL
  ```
 
- ## Use it
+ Want a permanent command? `bun add -g agent.libx.js`, then just `agentx` (and `agentx --duplex` for voice). The agent has full real-disk + shell access by default (like Claude Code); add `--sandbox` to work on an in-memory copy instead. See [The `agentx` CLI](#the-agentx-cli) for flags, sessions, and slash commands.
+
+ ## Use it as a library
 
  ```ts
  import { AIClient } from 'ai.libx.js';
@@ -103,14 +143,24 @@ Full design + threat model + results: [`mind/08-self-evolve.md`](./mind/08-self-
 
  ## Status
 
- **v1 (done):** loop + hybrid tools + Mem/Disk backends + deterministic `FakeAIClient` tests + real-model run. **5/5 pass@1** on the behavioral eval (Sonnet 4.6); the head-to-head started at correctness parity with Claude Code but ~2× the tool calls (≈28 vs 15) — a gap the **self-evolution loop has now closed autonomously**: it drove its own baseline from 32 → 15 tool-calls (denoised over 3 runs) and ties Claude Code in a fresh head-to-head (15 vs 15). **112 tests green.**
+ **v1 (done):** loop + hybrid tools + Mem/Disk backends + deterministic `FakeAIClient` tests + real-model run. **5/5 pass@1** on the behavioral eval (Sonnet 4.6); the head-to-head started at correctness parity with Claude Code but ~2× the tool calls (≈28 vs 15) — a gap the **self-evolution loop has now closed autonomously**: it drove its own baseline from 32 → 15 tool-calls (denoised over 3 runs) and ties Claude Code in a fresh head-to-head (15 vs 15). **820+ tests green.**
 
  See [`mind/`](./mind/) for the full vision, architecture, decision journal, roadmap, eval + head-to-head results, the [parity plan](./mind/05-parity.md), and the [self-evolution design](./mind/08-self-evolve.md).
 
- ## Eval & compare
+ ## Develop & evaluate
+
+ Hacking on the runtime itself (from a clone):
+
+ ```bash
+ bun install # links wcli (file:), ai.libx.js + libx.js (bun link)
+ bun test # 820+ unit/integration tests (offline via FakeAIClient, no key)
+ ANTHROPIC_API_KEY=… bun examples/run-sonnet.ts # drive a real model end-to-end
+ ```
+
+ Eval & head-to-head (real model):
 
  ```bash
  bun eval/run.ts # behavioral scorecard (our agent over MemFilesystem)
  bun compare/seed-tasks.ts # materialize task specs into .tmp/tasks/
- bun compare/run.ts # head-to-head vs Claude Code (needs `claude` CLI)
+ bun compare/run.ts # head-to-head vs Claude Code (needs the `claude` CLI)
  ```
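The comparison figures added to the README in the hunks above are derived ratios, so they can be recomputed from the raw numbers quoted beside them. The sketch below does only that arithmetic. `percentLess` and `timesFewer` are hypothetical helper names, the inputs are copied from the README table and cost line, and the README's "~100s" wall-time is treated as exactly 100. It checks internal consistency only and does not reproduce the benchmark itself.

```typescript
// Hypothetical helpers; inputs are the raw figures quoted in the README hunks above.
const percentLess = (ours: number, baseline: number): number =>
  (1 - ours / baseline) * 100;
const timesFewer = (ours: number, baseline: number): number => baseline / ours;

const checks = {
  toolCallsPct: percentLess(16, 28), // README states −43%
  tokensX: timesFewer(69, 171), // README states 2.5× fewer
  wallTimePct: percentLess(100, 133), // README states ~25% faster
  sonnetCostX: timesFewer(0.49, 2.67), // README states 5.4× cheaper
  duplexCostX: timesFewer(0.82, 2.67), // README states 3.3× cheaper
};

// Print each derived value to one decimal place.
for (const [name, value] of Object.entries(checks)) {
  console.log(name, value.toFixed(1));
}
```

To one decimal place the values come out as 42.9, 2.5, 24.8, 5.4 and 3.3, which agree with the rounded figures the README states.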
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "agent.libx.js",
- "version": "0.93.33",
+ "version": "0.93.35",
  "description": "Edge-native AI agent runtime — drives a virtual filesystem via any LLM (ai.libx.js). Same bytes run in node, browser, or edge.",
  "type": "module",
  "main": "./dist/index.js",
@@ -46,7 +46,7 @@
  "node": ">=18"
  },
  "bin": {
- "agentx": "cli/cli.ts"
+ "agentx": "dist/cli.js"
  },
  "files": [
  "dist",