agent.libx.js 0.93.33 → 0.93.34

Files changed (3)
  1. package/README.md +44 -9
  2. package/package.json +2 -2
  3. package/cli/cli.ts +0 -2362
package/README.md CHANGED
@@ -1,6 +1,6 @@
  # agent.libx.js
 
- **A coding agent on par with Claude Code — that also runs where Claude Code can't.**
+ **A coding agent that matches Claude Code on correctness then beats it on cost, tokens, and tool-efficiency, and runs where Claude Code can't (sandbox, browser/edge, database).**
 
  By default it's a full-strength terminal coding agent: real disk, real shell, and the same Read/Edit/Grep/permissions/streaming-DX surface you'd expect from Claude Code. The difference is its two host couplings are swappable seams:
 
@@ -11,16 +11,41 @@ So the *same* agent loop also runs **sandboxed** (in-memory VFS, real disk untou
 
  Claude Code is the floor; running isolated, on the edge, or hybrid is the ceiling.
 
+ ## How it stacks up vs Claude Code
+
+ **Correctness parity — efficiency, cost, and reach are the lead.** Hard 7-task coding suite, Sonnet, *denoised* (each task ×3, no lucky run promotes; `SUITE=hard bun compare/run.ts`):
+
+ | | agent.libx.js | Claude Code |
+ |---|---|---|
+ | Correctness | 7/7 | 7/7 — **parity** |
+ | Tool-calls | **16** | 28 — **−43%** |
+ | Tokens | **69k** | 171k — **2.5× fewer** |
+ | Wall-time | **~100s** | 133s — **~25% faster** |
+
+ **Cost** (9-task hard suite, USD-metered, vs CC-on-Opus): **$0.49** single-tier Sonnet (**5.4× cheaper**) · **$0.82** three-tier voice/duplex (**3.3× cheaper**) vs CC-Opus **$2.67** — at quality parity (16/18 vs 17/18 passes).
+
+ Plus things Claude Code simply doesn't do:
+
+ - **Runs where CC can't** — the *same* agent loop runs on real disk, an in-memory **sandbox**, the **browser/edge** (no Node, no `/bin/sh`), or a **database-backed** workspace. Swap the filesystem, not the agent.
+ - **Keyless web search, built in** — `WebSearch` works in any deployment with no API key (DuckDuckGo; auto-upgrades to Tavily if you set one). CC's search is Anthropic-server-bound.
+ - **Context-safe by default** — a 1 MB `Grep`/`Read`/MCP result is auto-paginated and can't blow the window; buried detail is recovered via a cheap context-isolated `Ask` peek — **~5.3× cheaper and more accurate** than re-fetching, in a head-to-head.
+ - **It improves its own efficiency** — an autonomous evolution loop cut its own tool-use **~50% (32 → 15** on the core suite, denoised), self-discovered, not hand-tuned — the same lever behind the efficiency lead above.
+
+ *Honest scope:* the win is **efficiency / cost / reach**, not a claim of smarter reasoning — correctness is parity. All figures are denoised and reproducible (see [Eval & compare](#eval--compare)); full boards in [`mind/09-outperform.md`](./mind/09-outperform.md).
+
  ## Quickstart
 
+ Point it at your project — no clone needed (requires [Bun](https://bun.sh)):
+
  ```bash
- bun install # links wcli (file:), ai.libx.js + libx.js (bun link)
- bun test # 34 unit/integration tests (no API key needed)
- ANTHROPIC_API_KEY=… bun examples/run-sonnet.ts # drive a real model
- bun eval/run.ts # quantitative eval scorecard (real model)
+ export ANTHROPIC_API_KEY=… # or OPENAI_API_KEY / GOOGLE_API_KEY / GROQ_API_KEY
+ bunx agent.libx.js "find and fix the failing test" # run once in the current directory
+ bunx agent.libx.js # …or open the interactive REPL
  ```
 
- ## Use it
+ Want a permanent command? `bun add -g agent.libx.js`, then just `agentx` (and `agentx --duplex` for voice). The agent has full real-disk + shell access by default (like Claude Code); add `--sandbox` to work on an in-memory copy instead. See [The `agentx` CLI](#the-agentx-cli) for flags, sessions, and slash commands.
+
+ ## Use it as a library
 
  ```ts
  import { AIClient } from 'ai.libx.js';
@@ -103,14 +128,24 @@ Full design + threat model + results: [`mind/08-self-evolve.md`](./mind/08-self-
 
  ## Status
 
- **v1 (done):** loop + hybrid tools + Mem/Disk backends + deterministic `FakeAIClient` tests + real-model run. **5/5 pass@1** on the behavioral eval (Sonnet 4.6); the head-to-head started at correctness parity with Claude Code but ~2× the tool calls (≈28 vs 15) — a gap the **self-evolution loop has now closed autonomously**: it drove its own baseline from 32 → 15 tool-calls (denoised over 3 runs) and ties Claude Code in a fresh head-to-head (15 vs 15). **112 tests green.**
+ **v1 (done):** loop + hybrid tools + Mem/Disk backends + deterministic `FakeAIClient` tests + real-model run. **5/5 pass@1** on the behavioral eval (Sonnet 4.6); the head-to-head started at correctness parity with Claude Code but ~2× the tool calls (≈28 vs 15) — a gap the **self-evolution loop has now closed autonomously**: it drove its own baseline from 32 → 15 tool-calls (denoised over 3 runs) and ties Claude Code in a fresh head-to-head (15 vs 15). **820+ tests green.**
 
  See [`mind/`](./mind/) for the full vision, architecture, decision journal, roadmap, eval + head-to-head results, the [parity plan](./mind/05-parity.md), and the [self-evolution design](./mind/08-self-evolve.md).
 
- ## Eval & compare
+ ## Develop & evaluate
+
+ Hacking on the runtime itself (from a clone):
+
+ ```bash
+ bun install # links wcli (file:), ai.libx.js + libx.js (bun link)
+ bun test # 820+ unit/integration tests (offline via FakeAIClient, no key)
+ ANTHROPIC_API_KEY=… bun examples/run-sonnet.ts # drive a real model end-to-end
+ ```
+
+ Eval & head-to-head (real model):
 
  ```bash
  bun eval/run.ts # behavioral scorecard (our agent over MemFilesystem)
  bun compare/seed-tasks.ts # materialize task specs into .tmp/tasks/
- bun compare/run.ts # head-to-head vs Claude Code (needs `claude` CLI)
+ bun compare/run.ts # head-to-head vs Claude Code (needs the `claude` CLI)
  ```
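The headline ratios in the README's new comparison section can be re-derived from the raw figures it quotes. The following is a minimal TypeScript sketch of that arithmetic. The `Pair` type and the helper names are illustrative assumptions and are not part of the package, and the wall-time input uses the README's approximate "~100s" figure as exactly 100.

```typescript
// Raw figures copied from the README comparison table and cost line.
// "ours" = agent.libx.js, "theirs" = Claude Code (hypothetical field names).
interface Pair {
  ours: number;
  theirs: number;
}

const toolCalls: Pair = { ours: 16, theirs: 28 };
const tokensK: Pair = { ours: 69, theirs: 171 };
const wallSeconds: Pair = { ours: 100, theirs: 133 }; // README says "~100s"
const costSingleTier: Pair = { ours: 0.49, theirs: 2.67 };
const costThreeTier: Pair = { ours: 0.82, theirs: 2.67 };

// Percentage reduction relative to the Claude Code baseline.
function reductionPct(p: Pair): number {
  return ((p.theirs - p.ours) / p.theirs) * 100;
}

// How many times larger the baseline figure is.
function ratio(p: Pair): number {
  return p.theirs / p.ours;
}

// Round to one decimal place for display.
function round1(x: number): number {
  return Math.round(x * 10) / 10;
}

console.log(`tool-calls reduction: ${Math.round(reductionPct(toolCalls))}%`);
console.log(`token ratio: ${round1(ratio(tokensK))}x`);
console.log(`wall-time reduction: ${Math.round(reductionPct(wallSeconds))}%`);
console.log(`single-tier cost ratio: ${round1(ratio(costSingleTier))}x`);
console.log(`three-tier cost ratio: ${round1(ratio(costThreeTier))}x`);
```

Under these inputs the rounded results agree with the README's stated 43%, 2.5×, ~25%, 5.4×, and 3.3×. This only checks that the quoted ratios are internally consistent; it does not reproduce the benchmark itself.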
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "agent.libx.js",
- "version": "0.93.33",
+ "version": "0.93.34",
  "description": "Edge-native AI agent runtime — drives a virtual filesystem via any LLM (ai.libx.js). Same bytes run in node, browser, or edge.",
  "type": "module",
  "main": "./dist/index.js",
@@ -46,7 +46,7 @@
  "node": ">=18"
  },
  "bin": {
- "agentx": "cli/cli.ts"
+ "agentx": "dist/cli.js"
  },
  "files": [
  "dist",