agent.libx.js 0.93.33 → 0.93.35
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +59 -9
- package/package.json +2 -2
- package/cli/cli.ts +0 -2362
package/README.md
CHANGED
@@ -1,6 +1,21 @@
 # agent.libx.js
 
-
+[](https://www.npmjs.com/package/agent.libx.js)
+[](https://github.com/Livshitz/agent.libx.js/actions/workflows/publish.yml)
+[](./LICENSE)
+
+
+
+**A coding agent that matches Claude Code on correctness — then beats it on cost, tokens, and tool-efficiency, and runs where Claude Code can't (sandbox, browser/edge, database).**
+
+<!-- DEMO GIF: record a ~15s session (asciinema rec demo.cast → agg demo.cast docs/demo.gif, or a terminal-screen capture) and replace this comment with:
+<p align="center"><img src="docs/demo.gif" alt="agentx fixing a bug" width="720"></p> -->
+
+```console
+$ agentx "There's a bug in math.js — add() subtracts instead of adds. Fix it."
+⚙ Edit math.js a - b → a + b
+✓ Fixed: `a - b` → `a + b` in add().
+```
 
 By default it's a full-strength terminal coding agent: real disk, real shell, and the same Read/Edit/Grep/permissions/streaming-DX surface you'd expect from Claude Code. The difference is its two host couplings are swappable seams:
 
@@ -11,16 +26,41 @@ So the *same* agent loop also runs **sandboxed** (in-memory VFS, real disk untou
 
 Claude Code is the floor; running isolated, on the edge, or hybrid is the ceiling.
 
+## How it stacks up vs Claude Code
+
+**Correctness parity — efficiency, cost, and reach are the lead.** Hard 7-task coding suite, Sonnet, *denoised* (each task ×3, no lucky run promotes; `SUITE=hard bun compare/run.ts`):
+
+| | agent.libx.js | Claude Code |
+|---|---|---|
+| Correctness | 7/7 | 7/7 — **parity** |
+| Tool-calls | **16** | 28 — **−43%** |
+| Tokens | **69k** | 171k — **2.5× fewer** |
+| Wall-time | **~100s** | 133s — **~25% faster** |
+
+**Cost** (9-task hard suite, USD-metered, vs CC-on-Opus): **$0.49** single-tier Sonnet (**5.4× cheaper**) · **$0.82** three-tier voice/duplex (**3.3× cheaper**) vs CC-Opus **$2.67** — at quality parity (16/18 vs 17/18 passes).
+
+Plus things Claude Code simply doesn't do:
+
+- **Runs where CC can't** — the *same* agent loop runs on real disk, an in-memory **sandbox**, the **browser/edge** (no Node, no `/bin/sh`), or a **database-backed** workspace. Swap the filesystem, not the agent.
+- **Keyless web search, built in** — `WebSearch` works in any deployment with no API key (DuckDuckGo; auto-upgrades to Tavily if you set one). CC's search is Anthropic-server-bound.
+- **Context-safe by default** — a 1 MB `Grep`/`Read`/MCP result is auto-paginated and can't blow the window; buried detail is recovered via a cheap context-isolated `Ask` peek — **~5.3× cheaper and more accurate** than re-fetching, in a head-to-head.
+- **It improves its own efficiency** — an autonomous evolution loop cut its own tool-use **~50% (32 → 15** on the core suite, denoised), self-discovered, not hand-tuned — the same lever behind the efficiency lead above.
+
+*Honest scope:* the win is **efficiency / cost / reach**, not a claim of smarter reasoning — correctness is parity. All figures are denoised and reproducible (see [Eval & compare](#eval--compare)); full boards in [`mind/09-outperform.md`](./mind/09-outperform.md).
+
 ## Quickstart
 
+Point it at your project — no clone needed (requires [Bun](https://bun.sh)):
+
 ```bash
-
-
-
-bun eval/run.ts # quantitative eval scorecard (real model)
+export ANTHROPIC_API_KEY=… # or OPENAI_API_KEY / GOOGLE_API_KEY / GROQ_API_KEY
+bunx agent.libx.js "find and fix the failing test" # run once in the current directory
+bunx agent.libx.js # …or open the interactive REPL
 ```
 
-
+Want a permanent command? `bun add -g agent.libx.js`, then just `agentx` (and `agentx --duplex` for voice). The agent has full real-disk + shell access by default (like Claude Code); add `--sandbox` to work on an in-memory copy instead. See [The `agentx` CLI](#the-agentx-cli) for flags, sessions, and slash commands.
+
+## Use it as a library
 
 ```ts
 import { AIClient } from 'ai.libx.js';
@@ -103,14 +143,24 @@ Full design + threat model + results: [`mind/08-self-evolve.md`](./mind/08-self-
 
 ## Status
 
-**v1 (done):** loop + hybrid tools + Mem/Disk backends + deterministic `FakeAIClient` tests + real-model run. **5/5 pass@1** on the behavioral eval (Sonnet 4.6); the head-to-head started at correctness parity with Claude Code but ~2× the tool calls (≈28 vs 15) — a gap the **self-evolution loop has now closed autonomously**: it drove its own baseline from 32 → 15 tool-calls (denoised over 3 runs) and ties Claude Code in a fresh head-to-head (15 vs 15). **
+**v1 (done):** loop + hybrid tools + Mem/Disk backends + deterministic `FakeAIClient` tests + real-model run. **5/5 pass@1** on the behavioral eval (Sonnet 4.6); the head-to-head started at correctness parity with Claude Code but ~2× the tool calls (≈28 vs 15) — a gap the **self-evolution loop has now closed autonomously**: it drove its own baseline from 32 → 15 tool-calls (denoised over 3 runs) and ties Claude Code in a fresh head-to-head (15 vs 15). **820+ tests green.**
 
 See [`mind/`](./mind/) for the full vision, architecture, decision journal, roadmap, eval + head-to-head results, the [parity plan](./mind/05-parity.md), and the [self-evolution design](./mind/08-self-evolve.md).
 
-##
+## Develop & evaluate
+
+Hacking on the runtime itself (from a clone):
+
+```bash
+bun install # links wcli (file:), ai.libx.js + libx.js (bun link)
+bun test # 820+ unit/integration tests (offline via FakeAIClient, no key)
+ANTHROPIC_API_KEY=… bun examples/run-sonnet.ts # drive a real model end-to-end
+```
+
+Eval & head-to-head (real model):
 
 ```bash
 bun eval/run.ts # behavioral scorecard (our agent over MemFilesystem)
 bun compare/seed-tasks.ts # materialize task specs into .tmp/tasks/
-bun compare/run.ts # head-to-head vs Claude Code (needs `claude` CLI)
+bun compare/run.ts # head-to-head vs Claude Code (needs the `claude` CLI)
 ```
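The headline ratios added to the README's comparison section can be re-derived from the raw figures quoted in the same hunk (16 vs 28 tool calls, 69k vs 171k tokens, ~100s vs 133s, and $0.49 / $0.82 vs $2.67). The TypeScript sketch below does only that arithmetic; the input numbers are copied from the diff, and the helper names `reductionPct` and `ratio` are illustrative, not part of agent.libx.js.

```typescript
// Raw figures as quoted in the README hunk (hard suite, Sonnet, denoised).
const ours = { toolCalls: 16, tokensK: 69, wallSec: 100 };
const claudeCode = { toolCalls: 28, tokensK: 171, wallSec: 133 };
// USD per 9-task hard-suite run, as quoted in the "Cost" line.
const costUsd = { sonnet: 0.49, threeTier: 0.82, ccOpus: 2.67 };

// Percentage by which `value` is smaller than `baseline`.
function reductionPct(value: number, baseline: number): number {
  return (1 - value / baseline) * 100;
}

// How many times larger `baseline` is than `value`.
function ratio(value: number, baseline: number): number {
  return baseline / value;
}

const derived = {
  toolCallReductionPct: reductionPct(ours.toolCalls, claudeCode.toolCalls),
  tokenRatio: ratio(ours.tokensK, claudeCode.tokensK),
  wallTimeReductionPct: reductionPct(ours.wallSec, claudeCode.wallSec),
  sonnetCostRatio: ratio(costUsd.sonnet, costUsd.ccOpus),
  threeTierCostRatio: ratio(costUsd.threeTier, costUsd.ccOpus),
  // Self-evolution claim: 32 -> 15 tool calls on the core suite.
  selfEvolveReductionPct: reductionPct(15, 32),
};

for (const [name, value] of Object.entries(derived)) {
  console.log(`${name}: ${value.toFixed(2)}`);
}
```

Rounded to the README's precision, the results match its −43%, 2.5×, ~25%, 5.4× and 3.3× figures. Two readings are worth keeping in mind: the "~25% faster" figure is a reduction in wall time (133s down to ~100s), not a throughput ratio, which would be about 1.33×; and the 32 → 15 self-evolution drop works out to about 53%, which the README rounds to ~50%.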
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "agent.libx.js",
-"version": "0.93.
+"version": "0.93.35",
 "description": "Edge-native AI agent runtime — drives a virtual filesystem via any LLM (ai.libx.js). Same bytes run in node, browser, or edge.",
 "type": "module",
 "main": "./dist/index.js",
@@ -46,7 +46,7 @@
 "node": ">=18"
 },
 "bin": {
-"agentx": "
+"agentx": "dist/cli.js"
 },
 "files": [
 "dist",
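The "Context-safe by default" bullet in the README hunk says an oversized `Grep`/`Read`/MCP result is auto-paginated instead of being dumped into the context window. The diff does not show how agent.libx.js implements this, so the TypeScript below is only a sketch of the general technique under stated assumptions: a fixed character budget per page (8,000 here, an arbitrary choice), pages cut on line boundaries, and a footer telling the model that more output exists. `Page`, `paginate`, `renderPage`, and the footer wording are all hypothetical; the context-isolated `Ask` peek is not modeled.

```typescript
// Hypothetical shape of one page of a tool result (not agent.libx.js's type).
interface Page {
  text: string; // the slice that would be shown to the model
  pageIndex: number; // 0-based
  totalPages: number;
}

// Split `output` into pages of at most `maxChars` characters, cutting on line
// boundaries so a single grep hit is never split across two pages.
function paginate(output: string, maxChars: number): Page[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  // Hard-split any single line that is longer than the budget.
  const lines: string[] = [];
  for (const line of output.split("\n")) {
    if (line.length <= maxChars) {
      lines.push(line);
      continue;
    }
    for (let i = 0; i < line.length; i += maxChars) lines.push(line.slice(i, i + maxChars));
  }
  const pages: string[] = [];
  let buf: string[] = [];
  let len = 0; // length of buf.join("\n")
  for (const line of lines) {
    const next = buf.length === 0 ? line.length : len + 1 + line.length;
    if (buf.length > 0 && next > maxChars) {
      pages.push(buf.join("\n")); // current page is full; start a new one
      buf = [line];
      len = line.length;
    } else {
      buf.push(line);
      len = next;
    }
  }
  pages.push(buf.join("\n"));
  return pages.map((text, pageIndex) => ({ text, pageIndex, totalPages: pages.length }));
}

// Hypothetical rendering: only the requested page enters the context window,
// followed by a footer when more pages remain.
function renderPage(pages: Page[], index: number): string {
  const page = pages[index];
  if (!page) throw new RangeError(`no page ${index}`);
  const hasMore = page.pageIndex + 1 < page.totalPages;
  const footer = hasMore
    ? `\n[page ${page.pageIndex + 1}/${page.totalPages}; request page ${page.pageIndex + 2} for more]`
    : "";
  return page.text + footer;
}

// Usage: a ~1 MB synthetic grep result becomes many small pages.
const big = Array.from({ length: 50000 }, (_, i) => `src/file${i}.ts:1: match`).join("\n");
const pages = paginate(big, 8000);
console.log(`${pages.length} pages; first rendered page is ${renderPage(pages, 0).length} chars`);
```

Cutting on line boundaries keeps each hit intact within one page. Only a line longer than the budget is hard-split, and in that case joining the pages with newlines no longer reproduces the input exactly.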