agent.libx.js 0.93.32 → 0.93.34

Files changed (3)
  1. package/README.md +46 -10
  2. package/package.json +2 -2
  3. package/cli/cli.ts +0 -2362
package/README.md CHANGED
@@ -1,6 +1,6 @@
  # agent.libx.js
 
- **A coding agent on par with Claude Code — that also runs where Claude Code can't.**
+ **A coding agent that matches Claude Code on correctness then beats it on cost, tokens, and tool-efficiency, and runs where Claude Code can't (sandbox, browser/edge, database).**
 
  By default it's a full-strength terminal coding agent: real disk, real shell, and the same Read/Edit/Grep/permissions/streaming-DX surface you'd expect from Claude Code. The difference is its two host couplings are swappable seams:
 
@@ -11,16 +11,41 @@ So the *same* agent loop also runs **sandboxed** (in-memory VFS, real disk untou
 
  Claude Code is the floor; running isolated, on the edge, or hybrid is the ceiling.
 
+ ## How it stacks up vs Claude Code
+
+ **Correctness parity — efficiency, cost, and reach are the lead.** Hard 7-task coding suite, Sonnet, *denoised* (each task ×3, no lucky run promotes; `SUITE=hard bun compare/run.ts`):
+
+ | | agent.libx.js | Claude Code |
+ |---|---|---|
+ | Correctness | 7/7 | 7/7 — **parity** |
+ | Tool-calls | **16** | 28 — **−43%** |
+ | Tokens | **69k** | 171k — **2.5× fewer** |
+ | Wall-time | **~100s** | 133s — **~25% faster** |
+
+ **Cost** (9-task hard suite, USD-metered, vs CC-on-Opus): **$0.49** single-tier Sonnet (**5.4× cheaper**) · **$0.82** three-tier voice/duplex (**3.3× cheaper**) vs CC-Opus **$2.67** — at quality parity (16/18 vs 17/18 passes).
+
+ Plus things Claude Code simply doesn't do:
+
+ - **Runs where CC can't** — the *same* agent loop runs on real disk, an in-memory **sandbox**, the **browser/edge** (no Node, no `/bin/sh`), or a **database-backed** workspace. Swap the filesystem, not the agent.
+ - **Keyless web search, built in** — `WebSearch` works in any deployment with no API key (DuckDuckGo; auto-upgrades to Tavily if you set one). CC's search is Anthropic-server-bound.
+ - **Context-safe by default** — a 1 MB `Grep`/`Read`/MCP result is auto-paginated and can't blow the window; buried detail is recovered via a cheap context-isolated `Ask` peek — **~5.3× cheaper and more accurate** than re-fetching, in a head-to-head.
+ - **It improves its own efficiency** — an autonomous evolution loop cut its own tool-use **~50% (32 → 15** on the core suite, denoised), self-discovered, not hand-tuned — the same lever behind the efficiency lead above.
+
+ *Honest scope:* the win is **efficiency / cost / reach**, not a claim of smarter reasoning — correctness is parity. All figures are denoised and reproducible (see [Eval & compare](#eval--compare)); full boards in [`mind/09-outperform.md`](./mind/09-outperform.md).
+
  ## Quickstart
 
+ Point it at your project — no clone needed (requires [Bun](https://bun.sh)):
+
  ```bash
- bun install # links wcli (file:), ai.libx.js + libx.js (bun link)
- bun test # 34 unit/integration tests (no API key needed)
- ANTHROPIC_API_KEY=… bun examples/run-sonnet.ts # drive a real model
- bun eval/run.ts # quantitative eval scorecard (real model)
+ export ANTHROPIC_API_KEY=… # or OPENAI_API_KEY / GOOGLE_API_KEY / GROQ_API_KEY
+ bunx agent.libx.js "find and fix the failing test" # run once in the current directory
+ bunx agent.libx.js # …or open the interactive REPL
  ```
 
- ## Use it
+ Want a permanent command? `bun add -g agent.libx.js`, then just `agentx` (and `agentx --duplex` for voice). The agent has full real-disk + shell access by default (like Claude Code); add `--sandbox` to work on an in-memory copy instead. See [The `agentx` CLI](#the-agentx-cli) for flags, sessions, and slash commands.
+
+ ## Use it as a library
 
  ```ts
  import { AIClient } from 'ai.libx.js';
@@ -44,7 +69,8 @@ console.log(res.finishReason, await fs.readFile('/src/x.ts'));
  - **`Edit`** — exact unique-substring replace, with a read-before-edit staleness guard.
  - **`Grep`/`Glob`/`Write`/`MultiEdit`** — structured, typed results straight from the VFS (no `bash` parsing). The selectable tool set the self-evolution loop mutates over.
  - **`TodoWrite`** — a planning scratchpad; **`Task`** — spawn a depth-limited child agent over the VFS (`subagents: true`); **`SlashCommand`** — reusable prompt templates from `<dir>/*.md` (`commandsDir`); plus a real **MCP client** (`src/mcp.client.ts`, node-only — stdio/HTTP JSON-RPC handshake + discovery) that feeds the edge-safe **MCP adapter** (`mcpToolsToAgentTools`), so any MCP server's tools become agent tools.
- - **`WebFetch`/`WebSearch`** — opt-in network tools (not in the default set): fetch a URL as readable text, or search via a configured provider (`TAVILY_API_KEY`). Enable per project with `tools: [...,'WebFetch']` in config. Factory-built with an injectable `fetch`, so they stay edge-portable and testable.
+ - **`WebFetch`/`WebSearch`** — fetch a URL as readable text, or search the web. **Keyless by default** (WebSearch uses DuckDuckGo; auto-upgrades to Tavily when `TAVILY_API_KEY` is set) and **auto-enabled in the CLI**. Factory-built with an injectable `fetch`, so they stay edge-portable and testable. (In the library they're opt-in by name: `tools: [...,'WebSearch']`.)
+ - **Oversized-output pagination** — any tool result over a byte ceiling (`maxToolResultBytes`, default 60k) is cropped to page 1 with a marker (refine the query / read further), so one big `Grep`/`Read`/MCP/web result can't blow the context window. In the CLI (**on by default**; `--no-scratch` to disable) the full output instead spills **losslessly** to a **scratch** file and the model recovers specifics via `Grep`/`Read` or **`Ask`** — a cheap, context-isolated peek that returns just the answer (the raw blob never re-enters context).
 
  ## Agentic subsystems
 
@@ -102,14 +128,24 @@ Full design + threat model + results: [`mind/08-self-evolve.md`](./mind/08-self-
 
  ## Status
 
- **v1 (done):** loop + hybrid tools + Mem/Disk backends + deterministic `FakeAIClient` tests + real-model run. **5/5 pass@1** on the behavioral eval (Sonnet 4.6); the head-to-head started at correctness parity with Claude Code but ~2× the tool calls (≈28 vs 15) — a gap the **self-evolution loop has now closed autonomously**: it drove its own baseline from 32 → 15 tool-calls (denoised over 3 runs) and ties Claude Code in a fresh head-to-head (15 vs 15). **112 tests green.**
+ **v1 (done):** loop + hybrid tools + Mem/Disk backends + deterministic `FakeAIClient` tests + real-model run. **5/5 pass@1** on the behavioral eval (Sonnet 4.6); the head-to-head started at correctness parity with Claude Code but ~2× the tool calls (≈28 vs 15) — a gap the **self-evolution loop has now closed autonomously**: it drove its own baseline from 32 → 15 tool-calls (denoised over 3 runs) and ties Claude Code in a fresh head-to-head (15 vs 15). **820+ tests green.**
 
  See [`mind/`](./mind/) for the full vision, architecture, decision journal, roadmap, eval + head-to-head results, the [parity plan](./mind/05-parity.md), and the [self-evolution design](./mind/08-self-evolve.md).
 
- ## Eval & compare
+ ## Develop & evaluate
+
+ Hacking on the runtime itself (from a clone):
+
+ ```bash
+ bun install # links wcli (file:), ai.libx.js + libx.js (bun link)
+ bun test # 820+ unit/integration tests (offline via FakeAIClient, no key)
+ ANTHROPIC_API_KEY=… bun examples/run-sonnet.ts # drive a real model end-to-end
+ ```
+
+ Eval & head-to-head (real model):
 
  ```bash
  bun eval/run.ts # behavioral scorecard (our agent over MemFilesystem)
  bun compare/seed-tasks.ts # materialize task specs into .tmp/tasks/
- bun compare/run.ts # head-to-head vs Claude Code (needs `claude` CLI)
+ bun compare/run.ts # head-to-head vs Claude Code (needs the `claude` CLI)
  ```
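Reviewer note on the figures introduced in the README hunks above: the headline ratios can be re-derived from the raw numbers the README itself quotes. The sketch below does only that arithmetic; the helper names `pctReduction` and `timesFewer` are made up for this note, and the inputs are copied from the added table, the cost line, and the self-evolution bullet. It does not reproduce or validate the underlying benchmark runs.

```typescript
// Percentage reduction when a metric drops from `before` to `after`.
function pctReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// How many times larger `larger` is than `smaller`.
function timesFewer(larger: number, smaller: number): number {
  return larger / smaller;
}

// Raw figures as quoted in the README diff (wall-time "~100s" is itself approximate).
const checks = [
  { claim: "Tool-calls: 28 -> 16, quoted as -43%", value: pctReduction(28, 16) },
  { claim: "Tokens: 171k vs 69k, quoted as 2.5x fewer", value: timesFewer(171, 69) },
  { claim: "Wall-time: 133s -> ~100s, quoted as ~25% faster", value: pctReduction(133, 100) },
  { claim: "Cost: $2.67 vs $0.49, quoted as 5.4x cheaper", value: timesFewer(2.67, 0.49) },
  { claim: "Cost: $2.67 vs $0.82, quoted as 3.3x cheaper", value: timesFewer(2.67, 0.82) },
  { claim: "Self-evolution: 32 -> 15 tool-calls, quoted as ~50%", value: pctReduction(32, 15) },
];

for (const c of checks) {
  console.log(`${c.claim}: computed ${c.value.toFixed(2)}`);
}
```

Worked by hand, the quoted values round consistently with their inputs: 12/28 is about 42.9%, 171/69 about 2.48, 33/133 about 24.8%, 2.67/0.49 about 5.45, and 2.67/0.82 about 3.26. The 32 → 15 reduction is about 53%, which the README rounds down to "~50%".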
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "agent.libx.js",
- "version": "0.93.32",
+ "version": "0.93.34",
  "description": "Edge-native AI agent runtime — drives a virtual filesystem via any LLM (ai.libx.js). Same bytes run in node, browser, or edge.",
  "type": "module",
  "main": "./dist/index.js",
@@ -46,7 +46,7 @@
  "node": ">=18"
  },
  "bin": {
- "agentx": "cli/cli.ts"
+ "agentx": "dist/cli.js"
  },
  "files": [
  "dist",