npm - reasonix - Versions diffs - 0.4.13 → 0.4.14 - Mend

reasonix 0.4.13 → 0.4.14

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7) hide show

package/README.md CHANGED Viewed

@@ -9,227 +9,388 @@
 **A DeepSeek-native AI coding assistant in your terminal.** Ink TUI. MCP
 first-class. No LangChain.
+---
+## Quick start (60 seconds)
+**1. Get a DeepSeek API key.** Free credit on signup:
+<https://platform.deepseek.com/api_keys>
+**2. Run it.** No install needed.
 ```bash
 npx reasonix
 ```
-One command. First run walks you through a 30-second wizard (API key →
-preset → pick MCP servers from a checklist); every run after that drops
-straight into chat with your tools wired up. Inside the chat, type `/help`.
+First run walks you through a 30-second wizard:
-Why bother with yet another agent framework? Because every abstraction
-here earns its weight against a DeepSeek-specific property — dirt-cheap
-tokens, R1 reasoning traces, automatic prefix caching, JSON mode.
-Generic wrappers treat DeepSeek as "OpenAI with a different base URL"
-and leave these advantages on the table. Reasonix leans into them:
-on the same τ-bench-lite workload,
-[**94.4% cache hit, ~40% cheaper tokens, 100% pass rate**](#validated-numbers)
-vs. a cache-hostile baseline.
+- paste your API key (saved to `~/.reasonix/config.json`)
+- pick a preset — `fast` (cheap chat, default), `smart` (+R1 reasoning), `max` (+self-consistency branching)
+- multi-select MCP servers from a catalog (filesystem, memory, github, puppeteer, …)
----
+Every run after that drops you straight into chat.
-## What you get
+**3. Inside the chat.** Type anything and hit Enter. Type `/help` to see
+every command. The status bar at the top shows cache hit %, cost so far,
+balance, and context usage. Press `Esc` to cancel whatever is running.
-| Feature | How it works | Opt in |
-|---|---|---|
-| **Setup wizard** | First run of `npx reasonix`: pick preset, multi-select MCP servers from a curated catalog, saved to config so the next run just launches chat | always on (first run) |
-| **MCP (stdio + SSE)** | Multi-server bridge — every MCP tool inherits Cache-First + repair + context-safety automatically. `reasonix mcp list` shows the catalog | always on |
-| **Cache-First Loop** | Immutable prefix + append-only log = prefix byte-stable across turns → DeepSeek's automatic prefix cache hits at 70–95% | always on |
-| **Context safety net** | Tool results capped at 32k chars · oversized sessions auto-heal on load · `/compact` to shrink further · ctx gauge in the status bar · Esc to abort exploration and get a forced summary | always on |
-| **R1 Thought Harvesting** | Parses `reasoning_content` into typed `{ subgoals, hypotheses, uncertainties, rejectedPaths }` via a cheap V3 call | `/preset smart` |
-| **Self-Consistency Branching** | Runs N parallel samples at spread temperatures; picks the one with the fewest flagged uncertainties | `/preset max` / `/branch N` |
-| **Tool-Call Repair** | Auto-flattens deep/wide schemas, scavenges tool calls leaked into `<think>`, repairs truncated JSON, breaks call-storms | always on |
-| **Retry layer** | Exponential backoff + jitter on 408/429/500/502/503/504 and network errors. 4xx auth errors don't retry | always on |
-| **Ink TUI** | Live cache-hit / cost / context panel. Streams R1 thinking to a compact preview. Renders Markdown (bold / lists / code / stripped LaTeX) | always on |
+```
+reasonix › explain what this project does
+assistant
+  …streams R1 reasoning into a dim preview, then writes the answer…
+status bar: cache hit 92% · cost $0.001 · ctx 8k/131k (6%) · balance 12.34 CNY
+```
+Requires Node ≥ 18. Works on macOS, Linux, Windows (Git Bash + PowerShell).
 ---
-## Why not just use LangChain?
+## Using `reasonix code` — your terminal pair programmer
-Even on the default `fast` preset (no harvest, no branching), Reasonix bakes
-in five DeepSeek-specific defences that generic agent frameworks leave to you:
+Scoped to the directory you launch from. The model has native
+`read_file` / `write_file` / `edit_file` / `list_directory` /
+`search_files` / `directory_tree` / `get_file_info` /
+`create_directory` / `move_file` tools, all sandboxed — any path that
+resolves outside the launch root (including `..` and symlink escapes)
+is refused.
-| | Reasonix default | generic frameworks |
-|---|---|---|
-| Prefix-stable loop (→ 85–95% cache hit) | ✅ | ❌ prompts rebuilt each turn |
-| Auto-flatten deep tool schemas | ✅ | ❌ DeepSeek drops args |
-| Retry with jittered backoff (429/503) | ✅ | ❌ custom callbacks |
-| Scavenge tool calls leaked into `<think>` | ✅ | ❌ |
-| Call-storm breaker on identical-arg repeats | ✅ | ❌ |
-| Live cache-hit / cost / vs-Claude panel | ✅ | ❌ |
-| First-run config prompt + Markdown TUI | ✅ | ❌ |
+```bash
+cd my-project
+npx reasonix code
+```
-Harvest and self-consistency branching are bonuses on top. The everyday
-win is that **a plain chat with Reasonix already pays for ~40% less tokens
-than the same chat through a naive LangChain setup**, because the prefix
-actually stays byte-stable.
+### Walkthrough: fix a bug
-## Validated numbers
+```
+reasonix code › users.ts 里 findByEmail 对大小写敏感导致登录失败，帮我改成大小写不敏感
-**τ-bench-lite** — 8 multi-turn tool-use tasks × 3 repeats = 48 runs per
-side. Same tools / same prompt / same client on both sides, sole variable
-is prefix stability. Measured on live DeepSeek `deepseek-chat`:
+assistant
+  ▸ R1 reasoning · 1243 chars of thought
+  ▸ tool<search_files> → src/users.ts, src/users.test.ts
+  ▸ tool<read_file>    → (src/users.ts, 412 chars)
+  ▸ tool<read_file>    → (src/users.test.ts, 820 chars)
+  ▸ 找到了。findByEmail 直接用 === 比对。我把查询和存储都规范化为小写，并补上一条测试。
-| metric | baseline (cache-hostile) | Reasonix | delta |
-|---|---:|---:|---:|
-| runs | 24 | 24 | — |
-| **cache hit** | 46.6% | **94.4%** | **+47.7pp** |
-| cost / task | $0.002599 | $0.001579 | **−39% (×0.61)** |
-| vs Claude Sonnet 4.6 (token-count estimate) | — | — | **~96% cheaper** |
-| pass rate | 96% (23/24) | **100% (24/24)** | Reasonix held the guardrail on every run |
+src/users.ts
+<<<<<<< SEARCH
+  return users.find(u => u.email === email);
+=======
+  const needle = email.toLowerCase();
+  return users.find(u => u.email.toLowerCase() === needle);
+>>>>>>> REPLACE
-**Verify it yourself — no API key, zero cost:**
+src/users.test.ts
+<<<<<<< SEARCH
+  it("finds by email", () => {
+    expect(findByEmail("a@b.com")).toBeDefined();
+  });
+=======
+  it("finds by email", () => {
+    expect(findByEmail("a@b.com")).toBeDefined();
+  });
-```bash
-git clone https://github.com/esengine/reasonix.git && cd reasonix && npm install
-npx reasonix replay benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
-npx reasonix diff \
-  benchmarks/tau-bench/transcripts/t01_address_happy.baseline.r1.jsonl \
-  benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
-```
+  it("is case-insensitive", () => {
+    expect(findByEmail("A@B.COM")).toBeDefined();
+  });
+>>>>>>> REPLACE
-The JSONL transcripts committed in `benchmarks/tau-bench/transcripts/`
-carry per-turn `usage`, `cost`, and `prefixHash`. Reasonix's prefix hash
-stays byte-stable across every model call; baseline's prefix churns on
-every turn. The cache delta is *mechanically* attributable to log
-stability, not to a different system prompt.
+▸ 2 pending edits across 2 files — /apply to write · /discard to drop
+```
-Full 48-run report: [`benchmarks/tau-bench/report.md`][r]. Reproduce
-with your own API key: `npx tsx benchmarks/tau-bench/runner.ts --repeats 3`.
+**At this point nothing has been written to disk.** You have three
+options:
-[r]: ./benchmarks/tau-bench/report.md
+- `/apply` — commit the pending blocks. You'll see `✓ applied
+  src/users.ts` and `✓ applied src/users.test.ts`. A snapshot of the
+  pre-edit file is kept so `/undo` can roll back.
+- `/discard` — throw the blocks away without writing.
+- Keep chatting — ask for adjustments. Say "also cover the empty
+  string case" and the model proposes another block set.
-### MCP — works out of the box
+After applying:
-Any [MCP](https://spec.modelcontextprotocol.io/) server's tools inherit
-Cache-First + repair + context-safety automatically. The wizard (`npx
-reasonix`) lets you multi-select from a curated catalog — no flags, no
-JSON-by-hand. Three live reference runs:
+```
+reasonix code › /commit "fix: findByEmail case-insensitive"
+▸ git add -A && git commit -m "fix: findByEmail case-insensitive"
+  [main a1b2c3d] fix: findByEmail case-insensitive
+```
-| server | turns | tool calls | cache hit | cost | vs Claude |
-|---|---:|---:|---:|---:|---:|
-| bundled demo (`add` / `echo` / `get_time`) | 2 | 1 | **96.6%** (turn 2) | $0.000254 | −94.0% |
-| official `@modelcontextprotocol/server-filesystem` | 5 | 4 | **96.7%** overall | $0.001235 | −97.0% |
-| **both concurrently** (`demo_add` + `fs_write_file`) | 5 | 4 | **81.1%** | $0.001852 | −95.9% |
+`/commit` runs `git add -A && git commit -m ...` from the sandbox root.
-The third row is the ecosystem proof: two MCP servers running as
-separate subprocesses, tools from both exercised in one conversation.
-**One single prefix hash across all 5 turns** — byte-stability survives
-concurrent MCP subprocesses.
+### Walkthrough: explore before editing
-Reproduce without an API key (replay the committed transcripts):
+For "what does this code do?" questions the model uses the read-side
+tools and replies in prose — no SEARCH/REPLACE blocks, no file writes.
+Ask to change something only when you mean it:
-```bash
-npx reasonix replay benchmarks/tau-bench/transcripts/mcp-demo.add.jsonl
-npx reasonix replay benchmarks/tau-bench/transcripts/mcp-filesystem.jsonl
 ```
+reasonix code › 这个项目的路由是怎么组织的？
+assistant
+  ▸ tool<directory_tree> → (src/ tree, 47 entries)
+  ▸ tool<read_file> → (src/router.ts, 1.2 KB)
+  ▸ 路由分三层：顶层 AppRouter 注册 tab，每个 tab 用 React Router 的
+    nested routes 写子路径，最后 …
+```
+If the SEARCH text doesn't match the file byte-for-byte, `edit_file`
+refuses the edit loudly rather than fuzzy-matching. The model sees the
+error and retries with the correct search text — silent wrong edits are
+worse than visible rejections.
-Supported transports: **stdio** (local `npx` or binary) and **HTTP+SSE**
-(remote / hosted servers, MCP 2024-11-05 spec). Pass an `http(s)://`
-URL to `--mcp` and Reasonix opens the SSE stream and POSTs JSON-RPC
-to the endpoint the server advertises.
+### Things to try
-[mcp]: ./benchmarks/tau-bench/transcripts/mcp-demo.add.jsonl
+- `/tool 1` — dump the last tool call's full output (when the 400-char
+  inline clip isn't enough).
+- `/think` — see the model's full R1 reasoning for the last turn
+  (reasoner preset only).
+- `/undo` — roll back the last applied edit batch.
+- `/new` — start fresh in the same directory without losing the
+  session file.
+- Drop `--no-session` for an ephemeral session that doesn't persist.
+```bash
+npx reasonix code src/           # narrower sandbox (only src/ is writable)
+npx reasonix code --no-session   # ephemeral — nothing saved to disk
+npx reasonix code --preset max   # R1 reasoning + 3-way self-consistency
+```
 ---
-## Usage
+## Using `reasonix` — general chat
-### One command
+Same TUI, no filesystem tools unless you opt in via MCP. Good for
+drafting, Q&A, schema design, architecture discussions, or driving
+your own MCP servers. Sessions persist per name under
+`~/.reasonix/sessions/`.
 ```bash
-npx reasonix
+npx reasonix                             # uses saved config + wizard-selected MCP
+npx reasonix --preset smart              # one-shot override
+npx reasonix --session design            # named session
+npx reasonix --session design            # resume it later — history intact
+```
+### Walkthrough: a multi-turn session with R1 reasoning
 ```
+reasonix › /preset smart
+▸ switched to smart · model deepseek-reasoner · harvest on · branch off
+reasonix › 我要给一个 Flutter 应用设计限时折扣的弹窗展示规则。目标：
+      每天首次打开时弹一次，连续弹 3 天后休眠 7 天。怎么实现？
+assistant
+  ▸ R1 reasoning · 2410 chars of thought
+  ‹ subgoals (3): 持久化展示计数 · 判断是否过了 24h · 休眠窗口判断
+  ‹ hypotheses (2): SharedPreferences 存计数 · lastShownAt 时间戳
+  ‹ uncertainties (1): 用户换设备后重置的策略
+  建议数据模型：
+    lastShownAt: DateTime
+    consecutiveShows: int (0..3)
+    sleepUntil: DateTime?
+  …
+```
+`/think` dumps the full R1 thought trace; `/status` shows the current
+model / flags / context use; `/retry` re-samples the same prompt with
+a fresh random seed (useful when the first answer missed something).
-First run: a wizard asks for your API key, lets you pick a preset
-(fast / smart / max), then offers a multi-select checklist of MCP
-servers — filesystem, memory, github, puppeteer, everything. Everything
-is saved to `~/.reasonix/config.json`. Subsequent runs drop straight
-into chat.
+### Walkthrough: attach MCP tools on the fly
-### Inside the chat
+```bash
+# Attach the official filesystem server sandboxed to /tmp/scratch,
+# plus a remote knowledge-base over SSE.
+npx reasonix \
+  --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/scratch" \
+  --mcp "kb=https://mcp.example.com/sse"
+```
-A status bar at the top shows cache hit %, cost, Claude-equivalent, and
-the **context gauge** (`ctx 42k/131k (32%)` — yellow at 50%, red + a
-`/compact` nudge at 80%). A command strip under the input lists the
-slash commands:
+Inside the chat:
 ```
-/help                   full list + hints
-/preset <fast|smart|max> one-tap bundles (model + harvest + branch)
-/mcp                    list attached MCP servers and tools
-/compact [cap]          shrink oversized tool results in history
-/sessions · /forget     list / delete saved sessions
-/setup                  reconfigure (exits and tells you to run `reasonix setup`)
-/clear · /exit
+reasonix › /mcp
+▸ fs (stdio, 11 tools)   fs_read_file · fs_list_directory · fs_write_file · …
+▸ kb (sse,   4 tools)    kb_search · kb_get · kb_list_collections · kb_stat
+reasonix › 在 /tmp/scratch 下把所有 .log 文件里含 "ERROR" 的行收集到 errors.txt
+assistant
+  ▸ tool<fs_search_files> → 4 matches
+  ▸ tool<fs_read_file>    → …
+  ▸ tool<fs_write_file>   → wrote 2.4 KB to errors.txt
+  ▸ 已写入 errors.txt — 共 38 行，分布在 4 个源文件中。
 ```
-**Esc while thinking** — abort the current exploration and force the
-model to summarize what it already found. No more "model ran 24 tool
-calls and gave up" — you get an answer every time.
+MCP tools go through the same Cache-First + repair + context-safety
+plumbing as native tools, including the 32k result cap and live
+progress-notification rendering.
-Sessions live as JSONL under `~/.reasonix/sessions/<name>.jsonl` —
-every message appended atomically, so killing the CLI never loses
-context. Oversized tool results auto-heal on load, so poisoning a
-session with one giant `read_file` doesn't brick your history.
+### When to use `reasonix` vs `reasonix code`
-### Code mode — `npx reasonix code`
+| situation | command |
+|---|---|
+| Editing files in the current project | `reasonix code` |
+| Exploring a project without writing files | `reasonix code` (it only writes on `/apply`) |
+| Design / architecture / research chat | `reasonix` |
+| Driving your own MCP servers | `reasonix --mcp "..."` |
+| One-shot question, no TUI | `reasonix run "..."` |
+| Reproducing a prior session / benchmark | `reasonix replay path.jsonl` |
-A thin opinionated layer on top of chat: filesystem MCP bridged at
-`cwd`, coding system prompt, reasoner preset, per-directory session.
-The model proposes edits as **SEARCH/REPLACE blocks**:
+---
+## Commands inside the session
+| command | what it does |
+|---|---|
+| `/help` | full command reference with hints |
+| `/status` | current model · flags · context · session |
+| `/preset <fast\|smart\|max>` | one-tap bundle (model + harvest + branch) |
+| `/model <id>` | switch DeepSeek model (`deepseek-chat`, `deepseek-reasoner`) |
+| `/harvest [on\|off]` | toggle R1 plan-state extraction |
+| `/branch <N\|off>` | run N parallel samples per turn, pick best (N ≥ 2) |
+| `/mcp` | list attached MCP servers and their tools |
+| `/tool [N]` | dump the Nth tool call's full output (1 = latest) |
+| `/think` | dump the last turn's full R1 reasoning |
+| `/retry` | truncate and resend your last message (fresh sample) |
+| `/compact [cap]` | shrink oversized tool results in the log |
+| `/sessions` | list saved sessions (current marked with `▸`) |
+| `/forget` | delete the current session from disk |
+| `/new` (alias `/reset`) | start a fresh conversation in the same session |
+| `/clear` | clear visible scrollback only (log kept) |
+| `/setup` | reconfigure (exit and run `reasonix setup`) |
+| `/exit` | quit |
+Additional commands in `reasonix code`:
+| command | what it does |
+|---|---|
+| `/apply` | commit the pending SEARCH/REPLACE blocks to disk |
+| `/discard` | drop the pending edit blocks without writing |
+| `/undo` | roll back the last applied edit batch |
+| `/commit "msg"` | `git add -A && git commit -m "msg"` |
+**Keyboard:**
+- `Enter` — submit
+- `Shift+Enter` / `Ctrl+J` — newline (multi-line paste also supported; `\` + Enter as a portable fallback)
+- `↑ / ↓` — walk prompt history while idle; navigate slash-autocomplete matches
+- `Tab` / `Enter` on a `/foo` prefix — accept the highlighted suggestion
+- `Esc` — abort the current turn (stops the API call, cancels any in-flight tool, rejects pending MCP requests)
+- `y` / `n` on confirm prompts — hotkey accept / reject
+---
+## Sessions and safety nets
+- Sessions live as JSONL under `~/.reasonix/sessions/<name>.jsonl` (per
+  directory for `reasonix code`). Every message appended atomically; `Ctrl+C`
+  never loses context.
+- Tool results are capped at 32k chars per call. Oversized sessions
+  self-heal on load (shrinks + rewrites the file).
+- Malformed `assistant.tool_calls` / `tool` pairing is validated on
+  every outgoing API call so a corrupted session can't keep 400ing.
+- Context gauge turns yellow at 50%, red at 80% with a `/compact` nudge.
+  Approaching the 131k window triggers an automatic compaction attempt
+  before falling back to a forced summary.
+- The model's sandbox in `reasonix code` refuses any path that resolves
+  outside the launch directory, including symlink escape and `..` traversal.
+### Troubleshooting: duplicate rows / ghost rendering
+Some Windows terminals (Git Bash / MINTTY / winpty-wrapped shells)
+don't fully implement the ANSI cursor-up escapes Ink uses to repaint
+the live spinner region. Symptom: spinners, streaming previews, or
+tool-result rows print multiple copies into scrollback instead of
+overwriting in place.
+If you hit this, run with plain mode:
+```bash
+REASONIX_UI=plain npx reasonix code
+# or
+REASONIX_UI=plain npx reasonix
 ```
-src/foo.ts
-<<<<<<< SEARCH
-const x = 1;
-=======
-const x = 2;
->>>>>>> REPLACE
-```
-Reasonix parses them out of each turn, applies them to disk, reports
-`✓ applied src/foo.ts` in the TUI. SEARCH must match byte-for-byte;
-we never fuzzy-match (silent wrong edits are worse than loud
-rejections). Run `git diff` to review, `git checkout .` to undo.
+Plain mode suppresses every live/animated row and disables the
+internal tick timer. You lose the streaming preview and spinners
+but gain stable scrollback. Committed events (your prompts, tool
+results, the model's final responses) still render normally via
+Ink's `<Static>` append path.
+Windows Terminal, PowerShell 7 in Windows Terminal, and WezTerm
+don't need this opt-out.
+---
+## MCP — bring your own tools
+Any [MCP](https://spec.modelcontextprotocol.io/) server works. Wizard
+lets you pick from a catalog, or drive it by flag:
 ```bash
-cd my-project
-npx reasonix code           # default: cwd, reasoner, per-dir session
-npx reasonix code src/      # scope the filesystem sandbox tighter
-npx reasonix code --no-session   # ephemeral
+# stdio (local subprocess)
+npx reasonix --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/safe"
+# multiple servers at once
+npx reasonix \
+  --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/safe" \
+  --mcp "demo=npx tsx examples/mcp-server-demo.ts"
+# HTTP+SSE (remote / hosted)
+npx reasonix --mcp "kb=https://mcp.example.com/sse"
 ```
-First-run sandbox: because code mode uses the filesystem MCP from
-`@modelcontextprotocol/server-filesystem`, the model can only read
-and write inside the directory you pointed at. It literally can't
-touch files above that root.
+`reasonix mcp list` shows the curated catalog. `reasonix mcp inspect <spec>`
+connects once and dumps the server's tools / resources / prompts without
+starting a chat. Progress notifications from long-running tools (2025-03-26
+spec) render live as a progress bar in the spinner.
+Supported transports: **stdio** (local command) and **HTTP+SSE** (remote,
+MCP 2024-11-05 spec).
+---
-### Advanced — CLI subcommands and flags
+## CLI reference
 ```bash
-npx reasonix setup                       # reconfigure any time
-npx reasonix chat --session work         # a different named session
-npx reasonix chat --no-session           # ephemeral — nothing persisted
+npx reasonix                             # chat (uses saved config)
+npx reasonix code [path]                 # coding mode scoped to path (default: cwd)
+npx reasonix setup                       # reconfigure the wizard
+npx reasonix chat --session work         # named session
+npx reasonix chat --no-session           # ephemeral
 npx reasonix run "ask anything"          # one-shot, streams to stdout
 npx reasonix stats session.jsonl         # summarize a transcript
-npx reasonix replay chat.jsonl           # scrub a transcript + rebuild cost/cache
+npx reasonix replay chat.jsonl           # rebuild cost/cache from a transcript
 npx reasonix diff a.jsonl b.jsonl --md   # compare two transcripts
-npx reasonix mcp list                    # curated MCP server catalog
+npx reasonix mcp list                    # curated MCP catalog
+npx reasonix mcp inspect <spec>          # probe a single MCP server
+npx reasonix sessions                    # list saved sessions
+```
+Common flags:
+```bash
+--preset <fast|smart|max>   # bundle (model + harvest + branch)
+--model <id>                # explicit model id
+--harvest / --no-harvest    # R1 plan-state extraction
+--branch <N>                # self-consistency budget
+--mcp "name=cmd args…"      # attach an MCP server (repeatable)
+--transcript path.jsonl     # write a JSONL transcript on the side
+--session <name>            # named session (default: per-dir for code mode)
+--no-session                # ephemeral
+--no-config                 # ignore ~/.reasonix/config.json (CI-friendly)
 ```
-Power users can still bypass config and drive Reasonix with flags:
+Env vars (win over config):
 ```bash
-npx reasonix chat \
-  --preset max \
-  --mcp "filesystem=npx -y @modelcontextprotocol/server-filesystem /tmp/safe" \
-  --mcp "kb=https://mcp.example.com/sse" \
-  --transcript session.jsonl \
-  --no-config   # ignore ~/.reasonix/config.json (for CI / reproducing issues)
+export DEEPSEEK_API_KEY=sk-...
+export DEEPSEEK_BASE_URL=https://...   # optional alternate endpoint
 ```
-### Library
+---
+## Library usage
 ```ts
 import {
@@ -261,7 +422,7 @@ const loop = new CacheFirstLoop({
     toolSpecs: tools.specs(),
   }),
   harvest: true,
-  branch: 3, // self-consistency budget
+  branch: 3,
 });
 for await (const ev of loop.step("What is 17 + 25?")) {
@@ -270,27 +431,72 @@ for await (const ev of loop.step("What is 17 + 25?")) {
 console.log(loop.stats.summary());
 ```
-### Configuration
+`ChatOptions.seedTools` accepts a pre-built `ToolRegistry` for callers
+who want the `reasonix code` loop wiring without the CLI wrapper.
+See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for internals.
+---
-The wizard handles everything on first run. If you'd rather use env vars
-(CI, shared boxes, etc.):
+## Why Reasonix (not LangChain)
+Every abstraction here earns its weight against a DeepSeek-specific
+property — dirt-cheap tokens, R1 reasoning traces, automatic prefix
+caching, JSON mode. Generic wrappers leave these on the table.
+| | Reasonix default | generic frameworks |
+|---|---|---|
+| Prefix-stable loop (→ 85–95% cache hit) | yes | no (prompts rebuilt each turn) |
+| Auto-flatten deep tool schemas | yes | no (DeepSeek drops args) |
+| Retry with jittered backoff (429/503) | yes | no (custom callbacks) |
+| Scavenge tool calls leaked into `<think>` | yes | no |
+| Call-storm breaker on identical-arg repeats | yes | no |
+| Live cache-hit / cost / vs-Claude panel | yes | no |
+| First-run config prompt + Markdown TUI | yes | no |
+On the same τ-bench-lite workload — 8 multi-turn tool-use tasks × 3
+repeats = 48 runs per side, live DeepSeek `deepseek-chat`, sole variable
+prefix stability:
+| metric | baseline (cache-hostile) | Reasonix | delta |
+|---|---:|---:|---:|
+| cache hit | 46.6% | **94.4%** | +47.7 pp |
+| cost / task | $0.002599 | $0.001579 | **−39%** |
+| pass rate | 96% (23/24) | **100% (24/24)** | — |
+**Verify it yourself — no API key, zero cost:**
 ```bash
-export DEEPSEEK_API_KEY=sk-...        # wins over ~/.reasonix/config.json
-export DEEPSEEK_BASE_URL=https://...  # optional alternate endpoint
+git clone https://github.com/esengine/reasonix.git && cd reasonix && npm install
+npx reasonix replay benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
+npx reasonix diff \
+  benchmarks/tau-bench/transcripts/t01_address_happy.baseline.r1.jsonl \
+  benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
 ```
-Get a key (free credit on signup): <https://platform.deepseek.com/api_keys>
+The committed JSONL transcripts carry per-turn `usage`, `cost`, and
+`prefixHash`. Reasonix's prefix hash stays byte-stable across every
+model call; baseline's churns on every turn. The cache delta is
+*mechanically* attributable to log stability, not to a different
+system prompt.
+Full 48-run report: [`benchmarks/tau-bench/report.md`](./benchmarks/tau-bench/report.md).
+Reproduce with your own API key: `npx tsx benchmarks/tau-bench/runner.ts --repeats 3`.
-Re-run `npx reasonix setup` any time to add/remove MCP servers or switch
-preset — your existing selections are pre-checked.
+MCP reference runs (one single prefix hash across all 5 turns even
+with two concurrent MCP subprocesses):
+| server | turns | cache hit | cost | vs Claude |
+|---|---:|---:|---:|---:|
+| bundled demo (`add` / `echo` / `get_time`) | 2 | **96.6%** (turn 2) | $0.000254 | −94.0% |
+| official `server-filesystem` | 5 | **96.7%** | $0.001235 | −97.0% |
+| **both concurrently** | 5 | **81.1%** | $0.001852 | −95.9% |
 ---
 ## Non-goals
 - Multi-agent orchestration (use LangGraph).
-- RAG / vector stores (use LlamaIndex or do it yourself).
+- RAG / vector stores (use LlamaIndex).
 - Multi-provider abstraction (use LiteLLM).
 - Web UI / SaaS.
@@ -306,13 +512,11 @@ cd reasonix
 npm install
 npm run dev chat        # run CLI from source via tsx
 npm run build           # tsup to dist/
-npm test                # vitest (279 tests)
+npm test                # vitest (444 tests)
 npm run lint            # biome
 npm run typecheck       # tsc --noEmit
 ```
-See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for internals.
 ---
 ## License