reasonix 0.4.13 → 0.4.15

package/README.md CHANGED
@@ -9,227 +9,454 @@
9
9
  **A DeepSeek-native AI coding assistant in your terminal.** Ink TUI. MCP
10
10
  first-class. No LangChain.
11
11
 
12
+ ---
13
+
14
+ ## Quick start (60 seconds)
15
+
16
+ **1. Get a DeepSeek API key.** Free credit on signup:
17
+ <https://platform.deepseek.com/api_keys>
18
+
19
+ **2. Run it.** No install needed.
20
+
12
21
  ```bash
13
22
  npx reasonix
14
23
  ```
15
24
 
16
- One command. First run walks you through a 30-second wizard (API key →
17
- preset → pick MCP servers from a checklist); every run after that drops
18
- straight into chat with your tools wired up. Inside the chat, type `/help`.
25
+ First run walks you through a 30-second wizard:
19
26
 
20
- Why bother with yet another agent framework? Because every abstraction
21
- here earns its weight against a DeepSeek-specific property: dirt-cheap
22
- tokens, R1 reasoning traces, automatic prefix caching, JSON mode.
23
- Generic wrappers treat DeepSeek as "OpenAI with a different base URL"
24
- and leave these advantages on the table. Reasonix leans into them:
25
- on the same τ-bench-lite workload,
26
- [**94.4% cache hit, ~40% cheaper tokens, 100% pass rate**](#validated-numbers)
27
- vs. a cache-hostile baseline.
27
+ - paste your API key (saved to `~/.reasonix/config.json`)
28
+ - pick a preset: `fast` (cheap chat, default), `smart` (+R1 reasoning), `max` (+self-consistency branching)
29
+ - multi-select MCP servers from a catalog (filesystem, memory, github, puppeteer, …)
28
30
 
29
- ---
31
+ Every run after that drops you straight into chat.
30
32
 
31
- ## What you get
33
+ **3. Inside the chat.** Type anything and hit Enter. Type `/help` to see
34
+ every command. The status bar at the top shows cache hit %, cost so far,
35
+ balance, and context usage. Press `Esc` to cancel whatever is running.
32
36
 
33
- | Feature | How it works | Opt in |
34
- |---|---|---|
35
- | **Setup wizard** | First run of `npx reasonix`: pick preset, multi-select MCP servers from a curated catalog, saved to config so the next run just launches chat | always on (first run) |
36
- | **MCP (stdio + SSE)** | Multi-server bridge; every MCP tool inherits Cache-First + repair + context-safety automatically. `reasonix mcp list` shows the catalog | always on |
37
- | **Cache-First Loop** | Immutable prefix + append-only log = prefix byte-stable across turns → DeepSeek's automatic prefix cache hits at 70–95% | always on |
38
- | **Context safety net** | Tool results capped at 32k chars · oversized sessions auto-heal on load · `/compact` to shrink further · ctx gauge in the status bar · Esc to abort exploration and get a forced summary | always on |
39
- | **R1 Thought Harvesting** | Parses `reasoning_content` into typed `{ subgoals, hypotheses, uncertainties, rejectedPaths }` via a cheap V3 call | `/preset smart` |
40
- | **Self-Consistency Branching** | Runs N parallel samples at spread temperatures; picks the one with the fewest flagged uncertainties | `/preset max` / `/branch N` |
41
- | **Tool-Call Repair** | Auto-flattens deep/wide schemas, scavenges tool calls leaked into `<think>`, repairs truncated JSON, breaks call-storms | always on |
42
- | **Retry layer** | Exponential backoff + jitter on 408/429/500/502/503/504 and network errors. 4xx auth errors don't retry | always on |
43
- | **Ink TUI** | Live cache-hit / cost / context panel. Streams R1 thinking to a compact preview. Renders Markdown (bold / lists / code / stripped LaTeX) | always on |
37
+ ```
38
+ reasonix › explain what this project does
39
+ assistant
40
+ …streams R1 reasoning into a dim preview, then writes the answer…
41
+ status bar: cache hit 92% · cost $0.001 · ctx 8k/131k (6%) · balance 12.34 CNY
42
+ ```
43
+
44
+ Requires Node 18. Works on macOS, Linux, Windows (Git Bash + PowerShell).
44
45
 
45
46
  ---
46
47
 
47
- ## Why not just use LangChain?
48
+ ## Using `reasonix code`: your terminal pair programmer
48
49
 
49
- Even on the default `fast` preset (no harvest, no branching), Reasonix bakes
50
- in five DeepSeek-specific defences that generic agent frameworks leave to you:
50
+ Scoped to the directory you launch from. The model has native
51
+ `read_file` / `write_file` / `edit_file` / `list_directory` /
52
+ `search_files` / `directory_tree` / `get_file_info` /
53
+ `create_directory` / `move_file` tools, all sandboxed — any path that
54
+ resolves outside the launch root (including `..` and symlink escapes)
55
+ is refused.
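
The containment rule described above can be sketched as a lexical check. This is a minimal illustration, not Reasonix's actual implementation: `isInsideRoot` is a hypothetical name, and the sketch only handles `..` traversal. Catching symlink escapes would additionally require resolving both paths with something like `fs.realpathSync` before comparing them.

```typescript
import * as path from "node:path";

// Hypothetical helper (not part of the Reasonix API): returns true when
// `candidate` still lies inside `root` after lexical resolution.
// Symlinks are NOT resolved here; that is a separate step.
function isInsideRoot(root: string, candidate: string): boolean {
  const absRoot = path.resolve(root);
  // Relative candidates are resolved against the root, absolute ones as-is.
  const absTarget = path.resolve(absRoot, candidate);
  const rel = path.relative(absRoot, absTarget);
  // An empty string means the root itself.
  if (rel === "") return true;
  // A leading ".." segment, or an absolute result (another drive on
  // Windows), means the target escaped the root.
  const firstSegment = rel.split(path.sep)[0];
  return firstSegment !== ".." && !path.isAbsolute(rel);
}
```

Comparing on the first path segment rather than with a plain `startsWith("..")` keeps a legitimately named directory such as `..cache/` from being rejected.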
51
56
 
52
- | | Reasonix default | generic frameworks |
53
- |---|---|---|
54
- | Prefix-stable loop (→ 85–95% cache hit) | ✅ | ❌ prompts rebuilt each turn |
55
- | Auto-flatten deep tool schemas | ✅ | ❌ DeepSeek drops args |
56
- | Retry with jittered backoff (429/503) | ✅ | ❌ custom callbacks |
57
- | Scavenge tool calls leaked into `<think>` | ✅ | ❌ |
58
- | Call-storm breaker on identical-arg repeats | ✅ | ❌ |
59
- | Live cache-hit / cost / vs-Claude panel | ✅ | ❌ |
60
- | First-run config prompt + Markdown TUI | ✅ | ❌ |
57
+ ```bash
58
+ cd my-project
59
+ npx reasonix code
60
+ ```
61
61
 
62
- Harvest and self-consistency branching are bonuses on top. The everyday
63
- win is that **a plain chat with Reasonix already pays for ~40% less tokens
64
- than the same chat through a naive LangChain setup**, because the prefix
65
- actually stays byte-stable.
62
+ ### Walkthrough: fix a bug
66
63
 
67
- ## Validated numbers
64
+ ```
65
+ reasonix code › findByEmail in users.ts is case-sensitive, which breaks login; please make it case-insensitive
68
66
 
69
- **τ-bench-lite**: 8 multi-turn tool-use tasks × 3 repeats = 24 runs per
70
- side (48 in total). Same tools / same prompt / same client on both sides, sole variable
71
- is prefix stability. Measured on live DeepSeek `deepseek-chat`:
67
+ assistant
68
+ R1 reasoning · 1243 chars of thought
69
+ tool<search_files> → src/users.ts, src/users.test.ts
70
+ ▸ tool<read_file> → (src/users.ts, 412 chars)
71
+ ▸ tool<read_file> → (src/users.test.ts, 820 chars)
72
+ ▸ Found it. findByEmail compares with === directly. I'll normalize both the query and the stored value to lowercase and add a test.
72
73
 
73
- | metric | baseline (cache-hostile) | Reasonix | delta |
74
- |---|---:|---:|---:|
75
- | runs | 24 | 24 | — |
76
- | **cache hit** | 46.6% | **94.4%** | **+47.7pp** |
77
- | cost / task | $0.002599 | $0.001579 | **−39% (×0.61)** |
78
- | vs Claude Sonnet 4.6 (token-count estimate) | — | — | **~96% cheaper** |
79
- | pass rate | 96% (23/24) | **100% (24/24)** | Reasonix held the guardrail on every run |
74
+ src/users.ts
75
+ <<<<<<< SEARCH
76
+ return users.find(u => u.email === email);
77
+ =======
78
+ const needle = email.toLowerCase();
79
+ return users.find(u => u.email.toLowerCase() === needle);
80
+ >>>>>>> REPLACE
80
81
 
81
- **Verify it yourself — no API key, zero cost:**
82
+ src/users.test.ts
83
+ <<<<<<< SEARCH
84
+ it("finds by email", () => {
85
+ expect(findByEmail("a@b.com")).toBeDefined();
86
+ });
87
+ =======
88
+ it("finds by email", () => {
89
+ expect(findByEmail("a@b.com")).toBeDefined();
90
+ });
91
+
92
+ it("is case-insensitive", () => {
93
+ expect(findByEmail("A@B.COM")).toBeDefined();
94
+ });
95
+ >>>>>>> REPLACE
96
+
97
+ ▸ 2 pending edits across 2 files — /apply to write · /discard to drop
98
+ ```
99
+
100
+ **At this point nothing has been written to disk.** You have three
101
+ options:
102
+
103
+ - `/apply` — commit the pending blocks. You'll see `✓ applied
104
+ src/users.ts` and `✓ applied src/users.test.ts`. A snapshot of the
105
+ pre-edit file is kept so `/undo` can roll back.
106
+ - `/discard` — throw the blocks away without writing.
107
+ - Keep chatting — ask for adjustments. Say "also cover the empty
108
+ string case" and the model proposes another block set.
109
+
110
+ After applying:
111
+
112
+ ```
113
+ reasonix code › /commit "fix: findByEmail case-insensitive"
114
+ ▸ git add -A && git commit -m "fix: findByEmail case-insensitive"
115
+ [main a1b2c3d] fix: findByEmail case-insensitive
116
+ ```
117
+
118
+ `/commit` runs `git add -A && git commit -m ...` from the sandbox root.
119
+
120
+ ### Walkthrough: explore before editing
121
+
122
+ For "what does this code do?" questions the model uses the read-side
123
+ tools and replies in prose — no SEARCH/REPLACE blocks, no file writes.
124
+ Ask to change something only when you mean it:
125
+
126
+ ```
127
+ reasonix code › How is routing organized in this project?
128
+ assistant
129
+ ▸ tool<directory_tree> → (src/ tree, 47 entries)
130
+ ▸ tool<read_file> → (src/router.ts, 1.2 KB)
131
+ ▸ Routing has three layers: the top-level AppRouter registers tabs, each tab uses React Router's
131
+ nested routes for its sub-paths, and finally …
133
+ ```
134
+
135
+ If the SEARCH text doesn't match the file byte-for-byte, `edit_file`
136
+ refuses the edit loudly rather than fuzzy-matching. The model sees the
137
+ error and retries with the correct search text — silent wrong edits are
138
+ worse than visible rejections.
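
The exact-match rule can be illustrated with a small sketch. `applySearchReplace` and `EditResult` are hypothetical names chosen for this example, not Reasonix's API, and rejecting a SEARCH text that occurs more than once is an added assumption rather than documented behaviour. JavaScript compares strings by UTF-16 code unit, which is equivalent to a byte-for-byte comparison once both sides have been decoded from the same encoding.

```typescript
// Hypothetical result type for this sketch.
type EditResult =
  | { ok: true; content: string }
  | { ok: false; error: string };

// Apply one SEARCH/REPLACE block using exact matching only.
function applySearchReplace(
  content: string,
  search: string,
  replace: string,
): EditResult {
  if (search.length === 0) {
    return { ok: false, error: "empty SEARCH text" };
  }
  const first = content.indexOf(search);
  if (first === -1) {
    // No fuzzy fallback: a visible rejection instead of a silent wrong edit.
    return { ok: false, error: "SEARCH text not found (exact match required)" };
  }
  // Assumption: an ambiguous match is rejected as well.
  if (content.indexOf(search, first + 1) !== -1) {
    return { ok: false, error: "SEARCH text matches more than once" };
  }
  const updated =
    content.slice(0, first) + replace + content.slice(first + search.length);
  return { ok: true, content: updated };
}
```

Returning an error value rather than throwing makes it straightforward to hand the message back to the model so it can retry with corrected search text.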
139
+
140
+ ### Things to try
141
+
142
+ - `/tool 1` — dump the last tool call's full output (when the 400-char
143
+ inline clip isn't enough).
144
+ - `/think` — see the model's full R1 reasoning for the last turn
145
+ (reasoner preset only).
146
+ - `/undo` — roll back the last applied edit batch.
147
+ - `/new` — start fresh in the same directory without losing the
148
+ session file.
149
+ - Pass `--no-session` for an ephemeral session that doesn't persist.
82
150
 
83
151
  ```bash
84
- git clone https://github.com/esengine/reasonix.git && cd reasonix && npm install
85
- npx reasonix replay benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
86
- npx reasonix diff \
87
- benchmarks/tau-bench/transcripts/t01_address_happy.baseline.r1.jsonl \
88
- benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
152
+ npx reasonix code src/ # narrower sandbox (only src/ is writable)
153
+ npx reasonix code --no-session # ephemeral — nothing saved to disk
154
+ npx reasonix code --preset max # R1 reasoning + 3-way self-consistency
89
155
  ```
90
156
 
91
- The JSONL transcripts committed in `benchmarks/tau-bench/transcripts/`
92
- carry per-turn `usage`, `cost`, and `prefixHash`. Reasonix's prefix hash
93
- stays byte-stable across every model call; baseline's prefix churns on
94
- every turn. The cache delta is *mechanically* attributable to log
95
- stability, not to a different system prompt.
157
+ ---
96
158
 
97
- Full 48-run report: [`benchmarks/tau-bench/report.md`][r]. Reproduce
98
- with your own API key: `npx tsx benchmarks/tau-bench/runner.ts --repeats 3`.
159
+ ## Using `reasonix` — general chat
99
160
 
100
- [r]: ./benchmarks/tau-bench/report.md
161
+ Same TUI, no filesystem tools unless you opt in via MCP. Good for
162
+ drafting, Q&A, schema design, architecture discussions, or driving
163
+ your own MCP servers. Sessions persist per name under
164
+ `~/.reasonix/sessions/`.
101
165
 
102
- ### MCP — works out of the box
166
+ ```bash
167
+ npx reasonix # uses saved config + wizard-selected MCP
168
+ npx reasonix --preset smart # one-shot override
169
+ npx reasonix --session design # named session
170
+ npx reasonix --session design # resume it later — history intact
171
+ ```
103
172
 
104
- Any [MCP](https://spec.modelcontextprotocol.io/) server's tools inherit
105
- Cache-First + repair + context-safety automatically. The wizard (`npx
106
- reasonix`) lets you multi-select from a curated catalog — no flags, no
107
- JSON-by-hand. Three live reference runs:
173
+ ### Walkthrough: a multi-turn session with R1 reasoning
108
174
 
109
- | server | turns | tool calls | cache hit | cost | vs Claude |
110
- |---|---:|---:|---:|---:|---:|
111
- | bundled demo (`add` / `echo` / `get_time`) | 2 | 1 | **96.6%** (turn 2) | $0.000254 | −94.0% |
112
- | official `@modelcontextprotocol/server-filesystem` | 5 | 4 | **96.7%** overall | $0.001235 | −97.0% |
113
- | **both concurrently** (`demo_add` + `fs_write_file`) | 5 | 4 | **81.1%** | $0.001852 | −95.9% |
175
+ ```
176
+ reasonix › /preset smart
177
+ switched to smart · model deepseek-reasoner · harvest on · branch off
178
+
179
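+ reasonix › I need to design display rules for a limited-time discount popup in a Flutter app. Goal:
180
+ show it once on the first launch each day; after 3 consecutive days of showing, sleep for 7 days. How do I implement this?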
181
+
182
+ assistant
183
+ ▸ R1 reasoning · 2410 chars of thought
184
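+ ‹ subgoals (3): persist the show count · check whether 24h have passed · check the sleep window
185
+ ‹ hypotheses (2): store the count in SharedPreferences · lastShownAt timestamp
186
+ ‹ uncertainties (1): reset policy when the user switches devices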
187
+
188
+ Suggested data model:
189
+ lastShownAt: DateTime
190
+ consecutiveShows: int (0..3)
191
+ sleepUntil: DateTime?
192
+
193
+ ```
114
194
 
115
- The third row is the ecosystem proof: two MCP servers running as
116
- separate subprocesses, tools from both exercised in one conversation.
117
- **One single prefix hash across all 5 turns**: byte-stability survives
118
- concurrent MCP subprocesses.
195
+ `/think` dumps the full R1 thought trace; `/status` shows the current
196
+ model / flags / context use; `/retry` re-samples the same prompt with
197
+ a fresh random seed (useful when the first answer missed something).
119
198
 
120
- Reproduce without an API key (replay the committed transcripts):
199
+ ### Walkthrough: attach MCP tools on the fly
121
200
 
122
201
  ```bash
123
- npx reasonix replay benchmarks/tau-bench/transcripts/mcp-demo.add.jsonl
124
- npx reasonix replay benchmarks/tau-bench/transcripts/mcp-filesystem.jsonl
202
+ # Attach the official filesystem server sandboxed to /tmp/scratch,
203
+ # plus a remote knowledge-base over SSE.
204
+ npx reasonix \
205
+ --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/scratch" \
206
+ --mcp "kb=https://mcp.example.com/sse"
125
207
  ```
126
208
 
127
- Supported transports: **stdio** (local `npx` or binary) and **HTTP+SSE**
128
- (remote / hosted servers, MCP 2024-11-05 spec). Pass an `http(s)://`
129
- URL to `--mcp` and Reasonix opens the SSE stream and POSTs JSON-RPC
130
- to the endpoint the server advertises.
209
+ Inside the chat:
210
+
211
+ ```
212
+ reasonix › /mcp
213
+ ▸ fs (stdio, 11 tools) fs_read_file · fs_list_directory · fs_write_file · …
214
+ ▸ kb (sse, 4 tools) kb_search · kb_get · kb_list_collections · kb_stat
215
+
216
+ reasonix › collect every line containing "ERROR" from all .log files under /tmp/scratch into errors.txt
217
+ assistant
218
+ ▸ tool<fs_search_files> → 4 matches
219
+ ▸ tool<fs_read_file> → …
220
+ ▸ tool<fs_write_file> → wrote 2.4 KB to errors.txt
221
+ ▸ Wrote errors.txt: 38 lines in total, from 4 source files.
222
+ ```
223
+
224
+ MCP tools go through the same Cache-First + repair + context-safety
225
+ plumbing as native tools, including the 32k result cap and live
226
+ progress-notification rendering.
227
+
228
+ ### When to use `reasonix` vs `reasonix code`
131
229
 
132
- [mcp]: ./benchmarks/tau-bench/transcripts/mcp-demo.add.jsonl
230
+ | situation | command |
231
+ |---|---|
232
+ | Editing files in the current project | `reasonix code` |
233
+ | Exploring a project without writing files | `reasonix code` (it only writes on `/apply`) |
234
+ | Design / architecture / research chat | `reasonix` |
235
+ | Driving your own MCP servers | `reasonix --mcp "..."` |
236
+ | One-shot question, no TUI | `reasonix run "..."` |
237
+ | Reproducing a prior session / benchmark | `reasonix replay path.jsonl` |
133
238
 
134
239
  ---
135
240
 
136
- ## Usage
241
+ ## Commands inside the session
242
+
243
+ | command | what it does |
244
+ |---|---|
245
+ | `/help` | full command reference with hints |
246
+ | `/status` | current model · flags · context · session |
247
+ | `/preset <fast\|smart\|max>` | one-tap bundle (model + harvest + branch) |
248
+ | `/model <id>` | switch DeepSeek model (`deepseek-chat`, `deepseek-reasoner`) |
249
+ | `/harvest [on\|off]` | toggle R1 plan-state extraction |
250
+ | `/branch <N\|off>` | run N parallel samples per turn, pick best (N ≥ 2) |
251
+ | `/mcp` | list attached MCP servers and their tools |
252
+ | `/tool [N]` | dump the Nth tool call's full output (1 = latest) |
253
+ | `/think` | dump the last turn's full R1 reasoning |
254
+ | `/retry` | truncate and resend your last message (fresh sample) |
255
+ | `/compact [cap]` | shrink oversized tool results in the log |
256
+ | `/sessions` | list saved sessions (current marked with `▸`) |
257
+ | `/forget` | delete the current session from disk |
258
+ | `/new` (alias `/reset`) | start a fresh conversation in the same session |
259
+ | `/clear` | clear visible scrollback only (log kept) |
260
+ | `/setup` | reconfigure (exit and run `reasonix setup`) |
261
+ | `/exit` | quit |
262
+
263
+ Additional commands in `reasonix code`:
264
+
265
+ | command | what it does |
266
+ |---|---|
267
+ | `/apply` | commit the pending SEARCH/REPLACE blocks to disk |
268
+ | `/discard` | drop the pending edit blocks without writing |
269
+ | `/undo` | roll back the last applied edit batch |
270
+ | `/commit "msg"` | `git add -A && git commit -m "msg"` |
271
+
272
+ **Keyboard:**
273
+
274
+ - `Enter` — submit
275
+ - `Shift+Enter` / `Ctrl+J` — newline (multi-line paste also supported; `\` + Enter as a portable fallback)
276
+ - `↑ / ↓` — walk prompt history while idle; navigate slash-autocomplete matches
277
+ - `Tab` / `Enter` on a `/foo` prefix — accept the highlighted suggestion
278
+ - `Esc` — abort the current turn (stops the API call, cancels any in-flight tool, rejects pending MCP requests)
279
+ - `y` / `n` on confirm prompts — hotkey accept / reject
137
280
 
138
- ### One command
281
+ ---
282
+
283
+ ## Sessions and safety nets
284
+
285
+ - Sessions live as JSONL under `~/.reasonix/sessions/<name>.jsonl` (per
286
+ directory for `reasonix code`). Every message is appended atomically; `Ctrl+C`
287
+ never loses context.
288
+ - Tool results are capped at 32k chars per call. Oversized sessions
289
+ self-heal on load (shrinks + rewrites the file).
290
+ - Malformed `assistant.tool_calls` / `tool` pairing is validated on
291
+ every outgoing API call so a corrupted session can't keep 400ing.
292
+ - Context gauge turns yellow at 50%, red at 80% with a `/compact` nudge.
293
+ Approaching the 131k window triggers an automatic compaction attempt
294
+ before falling back to a forced summary.
295
+ - The model's sandbox in `reasonix code` refuses any path that resolves
296
+ outside the launch directory, including symlink escape and `..` traversal.
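
Two of the safety nets listed above, the per-call result cap and the context gauge thresholds, can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, "32k" is taken to mean 32,000 characters, the clip marker format is invented for the example, and the thresholds are treated as inclusive.

```typescript
// Assumption: "32k chars" is read as 32,000 characters.
const TOOL_RESULT_CAP = 32_000;

// Clip an oversized tool result and say how much was dropped.
// The trailing marker format is made up for this sketch.
function clipToolResult(text: string, cap: number = TOOL_RESULT_CAP): string {
  if (text.length <= cap) return text;
  const dropped = text.length - cap;
  return `${text.slice(0, cap)}\n[clipped ${dropped} chars]`;
}

type GaugeColor = "normal" | "yellow" | "red";

// Map context usage onto the gauge colours: yellow from 50%, red from 80%.
function gaugeColor(usedTokens: number, windowTokens: number): GaugeColor {
  const ratio = usedTokens / windowTokens;
  if (ratio >= 0.8) return "red";
  if (ratio >= 0.5) return "yellow";
  return "normal";
}
```

With the 131k window from the status bar examples, 8k tokens in use stays `normal`, 70k turns `yellow`, and 110k turns `red`.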
297
+
298
+ ### Troubleshooting: duplicate rows / ghost rendering
299
+
300
+ Some Windows terminals (Git Bash / MINTTY / winpty-wrapped shells)
301
+ don't fully implement the ANSI cursor-up escapes Ink uses to repaint
302
+ the live spinner region. Symptom: spinners, streaming previews, or
303
+ tool-result rows print multiple copies into scrollback instead of
304
+ overwriting in place.
305
+
306
+ If you hit this, run with plain mode:
139
307
 
140
308
  ```bash
141
- npx reasonix
309
+ REASONIX_UI=plain npx reasonix code
310
+ # or
311
+ REASONIX_UI=plain npx reasonix
142
312
  ```
143
313
 
144
- First run: a wizard asks for your API key, lets you pick a preset
145
- (fast / smart / max), then offers a multi-select checklist of MCP
146
- servers: filesystem, memory, github, puppeteer, everything. Everything
147
- is saved to `~/.reasonix/config.json`. Subsequent runs drop straight
148
- into chat.
314
+ Plain mode suppresses every live/animated row and disables the
315
+ internal tick timer. You lose the streaming preview and spinners
316
+ but gain stable scrollback. Committed events (your prompts, tool
317
+ results, the model's final responses) still render normally via
318
+ Ink's `<Static>` append path.
149
319
 
150
- ### Inside the chat
320
+ Windows Terminal, PowerShell 7 in Windows Terminal, and WezTerm
321
+ don't need this opt-out.
151
322
 
152
- A status bar at the top shows cache hit %, cost, Claude-equivalent, and
153
- the **context gauge** (`ctx 42k/131k (32%)` — yellow at 50%, red + a
154
- `/compact` nudge at 80%). A command strip under the input lists the
155
- slash commands:
323
+ ---
324
+
325
+ ## Web search on by default
326
+
327
+ The model has two web tools the moment you launch: `web_search` and
328
+ `web_fetch`. No flag, no API key, no signup. When you ask about
329
+ something the model wasn't trained on (new releases, current events,
330
+ obscure APIs), it decides to call `web_search` on its own; if a
331
+ snippet isn't enough it follows up with `web_fetch`.
156
332
 
157
333
  ```
158
- /help full list + hints
159
- /preset <fast|smart|max> one-tap bundles (model + harvest + branch)
160
- /mcp list attached MCP servers and tools
161
- /compact [cap] shrink oversized tool results in history
162
- /sessions · /forget list / delete saved sessions
163
- /setup reconfigure (exits and tells you to run `reasonix setup`)
164
- /clear · /exit
334
+ you › What's new in Flutter 3.19?
335
+ assistant
336
+ tool<web_search> query: "Flutter 3.19 new features"
337
+ tool<web_fetch> https://docs.flutter.dev/release/3-19
338
+ The main additions in 3.19 are
165
339
  ```
166
340
 
167
- **Esc while thinking**: abort the current exploration and force the
168
- model to summarize what it already found. No more "model ran 24 tool
169
- calls and gave up"; you get an answer every time.
341
+ Backed by **Mojeek**'s public search page, an independent web index,
342
+ no API key, no signup, bot-friendly. Coverage on niche or very recent
343
+ queries can be thinner than Google/Bing, but it's reliable from
344
+ scripts and doesn't gate on cookies or sessions. (DDG was the original
345
+ backend but it started serving anti-bot pages in 2026.)
170
346
 
171
- Sessions live as JSONL under `~/.reasonix/sessions/<name>.jsonl`;
172
- every message appended atomically, so killing the CLI never loses
173
- context. Oversized tool results auto-heal on load, so poisoning a
174
- session with one giant `read_file` doesn't brick your history.
347
+ **Turn it off** (offline mode / privacy / CI):
175
348
 
176
- ### Code mode — `npx reasonix code`
349
+ ```json
350
+ // ~/.reasonix/config.json
351
+ { "apiKey": "sk-…", "search": false }
352
+ ```
177
353
 
178
- A thin opinionated layer on top of chat: filesystem MCP bridged at
179
- `cwd`, coding system prompt, reasoner preset, per-directory session.
180
- The model proposes edits as **SEARCH/REPLACE blocks**:
354
+ ```bash
355
+ # Or one env var (wins over config):
356
+ REASONIX_SEARCH=off npx reasonix
357
+ ```
358
+
359
+ **Bring your own provider** (Kagi, SearXNG, Serper, an internal
360
+ cache) — implement the two tools however you want and register them
361
+ manually:
181
362
 
363
+ ```ts
364
+ import { ToolRegistry } from "reasonix";
365
+ // Register your own `web_search` / `web_fetch` on a ToolRegistry,
366
+ // then pass it to CacheFirstLoop (or `reasonix chat --no-config`
367
+ // with seedTools via library API).
182
368
  ```
183
- src/foo.ts
184
- <<<<<<< SEARCH
185
- const x = 1;
186
- =======
187
- const x = 2;
188
- >>>>>>> REPLACE
369
+
370
+ Inside the session:
371
+
372
+ ```
373
+ reasonix › What new Navigator API did Flutter 3.19 introduce?
374
+ assistant
375
+ ▸ tool<web_search> → query: "Flutter 3.19 new Navigator API"
376
+ answer: Flutter 3.19 introduces the NavigatorObserver changes …
377
+ 1. Flutter 3.19 Release Notes — https://docs.flutter.dev/…
378
+ 2. What's new in Flutter 3.19 — https://medium.com/…
379
+ ▸ tool<web_fetch> → https://docs.flutter.dev/release/release-notes/3-19-0
380
+ (full page text, clipped at 32k)
381
+ ▸ 3.19 adds …
189
382
  ```
190
383
 
191
- Reasonix parses them out of each turn, applies them to disk, reports
192
- `✓ applied src/foo.ts` in the TUI. SEARCH must match byte-for-byte;
193
- we never fuzzy-match (silent wrong edits are worse than loud
194
- rejections). Run `git diff` to review, `git checkout .` to undo.
384
+ For advanced / self-hosted search (Kagi, SearXNG, internal caches)
385
+ implement the `WebSearchProvider` interface and call
386
+ `registerWebTools(registry, { provider })` yourself, or bridge an
387
+ existing MCP search server via `--mcp`.
388
+
389
+ ---
390
+
391
+ ## MCP — bring your own tools
392
+
393
+ Any [MCP](https://spec.modelcontextprotocol.io/) server works. The wizard
394
+ lets you pick from a catalog, or drive it by flag:
195
395
 
196
396
  ```bash
197
- cd my-project
198
- npx reasonix code # default: cwd, reasoner, per-dir session
199
- npx reasonix code src/ # scope the filesystem sandbox tighter
200
- npx reasonix code --no-session # ephemeral
397
+ # stdio (local subprocess)
398
+ npx reasonix --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/safe"
399
+
400
+ # multiple servers at once
401
+ npx reasonix \
402
+ --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/safe" \
403
+ --mcp "demo=npx tsx examples/mcp-server-demo.ts"
404
+
405
+ # HTTP+SSE (remote / hosted)
406
+ npx reasonix --mcp "kb=https://mcp.example.com/sse"
201
407
  ```
202
408
 
203
- First-run sandbox: because code mode uses the filesystem MCP from
204
- `@modelcontextprotocol/server-filesystem`, the model can only read
205
- and write inside the directory you pointed at. It literally can't
206
- touch files above that root.
409
+ `reasonix mcp list` shows the curated catalog. `reasonix mcp inspect <spec>`
410
+ connects once and dumps the server's tools / resources / prompts without
411
+ starting a chat. Progress notifications from long-running tools (2025-03-26
412
+ spec) render live as a progress bar in the spinner.
413
+
414
+ Supported transports: **stdio** (local command) and **HTTP+SSE** (remote,
415
+ MCP 2024-11-05 spec).
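
The `--mcp "name=…"` form shown above distinguishes the two transports by whether the value is an `http(s)://` URL. A minimal sketch of that dispatch follows; `parseMcpSpec` and `McpSpec` are hypothetical names, and the whitespace split is a simplification that does not handle quoted arguments, so the real CLI parser may well behave differently.

```typescript
// Hypothetical parsed form of one --mcp value.
type McpSpec =
  | { name: string; transport: "stdio"; command: string; args: string[] }
  | { name: string; transport: "sse"; url: string };

function parseMcpSpec(spec: string): McpSpec {
  const eq = spec.indexOf("=");
  const name = eq === -1 ? "" : spec.slice(0, eq).trim();
  if (name.length === 0) {
    throw new Error(`expected "name=..." but got: ${spec}`);
  }
  const rest = spec.slice(eq + 1).trim();
  if (rest.length === 0) {
    throw new Error(`missing command or URL in: ${spec}`);
  }
  // An http(s) URL selects the HTTP+SSE transport.
  if (/^https?:\/\//i.test(rest)) {
    return { name, transport: "sse", url: rest };
  }
  // Anything else is treated as a local command line (stdio transport).
  // Naive split: quoted arguments are not supported in this sketch.
  const parts = rest.split(/\s+/);
  const command = parts[0] ?? "";
  return { name, transport: "stdio", command, args: parts.slice(1) };
}
```

Only the first `=` is treated as the separator, so a URL with a query string such as `kb=https://host/sse?x=1` keeps its full value.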
207
416
 
208
- ### Advanced — CLI subcommands and flags
417
+ ---
418
+
419
+ ## CLI reference
209
420
 
210
421
  ```bash
211
- npx reasonix setup # reconfigure any time
212
- npx reasonix chat --session work # a different named session
213
- npx reasonix chat --no-session # ephemeral, nothing persisted
422
+ npx reasonix # chat (uses saved config)
423
+ npx reasonix code [path] # coding mode scoped to path (default: cwd)
424
+ npx reasonix setup # reconfigure the wizard
425
+ npx reasonix chat --session work # named session
426
+ npx reasonix chat --no-session # ephemeral
214
427
  npx reasonix run "ask anything" # one-shot, streams to stdout
215
428
  npx reasonix stats session.jsonl # summarize a transcript
216
- npx reasonix replay chat.jsonl # scrub a transcript + rebuild cost/cache
429
+ npx reasonix replay chat.jsonl # rebuild cost/cache from a transcript
217
430
  npx reasonix diff a.jsonl b.jsonl --md # compare two transcripts
218
- npx reasonix mcp list # curated MCP server catalog
431
+ npx reasonix mcp list # curated MCP catalog
432
+ npx reasonix mcp inspect <spec> # probe a single MCP server
433
+ npx reasonix sessions # list saved sessions
434
+ ```
435
+
436
+ Common flags:
437
+
438
+ ```bash
439
+ --preset <fast|smart|max> # bundle (model + harvest + branch)
440
+ --model <id> # explicit model id
441
+ --harvest / --no-harvest # R1 plan-state extraction
442
+ --branch <N> # self-consistency budget
443
+ --mcp "name=cmd args…" # attach an MCP server (repeatable)
444
+ --transcript path.jsonl # write a JSONL transcript on the side
445
+ --session <name> # named session (default: per-dir for code mode)
446
+ --no-session # ephemeral
447
+ --no-config # ignore ~/.reasonix/config.json (CI-friendly)
219
448
  ```
220
449
 
221
- Power users can still bypass config and drive Reasonix with flags:
450
+ Env vars (win over config):
222
451
 
223
452
  ```bash
224
- npx reasonix chat \
225
- --preset max \
226
- --mcp "filesystem=npx -y @modelcontextprotocol/server-filesystem /tmp/safe" \
227
- --mcp "kb=https://mcp.example.com/sse" \
228
- --transcript session.jsonl \
229
- --no-config # ignore ~/.reasonix/config.json (for CI / reproducing issues)
453
+ export DEEPSEEK_API_KEY=sk-...
454
+ export DEEPSEEK_BASE_URL=https://... # optional alternate endpoint
230
455
  ```
231
456
 
232
- ### Library
457
+ ---
458
+
459
+ ## Library usage
233
460
 
234
461
  ```ts
235
462
  import {
@@ -261,7 +488,7 @@ const loop = new CacheFirstLoop({
261
488
  toolSpecs: tools.specs(),
262
489
  }),
263
490
  harvest: true,
264
- branch: 3, // self-consistency budget
491
+ branch: 3,
265
492
  });
266
493
 
267
494
  for await (const ev of loop.step("What is 17 + 25?")) {
@@ -270,27 +497,72 @@ for await (const ev of loop.step("What is 17 + 25?")) {
270
497
  console.log(loop.stats.summary());
271
498
  ```
272
499
 
273
- ### Configuration
500
+ `ChatOptions.seedTools` accepts a pre-built `ToolRegistry` for callers
501
+ who want the `reasonix code` loop wiring without the CLI wrapper.
502
+ See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for internals.
503
+
504
+ ---
505
+
506
+ ## Why Reasonix (not LangChain)
274
507
 
275
- The wizard handles everything on first run. If you'd rather use env vars
276
- (CI, shared boxes, etc.):
508
+ Every abstraction here earns its weight against a DeepSeek-specific
509
+ property — dirt-cheap tokens, R1 reasoning traces, automatic prefix
510
+ caching, JSON mode. Generic wrappers leave these on the table.
511
+
512
+ | | Reasonix default | generic frameworks |
513
+ |---|---|---|
514
+ | Prefix-stable loop (→ 85–95% cache hit) | yes | no (prompts rebuilt each turn) |
515
+ | Auto-flatten deep tool schemas | yes | no (DeepSeek drops args) |
516
+ | Retry with jittered backoff (429/503) | yes | no (custom callbacks) |
517
+ | Scavenge tool calls leaked into `<think>` | yes | no |
518
+ | Call-storm breaker on identical-arg repeats | yes | no |
519
+ | Live cache-hit / cost / vs-Claude panel | yes | no |
520
+ | First-run config prompt + Markdown TUI | yes | no |
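
The retry row in the table above can be sketched as follows. The retryable status codes are the ones named in the earlier feature table (408/429/500/502/503/504); everything else here is an assumption for illustration: the function names, the 500 ms base, the 30 s cap, and the choice of the "full jitter" variant (a uniform delay between zero and the exponential ceiling).

```typescript
// Status codes the README lists as retried; auth-style 4xx are not.
const RETRYABLE_STATUS = new Set([408, 429, 500, 502, 503, 504]);

function isRetryable(status: number): boolean {
  return RETRYABLE_STATUS.has(status);
}

// Full-jitter backoff: uniform in [0, min(capMs, baseMs * 2^attempt)).
// `attempt` starts at 0; `random` is injectable so the delay is testable.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,
  capMs: number = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Randomizing the delay spreads retries from many clients over time, which avoids synchronized bursts against a rate-limited endpoint.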
521
+
522
+ On the same τ-bench-lite workload — 8 multi-turn tool-use tasks × 3
523
+ repeats = 24 runs per side (48 in total), live DeepSeek `deepseek-chat`, sole variable
524
+ prefix stability:
525
+
526
+ | metric | baseline (cache-hostile) | Reasonix | delta |
527
+ |---|---:|---:|---:|
528
+ | cache hit | 46.6% | **94.4%** | +47.7 pp |
529
+ | cost / task | $0.002599 | $0.001579 | **−39%** |
530
+ | pass rate | 96% (23/24) | **100% (24/24)** | — |
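
The cost delta in the table follows directly from the two per-task figures. A small worked check, using only the numbers from the table (the variable and function names are illustrative):

```typescript
// Per-task cost in USD, copied from the table above.
const baselineCost = 0.002599;
const reasonixCost = 0.001579;

// Relative change from `before` to `after` (negative means cheaper).
function relativeChange(before: number, after: number): number {
  return (after - before) / before;
}

const costRatio = reasonixCost / baselineCost;
const costDelta = relativeChange(baselineCost, reasonixCost);

// prints "0.61 -39%"
console.log(costRatio.toFixed(2), `${(costDelta * 100).toFixed(0)}%`);
```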
531
+
532
+ **Verify it yourself — no API key, zero cost:**
277
533
 
278
534
  ```bash
279
- export DEEPSEEK_API_KEY=sk-... # wins over ~/.reasonix/config.json
280
- export DEEPSEEK_BASE_URL=https://... # optional alternate endpoint
535
+ git clone https://github.com/esengine/reasonix.git && cd reasonix && npm install
536
+ npx reasonix replay benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
537
+ npx reasonix diff \
538
+ benchmarks/tau-bench/transcripts/t01_address_happy.baseline.r1.jsonl \
539
+ benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
281
540
  ```
282
541
 
283
- Get a key (free credit on signup): <https://platform.deepseek.com/api_keys>
542
+ The committed JSONL transcripts carry per-turn `usage`, `cost`, and
543
+ `prefixHash`. Reasonix's prefix hash stays byte-stable across every
544
+ model call; baseline's churns on every turn. The cache delta is
545
+ *mechanically* attributable to log stability, not to a different
546
+ system prompt.
547
+
548
+ Full 48-run report: [`benchmarks/tau-bench/report.md`](./benchmarks/tau-bench/report.md).
549
+ Reproduce with your own API key: `npx tsx benchmarks/tau-bench/runner.ts --repeats 3`.
284
550
 
285
- Re-run `npx reasonix setup` any time to add/remove MCP servers or switch
286
- preset; your existing selections are pre-checked.
551
+ MCP reference runs (one single prefix hash across all 5 turns even
552
+ with two concurrent MCP subprocesses):
553
+
554
+ | server | turns | cache hit | cost | vs Claude |
555
+ |---|---:|---:|---:|---:|
556
+ | bundled demo (`add` / `echo` / `get_time`) | 2 | **96.6%** (turn 2) | $0.000254 | −94.0% |
557
+ | official `server-filesystem` | 5 | **96.7%** | $0.001235 | −97.0% |
558
+ | **both concurrently** | 5 | **81.1%** | $0.001852 | −95.9% |
287
559
 
288
560
  ---
289
561
 
290
562
  ## Non-goals
291
563
 
292
564
  - Multi-agent orchestration (use LangGraph).
293
- - RAG / vector stores (use LlamaIndex or do it yourself).
565
+ - RAG / vector stores (use LlamaIndex).
294
566
  - Multi-provider abstraction (use LiteLLM).
295
567
  - Web UI / SaaS.
296
568
 
@@ -306,13 +578,11 @@ cd reasonix
306
578
  npm install
307
579
  npm run dev chat # run CLI from source via tsx
308
580
  npm run build # tsup to dist/
309
- npm test # vitest (279 tests)
581
+ npm test # vitest (444 tests)
310
582
  npm run lint # biome
311
583
  npm run typecheck # tsc --noEmit
312
584
  ```
313
585
 
314
- See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for internals.
315
-
316
586
  ---
317
587
 
318
588
  ## License