reasonix 0.4.12 → 0.4.14

package/README.md CHANGED
@@ -9,227 +9,388 @@
9
9
  **A DeepSeek-native AI coding assistant in your terminal.** Ink TUI. MCP
10
10
  first-class. No LangChain.
11
11
 
12
+ ---
13
+
14
+ ## Quick start (60 seconds)
15
+
16
+ **1. Get a DeepSeek API key.** Free credit on signup:
17
+ <https://platform.deepseek.com/api_keys>
18
+
19
+ **2. Run it.** No install needed.
20
+
12
21
  ```bash
13
22
  npx reasonix
14
23
  ```
15
24
 
16
- One command. First run walks you through a 30-second wizard (API key →
17
- preset → pick MCP servers from a checklist); every run after that drops
18
- straight into chat with your tools wired up. Inside the chat, type `/help`.
25
+ First run walks you through a 30-second wizard:
19
26
 
20
- Why bother with yet another agent framework? Because every abstraction
21
- here earns its weight against a DeepSeek-specific property: dirt-cheap
22
- tokens, R1 reasoning traces, automatic prefix caching, JSON mode.
23
- Generic wrappers treat DeepSeek as "OpenAI with a different base URL"
24
- and leave these advantages on the table. Reasonix leans into them:
25
- on the same τ-bench-lite workload,
26
- [**94.4% cache hit, ~40% cheaper tokens, 100% pass rate**](#validated-numbers)
27
- vs. a cache-hostile baseline.
27
+ - paste your API key (saved to `~/.reasonix/config.json`)
28
+ - pick a preset: `fast` (cheap chat, default), `smart` (+R1 reasoning), `max` (+self-consistency branching)
29
+ - multi-select MCP servers from a catalog (filesystem, memory, github, puppeteer, …)
28
30
 
29
- ---
31
+ Every run after that drops you straight into chat.
30
32
 
31
- ## What you get
33
+ **3. Inside the chat.** Type anything and hit Enter. Type `/help` to see
34
+ every command. The status bar at the top shows cache hit %, cost so far,
35
+ balance, and context usage. Press `Esc` to cancel whatever is running.
32
36
 
33
- | Feature | How it works | Opt in |
34
- |---|---|---|
35
- | **Setup wizard** | First run of `npx reasonix`: pick preset, multi-select MCP servers from a curated catalog, saved to config so the next run just launches chat | always on (first run) |
36
- | **MCP (stdio + SSE)** | Multi-server bridge: every MCP tool inherits Cache-First + repair + context-safety automatically. `reasonix mcp list` shows the catalog | always on |
37
- | **Cache-First Loop** | Immutable prefix + append-only log = prefix byte-stable across turns → DeepSeek's automatic prefix cache hits at 70–95% | always on |
38
- | **Context safety net** | Tool results capped at 32k chars · oversized sessions auto-heal on load · `/compact` to shrink further · ctx gauge in the status bar · Esc to abort exploration and get a forced summary | always on |
39
- | **R1 Thought Harvesting** | Parses `reasoning_content` into typed `{ subgoals, hypotheses, uncertainties, rejectedPaths }` via a cheap V3 call | `/preset smart` |
40
- | **Self-Consistency Branching** | Runs N parallel samples at spread temperatures; picks the one with the fewest flagged uncertainties | `/preset max` / `/branch N` |
41
- | **Tool-Call Repair** | Auto-flattens deep/wide schemas, scavenges tool calls leaked into `<think>`, repairs truncated JSON, breaks call-storms | always on |
42
- | **Retry layer** | Exponential backoff + jitter on 408/429/500/502/503/504 and network errors. 4xx auth errors don't retry | always on |
43
- | **Ink TUI** | Live cache-hit / cost / context panel. Streams R1 thinking to a compact preview. Renders Markdown (bold / lists / code / stripped LaTeX) | always on |
37
+ ```
38
+ reasonix › explain what this project does
39
+ assistant
40
+ …streams R1 reasoning into a dim preview, then writes the answer…
41
+ status bar: cache hit 92% · cost $0.001 · ctx 8k/131k (6%) · balance 12.34 CNY
42
+ ```
43
+
44
+ Requires Node 18. Works on macOS, Linux, Windows (Git Bash + PowerShell).
44
45
 
45
46
  ---
46
47
 
47
- ## Why not just use LangChain?
48
+ ## Using `reasonix code`: your terminal pair programmer
48
49
 
49
- Even on the default `fast` preset (no harvest, no branching), Reasonix bakes
50
- in five DeepSeek-specific defences that generic agent frameworks leave to you:
50
+ Scoped to the directory you launch from. The model has native
51
+ `read_file` / `write_file` / `edit_file` / `list_directory` /
52
+ `search_files` / `directory_tree` / `get_file_info` /
53
+ `create_directory` / `move_file` tools, all sandboxed — any path that
54
+ resolves outside the launch root (including `..` and symlink escapes)
55
+ is refused.
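A minimal sketch of how a containment check like this could work, assuming Node's built-in `fs`, `os`, and `path` modules. The names `resolveInsideRoot` and `existsNoFollow` are hypothetical and chosen for illustration; this is not Reasonix's actual implementation. The idea is to resolve symlinks on the deepest existing ancestor, then verify that the result is still under the real root.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// True if something exists at `p`, without following a final symlink.
function existsNoFollow(p: string): boolean {
  try {
    fs.lstatSync(p);
    return true;
  } catch (err) {
    return false;
  }
}

// Hypothetical helper: resolve `candidate` against `root` and throw if the
// result lands outside `root` after `..` segments and symlinks are resolved.
function resolveInsideRoot(root: string, candidate: string): string {
  const realRoot = fs.realpathSync(root);
  const joined = path.resolve(realRoot, candidate);

  // Walk up to the deepest ancestor that exists, so paths to files that
  // have not been created yet can still be checked.
  let existing = joined;
  const tail: string[] = [];
  while (!existsNoFollow(existing)) {
    const parent = path.dirname(existing);
    if (parent === existing) break; // reached the filesystem root
    tail.unshift(path.basename(existing));
    existing = parent;
  }

  // realpathSync throws on a dangling symlink, so that case fails closed.
  const real = path.join(fs.realpathSync(existing), ...tail);
  const rel = path.relative(realRoot, real);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`path escapes sandbox root: ${candidate}`);
  }
  return real;
}

// Usage with a throwaway root under the OS temp directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "sandbox-demo-"));
console.log(resolveInsideRoot(demoRoot, "src/users.ts"));
try {
  resolveInsideRoot(demoRoot, "../outside.txt");
} catch (err) {
  console.log((err as Error).message);
}
```

Checking the resolved path rather than the raw string is what catches symlink escapes: a plain prefix test on the unresolved path would accept `link/x.txt` even when `link` points outside the root.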
51
56
 
52
- | | Reasonix default | generic frameworks |
53
- |---|---|---|
54
- | Prefix-stable loop (→ 85–95% cache hit) | ✅ | ❌ prompts rebuilt each turn |
55
- | Auto-flatten deep tool schemas | ✅ | ❌ DeepSeek drops args |
56
- | Retry with jittered backoff (429/503) | ✅ | ❌ custom callbacks |
57
- | Scavenge tool calls leaked into `<think>` | ✅ | ❌ |
58
- | Call-storm breaker on identical-arg repeats | ✅ | ❌ |
59
- | Live cache-hit / cost / vs-Claude panel | ✅ | ❌ |
60
- | First-run config prompt + Markdown TUI | ✅ | ❌ |
57
+ ```bash
58
+ cd my-project
59
+ npx reasonix code
60
+ ```
61
61
 
62
- Harvest and self-consistency branching are bonuses on top. The everyday
63
- win is that **a plain chat with Reasonix already pays for ~40% less tokens
64
- than the same chat through a naive LangChain setup**, because the prefix
65
- actually stays byte-stable.
62
+ ### Walkthrough: fix a bug
66
63
 
67
- ## Validated numbers
64
+ ```
65
+ reasonix code › findByEmail in users.ts is case-sensitive, which breaks login; make it case-insensitive
68
66
 
69
- **τ-bench-lite**: 8 multi-turn tool-use tasks × 3 repeats = 24 runs per
70
- side (48 in total). Same tools / same prompt / same client on both sides, sole variable
71
- is prefix stability. Measured on live DeepSeek `deepseek-chat`:
67
+ assistant
68
+ ▸ R1 reasoning · 1243 chars of thought
69
+ ▸ tool<search_files> → src/users.ts, src/users.test.ts
70
+ ▸ tool<read_file> → (src/users.ts, 412 chars)
71
+ ▸ tool<read_file> → (src/users.test.ts, 820 chars)
72
+ ▸ Found it. findByEmail compares with === directly. I'll normalize both the query and the stored value to lowercase and add a test.
72
73
 
73
- | metric | baseline (cache-hostile) | Reasonix | delta |
74
- |---|---:|---:|---:|
75
- | runs | 24 | 24 | — |
76
- | **cache hit** | 46.6% | **94.4%** | **+47.7pp** |
77
- | cost / task | $0.002599 | $0.001579 | **−39% (×0.61)** |
78
- | vs Claude Sonnet 4.6 (token-count estimate) | — | — | **~96% cheaper** |
79
- | pass rate | 96% (23/24) | **100% (24/24)** | Reasonix held the guardrail on every run |
74
+ src/users.ts
75
+ <<<<<<< SEARCH
76
+ return users.find(u => u.email === email);
77
+ =======
78
+ const needle = email.toLowerCase();
79
+ return users.find(u => u.email.toLowerCase() === needle);
80
+ >>>>>>> REPLACE
80
81
 
81
- **Verify it yourself — no API key, zero cost:**
82
+ src/users.test.ts
83
+ <<<<<<< SEARCH
84
+ it("finds by email", () => {
85
+ expect(findByEmail("a@b.com")).toBeDefined();
86
+ });
87
+ =======
88
+ it("finds by email", () => {
89
+ expect(findByEmail("a@b.com")).toBeDefined();
90
+ });
82
91
 
83
- ```bash
84
- git clone https://github.com/esengine/reasonix.git && cd reasonix && npm install
85
- npx reasonix replay benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
86
- npx reasonix diff \
87
- benchmarks/tau-bench/transcripts/t01_address_happy.baseline.r1.jsonl \
88
- benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
89
- ```
92
+ it("is case-insensitive", () => {
93
+ expect(findByEmail("A@B.COM")).toBeDefined();
94
+ });
95
+ >>>>>>> REPLACE
90
96
 
91
- The JSONL transcripts committed in `benchmarks/tau-bench/transcripts/`
92
- carry per-turn `usage`, `cost`, and `prefixHash`. Reasonix's prefix hash
93
- stays byte-stable across every model call; baseline's prefix churns on
94
- every turn. The cache delta is *mechanically* attributable to log
95
- stability, not to a different system prompt.
97
+ 2 pending edits across 2 files — /apply to write · /discard to drop
98
+ ```
96
99
 
97
- Full 48-run report: [`benchmarks/tau-bench/report.md`][r]. Reproduce
98
- with your own API key: `npx tsx benchmarks/tau-bench/runner.ts --repeats 3`.
100
+ **At this point nothing has been written to disk.** You have three
101
+ options:
99
102
 
100
- [r]: ./benchmarks/tau-bench/report.md
103
+ - `/apply` — commit the pending blocks. You'll see `✓ applied
104
+ src/users.ts` and `✓ applied src/users.test.ts`. A snapshot of the
105
+ pre-edit file is kept so `/undo` can roll back.
106
+ - `/discard` — throw the blocks away without writing.
107
+ - Keep chatting — ask for adjustments. Say "also cover the empty
108
+ string case" and the model proposes another block set.
101
109
 
102
- ### MCP — works out of the box
110
+ After applying:
103
111
 
104
- Any [MCP](https://spec.modelcontextprotocol.io/) server's tools inherit
105
- Cache-First + repair + context-safety automatically. The wizard (`npx
106
- reasonix`) lets you multi-select from a curated catalog: no flags, no
107
- JSON-by-hand. Three live reference runs:
112
+ ```
113
+ reasonix code › /commit "fix: findByEmail case-insensitive"
114
+ git add -A && git commit -m "fix: findByEmail case-insensitive"
115
+ [main a1b2c3d] fix: findByEmail case-insensitive
116
+ ```
108
117
 
109
- | server | turns | tool calls | cache hit | cost | vs Claude |
110
- |---|---:|---:|---:|---:|---:|
111
- | bundled demo (`add` / `echo` / `get_time`) | 2 | 1 | **96.6%** (turn 2) | $0.000254 | −94.0% |
112
- | official `@modelcontextprotocol/server-filesystem` | 5 | 4 | **96.7%** overall | $0.001235 | −97.0% |
113
- | **both concurrently** (`demo_add` + `fs_write_file`) | 5 | 4 | **81.1%** | $0.001852 | −95.9% |
118
+ `/commit` runs `git add -A && git commit -m ...` from the sandbox root.
114
119
 
115
- The third row is the ecosystem proof: two MCP servers running as
116
- separate subprocesses, tools from both exercised in one conversation.
117
- **One single prefix hash across all 5 turns** — byte-stability survives
118
- concurrent MCP subprocesses.
120
+ ### Walkthrough: explore before editing
119
121
 
120
- Reproduce without an API key (replay the committed transcripts):
122
+ For "what does this code do?" questions the model uses the read-side
123
+ tools and replies in prose — no SEARCH/REPLACE blocks, no file writes.
124
+ Ask to change something only when you mean it:
121
125
 
122
- ```bash
123
- npx reasonix replay benchmarks/tau-bench/transcripts/mcp-demo.add.jsonl
124
- npx reasonix replay benchmarks/tau-bench/transcripts/mcp-filesystem.jsonl
125
126
  ```
127
+ reasonix code › How is routing organized in this project?
128
+ assistant
129
+ ▸ tool<directory_tree> → (src/ tree, 47 entries)
130
+ ▸ tool<read_file> → (src/router.ts, 1.2 KB)
131
+ ▸ Routing has three layers: the top-level AppRouter registers tabs, each tab uses React Router's
132
+ nested routes for its sub-paths, and finally …
133
+ ```
134
+
135
+ If the SEARCH text doesn't match the file byte-for-byte, `edit_file`
136
+ refuses the edit loudly rather than fuzzy-matching. The model sees the
137
+ error and retries with the correct search text — silent wrong edits are
138
+ worse than visible rejections.
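The strict-match rule can be sketched in a few lines. This is an illustration only: `applyExactEdit` is a hypothetical name, and the extra rejection of a SEARCH text that matches more than once is an assumption added here for safety, not something the README states.

```typescript
// Hypothetical sketch of a strict SEARCH/REPLACE application.
// It never normalizes whitespace or case: the SEARCH text must occur
// in the source exactly as written, otherwise the edit is refused.
function applyExactEdit(source: string, search: string, replace: string): string {
  if (search.length === 0) {
    throw new Error("SEARCH text is empty");
  }
  const first = source.indexOf(search);
  if (first === -1) {
    throw new Error("SEARCH text not found (no fuzzy matching)");
  }
  // Assumption: an ambiguous match is also refused rather than guessed at.
  if (source.indexOf(search, first + 1) !== -1) {
    throw new Error("SEARCH text matches more than once");
  }
  return source.slice(0, first) + replace + source.slice(first + search.length);
}

const original = "return users.find(u => u.email === email);\n";
const edited = applyExactEdit(
  original,
  "u.email === email",
  "u.email.toLowerCase() === email.toLowerCase()",
);
console.log(edited);

try {
  // Two spaces instead of one: rejected loudly instead of being "close enough".
  applyExactEdit(original, "u.email  === email", "x");
} catch (err) {
  console.log((err as Error).message);
}
```

Throwing an error gives the model a concrete message to react to, which matches the retry behaviour described above.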
126
139
 
127
- Supported transports: **stdio** (local `npx` or binary) and **HTTP+SSE**
128
- (remote / hosted servers, MCP 2024-11-05 spec). Pass an `http(s)://`
129
- URL to `--mcp` and Reasonix opens the SSE stream and POSTs JSON-RPC
130
- to the endpoint the server advertises.
140
+ ### Things to try
131
141
 
132
- [mcp]: ./benchmarks/tau-bench/transcripts/mcp-demo.add.jsonl
142
+ - `/tool 1` — dump the last tool call's full output (when the 400-char
143
+ inline clip isn't enough).
144
+ - `/think` — see the model's full R1 reasoning for the last turn
145
+ (reasoner preset only).
146
+ - `/undo` — roll back the last applied edit batch.
147
+ - `/new` — start fresh in the same directory without losing the
148
+ session file.
149
+ - Pass `--no-session` for an ephemeral session that doesn't persist.
150
+
151
+ ```bash
152
+ npx reasonix code src/ # narrower sandbox (only src/ is writable)
153
+ npx reasonix code --no-session # ephemeral — nothing saved to disk
154
+ npx reasonix code --preset max # R1 reasoning + 3-way self-consistency
155
+ ```
133
156
 
134
157
  ---
135
158
 
136
- ## Usage
159
+ ## Using `reasonix` — general chat
137
160
 
138
- ### One command
161
+ Same TUI, no filesystem tools unless you opt in via MCP. Good for
162
+ drafting, Q&A, schema design, architecture discussions, or driving
163
+ your own MCP servers. Sessions persist per name under
164
+ `~/.reasonix/sessions/`.
139
165
 
140
166
  ```bash
141
- npx reasonix
167
+ npx reasonix # uses saved config + wizard-selected MCP
168
+ npx reasonix --preset smart # one-shot override
169
+ npx reasonix --session design # named session
170
+ npx reasonix --session design # resume it later — history intact
171
+ ```
172
+
173
+ ### Walkthrough: a multi-turn session with R1 reasoning
174
+
142
175
  ```
176
+ reasonix › /preset smart
177
+ ▸ switched to smart · model deepseek-reasoner · harvest on · branch off
178
+
179
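+ reasonix › I need to design the display rules for a limited-time discount popup in a Flutter app. Goal:
180
+ show it once on the first open each day, then sleep for 7 days after 3 consecutive days. How do I implement this?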
181
+
182
+ assistant
183
+ ▸ R1 reasoning · 2410 chars of thought
184
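+ ‹ subgoals (3): persist the show count · check whether 24h have passed · check the sleep window
185
+ ‹ hypotheses (2): store the count in SharedPreferences · lastShownAt timestamp
186
+ ‹ uncertainties (1): reset policy when the user switches devices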
187
+
188
+ Suggested data model:
189
+ lastShownAt: DateTime
190
+ consecutiveShows: int (0..3)
191
+ sleepUntil: DateTime?
192
+
193
+ ```
194
+
195
+ `/think` dumps the full R1 thought trace; `/status` shows the current
196
+ model / flags / context use; `/retry` re-samples the same prompt with
197
+ a fresh random seed (useful when the first answer missed something).
143
198
 
144
- First run: a wizard asks for your API key, lets you pick a preset
145
- (fast / smart / max), then offers a multi-select checklist of MCP
146
- servers — filesystem, memory, github, puppeteer, everything. Everything
147
- is saved to `~/.reasonix/config.json`. Subsequent runs drop straight
148
- into chat.
199
+ ### Walkthrough: attach MCP tools on the fly
149
200
 
150
- ### Inside the chat
201
+ ```bash
202
+ # Attach the official filesystem server sandboxed to /tmp/scratch,
203
+ # plus a remote knowledge-base over SSE.
204
+ npx reasonix \
205
+ --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/scratch" \
206
+ --mcp "kb=https://mcp.example.com/sse"
207
+ ```
151
208
 
152
- A status bar at the top shows cache hit %, cost, Claude-equivalent, and
153
- the **context gauge** (`ctx 42k/131k (32%)` — yellow at 50%, red + a
154
- `/compact` nudge at 80%). A command strip under the input lists the
155
- slash commands:
209
+ Inside the chat:
156
210
 
157
211
  ```
158
- /help full list + hints
159
- /preset <fast|smart|max> one-tap bundles (model + harvest + branch)
160
- /mcp list attached MCP servers and tools
161
- /compact [cap] shrink oversized tool results in history
162
- /sessions · /forget list / delete saved sessions
163
- /setup reconfigure (exits and tells you to run `reasonix setup`)
164
- /clear · /exit
212
+ reasonix › /mcp
213
+ fs (stdio, 11 tools) fs_read_file · fs_list_directory · fs_write_file · …
214
+ kb (sse, 4 tools) kb_search · kb_get · kb_list_collections · kb_stat
215
+
216
+ reasonix › collect every line containing "ERROR" from the .log files under /tmp/scratch into errors.txt
217
+ assistant
218
+ ▸ tool<fs_search_files> → 4 matches
219
+ ▸ tool<fs_read_file> → …
220
+ ▸ tool<fs_write_file> → wrote 2.4 KB to errors.txt
221
+ ▸ Wrote errors.txt: 38 lines in total, from 4 source files.
165
222
  ```
166
223
 
167
- **Esc while thinking**: abort the current exploration and force the
168
- model to summarize what it already found. No more "model ran 24 tool
169
- calls and gave up" — you get an answer every time.
224
+ MCP tools go through the same Cache-First + repair + context-safety
225
+ plumbing as native tools, including the 32k result cap and live
226
+ progress-notification rendering.
170
227
 
171
- Sessions live as JSONL under `~/.reasonix/sessions/<name>.jsonl`
172
- with every message appended atomically, so killing the CLI never loses
173
- context. Oversized tool results auto-heal on load, so poisoning a
174
- session with one giant `read_file` doesn't brick your history.
228
+ ### When to use `reasonix` vs `reasonix code`
175
229
 
176
- ### Code mode: `npx reasonix code`
230
+ | situation | command |
231
+ |---|---|
232
+ | Editing files in the current project | `reasonix code` |
233
+ | Exploring a project without writing files | `reasonix code` (it only writes on `/apply`) |
234
+ | Design / architecture / research chat | `reasonix` |
235
+ | Driving your own MCP servers | `reasonix --mcp "..."` |
236
+ | One-shot question, no TUI | `reasonix run "..."` |
237
+ | Reproducing a prior session / benchmark | `reasonix replay path.jsonl` |
177
238
 
178
- A thin opinionated layer on top of chat: filesystem MCP bridged at
179
- `cwd`, coding system prompt, reasoner preset, per-directory session.
180
- The model proposes edits as **SEARCH/REPLACE blocks**:
239
+ ---
181
240
 
241
+ ## Commands inside the session
242
+
243
+ | command | what it does |
244
+ |---|---|
245
+ | `/help` | full command reference with hints |
246
+ | `/status` | current model · flags · context · session |
247
+ | `/preset <fast\|smart\|max>` | one-tap bundle (model + harvest + branch) |
248
+ | `/model <id>` | switch DeepSeek model (`deepseek-chat`, `deepseek-reasoner`) |
249
+ | `/harvest [on\|off]` | toggle R1 plan-state extraction |
250
+ | `/branch <N\|off>` | run N parallel samples per turn, pick best (N ≥ 2) |
251
+ | `/mcp` | list attached MCP servers and their tools |
252
+ | `/tool [N]` | dump the Nth tool call's full output (1 = latest) |
253
+ | `/think` | dump the last turn's full R1 reasoning |
254
+ | `/retry` | truncate and resend your last message (fresh sample) |
255
+ | `/compact [cap]` | shrink oversized tool results in the log |
256
+ | `/sessions` | list saved sessions (current marked with `▸`) |
257
+ | `/forget` | delete the current session from disk |
258
+ | `/new` (alias `/reset`) | start a fresh conversation in the same session |
259
+ | `/clear` | clear visible scrollback only (log kept) |
260
+ | `/setup` | reconfigure (exit and run `reasonix setup`) |
261
+ | `/exit` | quit |
262
+
263
+ Additional commands in `reasonix code`:
264
+
265
+ | command | what it does |
266
+ |---|---|
267
+ | `/apply` | commit the pending SEARCH/REPLACE blocks to disk |
268
+ | `/discard` | drop the pending edit blocks without writing |
269
+ | `/undo` | roll back the last applied edit batch |
270
+ | `/commit "msg"` | `git add -A && git commit -m "msg"` |
271
+
272
+ **Keyboard:**
273
+
274
+ - `Enter` — submit
275
+ - `Shift+Enter` / `Ctrl+J` — newline (multi-line paste also supported; `\` + Enter as a portable fallback)
276
+ - `↑ / ↓` — walk prompt history while idle; navigate slash-autocomplete matches
277
+ - `Tab` / `Enter` on a `/foo` prefix — accept the highlighted suggestion
278
+ - `Esc` — abort the current turn (stops the API call, cancels any in-flight tool, rejects pending MCP requests)
279
+ - `y` / `n` on confirm prompts — hotkey accept / reject
280
+
281
+ ---
282
+
283
+ ## Sessions and safety nets
284
+
285
+ - Sessions live as JSONL under `~/.reasonix/sessions/<name>.jsonl` (per
286
+ directory for `reasonix code`). Every message is appended atomically, so `Ctrl+C`
287
+ never loses context.
288
+ - Tool results are capped at 32k chars per call. Oversized sessions
289
+ self-heal on load (the file is shrunk and rewritten).
290
+ - Malformed `assistant.tool_calls` / `tool` pairing is validated on
291
+ every outgoing API call so a corrupted session can't keep 400ing.
292
+ - Context gauge turns yellow at 50%, red at 80% with a `/compact` nudge.
293
+ Approaching the 131k window triggers an automatic compaction attempt
294
+ before falling back to a forced summary.
295
+ - The model's sandbox in `reasonix code` refuses any path that resolves
296
+ outside the launch directory, including symlink escape and `..` traversal.
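Two of the safety nets above, the per-call result cap and the gauge thresholds, are simple enough to sketch. Everything here is illustrative: the function names, the exact cap value of 32000 characters, and the truncation marker are assumptions, not Reasonix's actual code.

```typescript
// Assumed value for the "32k chars per call" cap.
const TOOL_RESULT_CAP = 32000;

// Truncate an oversized tool result and say how much was dropped.
// Note the marker itself makes the output slightly longer than `cap`.
function capToolResult(text: string, cap: number = TOOL_RESULT_CAP): string {
  if (text.length <= cap) return text;
  const dropped = text.length - cap;
  return text.slice(0, cap) + `\n[truncated ${dropped} chars]`;
}

type GaugeColor = "normal" | "yellow" | "red";

// Yellow from 50% of the context window, red from 80%.
function gaugeColor(usedTokens: number, windowTokens: number): GaugeColor {
  const ratio = usedTokens / windowTokens;
  if (ratio >= 0.8) return "red";
  if (ratio >= 0.5) return "yellow";
  return "normal";
}

console.log(capToolResult("x".repeat(40000)).length);
console.log(gaugeColor(8000, 131000));
console.log(gaugeColor(70000, 131000));
console.log(gaugeColor(110000, 131000));
```

Capping at the point where a result enters the log keeps a single huge `read_file` from dominating the context, which is the failure mode the self-healing step is described as repairing after the fact.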
297
+
298
+ ### Troubleshooting: duplicate rows / ghost rendering
299
+
300
+ Some Windows terminals (Git Bash / MINTTY / winpty-wrapped shells)
301
+ don't fully implement the ANSI cursor-up escapes Ink uses to repaint
302
+ the live spinner region. Symptom: spinners, streaming previews, or
303
+ tool-result rows print multiple copies into scrollback instead of
304
+ overwriting in place.
305
+
306
+ If you hit this, run with plain mode:
307
+
308
+ ```bash
309
+ REASONIX_UI=plain npx reasonix code
310
+ # or
311
+ REASONIX_UI=plain npx reasonix
182
312
  ```
183
- src/foo.ts
184
- <<<<<<< SEARCH
185
- const x = 1;
186
- =======
187
- const x = 2;
188
- >>>>>>> REPLACE
189
- ```
190
313
 
191
- Reasonix parses them out of each turn, applies them to disk, reports
192
- `✓ applied src/foo.ts` in the TUI. SEARCH must match byte-for-byte;
193
- we never fuzzy-match (silent wrong edits are worse than loud
194
- rejections). Run `git diff` to review, `git checkout .` to undo.
314
+ Plain mode suppresses every live/animated row and disables the
315
+ internal tick timer. You lose the streaming preview and spinners
316
+ but gain stable scrollback. Committed events (your prompts, tool
317
+ results, the model's final responses) still render normally via
318
+ Ink's `<Static>` append path.
319
+
320
+ Windows Terminal, PowerShell 7 in Windows Terminal, and WezTerm
321
+ don't need this opt-out.
322
+
323
+ ---
324
+
325
+ ## MCP — bring your own tools
326
+
327
+ Any [MCP](https://spec.modelcontextprotocol.io/) server works. The wizard
328
+ lets you pick from a catalog, or drive it by flag:
195
329
 
196
330
  ```bash
197
- cd my-project
198
- npx reasonix code # default: cwd, reasoner, per-dir session
199
- npx reasonix code src/ # scope the filesystem sandbox tighter
200
- npx reasonix code --no-session # ephemeral
331
+ # stdio (local subprocess)
332
+ npx reasonix --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/safe"
333
+
334
+ # multiple servers at once
335
+ npx reasonix \
336
+ --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/safe" \
337
+ --mcp "demo=npx tsx examples/mcp-server-demo.ts"
338
+
339
+ # HTTP+SSE (remote / hosted)
340
+ npx reasonix --mcp "kb=https://mcp.example.com/sse"
201
341
  ```
202
342
 
203
- First-run sandbox: because code mode uses the filesystem MCP from
204
- `@modelcontextprotocol/server-filesystem`, the model can only read
205
- and write inside the directory you pointed at. It literally can't
206
- touch files above that root.
343
+ `reasonix mcp list` shows the curated catalog. `reasonix mcp inspect <spec>`
344
+ connects once and dumps the server's tools / resources / prompts without
345
+ starting a chat. Progress notifications from long-running tools (2025-03-26
346
+ spec) render live as a progress bar in the spinner.
347
+
348
+ Supported transports: **stdio** (local command) and **HTTP+SSE** (remote,
349
+ MCP 2024-11-05 spec).
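As a rough illustration of the `--mcp "name=target"` convention shown above, the sketch below splits a spec into a name and either a stdio command or an SSE URL. `parseMcpSpec` and the `McpSpec` type are hypothetical names; the real CLI's parsing rules (quoting, escaping, validation) are not documented here, and the naive whitespace split is an assumption.

```typescript
type McpSpec =
  | { name: string; transport: "stdio"; command: string; args: string[] }
  | { name: string; transport: "sse"; url: string };

// Hypothetical parser: "name=cmd args…" becomes a stdio spec,
// "name=http(s)://…" becomes an SSE spec.
function parseMcpSpec(spec: string): McpSpec {
  const eq = spec.indexOf("=");
  if (eq <= 0) {
    throw new Error(`expected "name=target", got: ${spec}`);
  }
  const name = spec.slice(0, eq).trim();
  const target = spec.slice(eq + 1).trim();
  if (name.length === 0 || target.length === 0) {
    throw new Error(`expected "name=target", got: ${spec}`);
  }
  if (/^https?:\/\//i.test(target)) {
    return { name, transport: "sse", url: target };
  }
  // Assumption: arguments are separated by whitespace, with no quoting support.
  const parts = target.split(/\s+/);
  return { name, transport: "stdio", command: parts[0] ?? "", args: parts.slice(1) };
}

console.log(parseMcpSpec("fs=npx -y @modelcontextprotocol/server-filesystem /tmp/safe"));
console.log(parseMcpSpec("kb=https://mcp.example.com/sse"));
```

Splitting on the first `=` only means a URL with a query string still parses, since any later `=` stays inside the target.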
350
+
351
+ ---
207
352
 
208
- ### Advanced — CLI subcommands and flags
353
+ ## CLI reference
209
354
 
210
355
  ```bash
211
- npx reasonix setup # reconfigure any time
212
- npx reasonix chat --session work # a different named session
213
- npx reasonix chat --no-session # ephemeral, nothing persisted
356
+ npx reasonix # chat (uses saved config)
357
+ npx reasonix code [path] # coding mode scoped to path (default: cwd)
358
+ npx reasonix setup # reconfigure the wizard
359
+ npx reasonix chat --session work # named session
360
+ npx reasonix chat --no-session # ephemeral
214
361
  npx reasonix run "ask anything" # one-shot, streams to stdout
215
362
  npx reasonix stats session.jsonl # summarize a transcript
216
- npx reasonix replay chat.jsonl # scrub a transcript + rebuild cost/cache
363
+ npx reasonix replay chat.jsonl # rebuild cost/cache from a transcript
217
364
  npx reasonix diff a.jsonl b.jsonl --md # compare two transcripts
218
- npx reasonix mcp list # curated MCP server catalog
365
+ npx reasonix mcp list # curated MCP catalog
366
+ npx reasonix mcp inspect <spec> # probe a single MCP server
367
+ npx reasonix sessions # list saved sessions
368
+ ```
369
+
370
+ Common flags:
371
+
372
+ ```bash
373
+ --preset <fast|smart|max> # bundle (model + harvest + branch)
374
+ --model <id> # explicit model id
375
+ --harvest / --no-harvest # R1 plan-state extraction
376
+ --branch <N> # self-consistency budget
377
+ --mcp "name=cmd args…" # attach an MCP server (repeatable)
378
+ --transcript path.jsonl # write a JSONL transcript on the side
379
+ --session <name> # named session (default: per-dir for code mode)
380
+ --no-session # ephemeral
381
+ --no-config # ignore ~/.reasonix/config.json (CI-friendly)
219
382
  ```
220
383
 
221
- Power users can still bypass config and drive Reasonix with flags:
384
+ Env vars (win over config):
222
385
 
223
386
  ```bash
224
- npx reasonix chat \
225
- --preset max \
226
- --mcp "filesystem=npx -y @modelcontextprotocol/server-filesystem /tmp/safe" \
227
- --mcp "kb=https://mcp.example.com/sse" \
228
- --transcript session.jsonl \
229
- --no-config # ignore ~/.reasonix/config.json (for CI / reproducing issues)
387
+ export DEEPSEEK_API_KEY=sk-...
388
+ export DEEPSEEK_BASE_URL=https://... # optional alternate endpoint
230
389
  ```
231
390
 
232
- ### Library
391
+ ---
392
+
393
+ ## Library usage
233
394
 
234
395
  ```ts
235
396
  import {
@@ -261,7 +422,7 @@ const loop = new CacheFirstLoop({
261
422
  toolSpecs: tools.specs(),
262
423
  }),
263
424
  harvest: true,
264
- branch: 3, // self-consistency budget
425
+ branch: 3,
265
426
  });
266
427
 
267
428
  for await (const ev of loop.step("What is 17 + 25?")) {
@@ -270,27 +431,72 @@ for await (const ev of loop.step("What is 17 + 25?")) {
270
431
  console.log(loop.stats.summary());
271
432
  ```
272
433
 
273
- ### Configuration
434
+ `ChatOptions.seedTools` accepts a pre-built `ToolRegistry` for callers
435
+ who want the `reasonix code` loop wiring without the CLI wrapper.
436
+ See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for internals.
437
+
438
+ ---
274
439
 
275
- The wizard handles everything on first run. If you'd rather use env vars
276
- (CI, shared boxes, etc.):
440
+ ## Why Reasonix (not LangChain)
441
+
442
+ Every abstraction here earns its weight against a DeepSeek-specific
443
+ property — dirt-cheap tokens, R1 reasoning traces, automatic prefix
444
+ caching, JSON mode. Generic wrappers leave these on the table.
445
+
446
+ | | Reasonix default | generic frameworks |
447
+ |---|---|---|
448
+ | Prefix-stable loop (→ 85–95% cache hit) | yes | no (prompts rebuilt each turn) |
449
+ | Auto-flatten deep tool schemas | yes | no (DeepSeek drops args) |
450
+ | Retry with jittered backoff (429/503) | yes | no (custom callbacks) |
451
+ | Scavenge tool calls leaked into `<think>` | yes | no |
452
+ | Call-storm breaker on identical-arg repeats | yes | no |
453
+ | Live cache-hit / cost / vs-Claude panel | yes | no |
454
+ | First-run config prompt + Markdown TUI | yes | no |
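For readers unfamiliar with the "jittered backoff" row, here is a minimal sketch of the general technique (exponential backoff with full jitter). It is not Reasonix's retry layer: the function names, the base and cap delays, and the attempt limit are assumptions, and the status list beyond 429/503 is taken from the earlier README revision shown in this diff.

```typescript
// Statuses treated as transient. Auth-style 4xx errors are not retried.
const RETRYABLE_STATUSES = new Set([408, 429, 500, 502, 503, 504]);

function isRetryable(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

// Full jitter: pick a random delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,
  capMs: number = 8000,
  rand: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

// Retry `fn` until it succeeds, the error is not retryable,
// or `maxAttempts` calls have been made.
async function withRetry<T>(
  fn: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  maxAttempts: number = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts || !shouldRetry(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}

console.log(isRetryable(429), isRetryable(401));
console.log(backoffDelayMs(3, 500, 8000, () => 0.5));
```

The random component spreads retries from many clients over time, so a rate-limited endpoint is not hit by synchronized bursts.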
455
+
456
+ On the same τ-bench-lite workload (8 multi-turn tool-use tasks × 3
457
+ repeats = 24 runs per side, 48 in total; live DeepSeek `deepseek-chat`; the sole
458
+ variable is prefix stability):
459
+
460
+ | metric | baseline (cache-hostile) | Reasonix | delta |
461
+ |---|---:|---:|---:|
462
+ | cache hit | 46.6% | **94.4%** | +47.7 pp |
463
+ | cost / task | $0.002599 | $0.001579 | **−39%** |
464
+ | pass rate | 96% (23/24) | **100% (24/24)** | — |
465
+
466
+ **Verify it yourself — no API key, zero cost:**
277
467
 
278
468
  ```bash
279
- export DEEPSEEK_API_KEY=sk-... # wins over ~/.reasonix/config.json
280
- export DEEPSEEK_BASE_URL=https://... # optional alternate endpoint
469
+ git clone https://github.com/esengine/reasonix.git && cd reasonix && npm install
470
+ npx reasonix replay benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
471
+ npx reasonix diff \
472
+ benchmarks/tau-bench/transcripts/t01_address_happy.baseline.r1.jsonl \
473
+ benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
281
474
  ```
282
475
 
283
- Get a key (free credit on signup): <https://platform.deepseek.com/api_keys>
476
+ The committed JSONL transcripts carry per-turn `usage`, `cost`, and
477
+ `prefixHash`. Reasonix's prefix hash stays byte-stable across every
478
+ model call; baseline's churns on every turn. The cache delta is
479
+ *mechanically* attributable to log stability, not to a different
480
+ system prompt.
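The byte-stability argument can be made concrete with a toy example. The sketch below is an assumption-laden illustration, not the code behind the transcripts' `prefixHash` field: it hashes a fixed prefix (system prompt plus tool names) with SHA-256 and shows that appending to the log leaves the hash unchanged, while a prompt that embeds a per-turn value gets a new hash every turn.

```typescript
import { createHash } from "node:crypto";

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

interface Prefix {
  readonly systemPrompt: string;
  readonly toolSpecs: readonly string[];
}

// Hash only the part of the request that is meant to stay immutable.
function prefixHash(prefix: Prefix): string {
  const canonical = JSON.stringify([prefix.systemPrompt, prefix.toolSpecs]);
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

const stablePrefix: Prefix = {
  systemPrompt: "You are a coding assistant.",
  toolSpecs: ["read_file", "edit_file"],
};

// Append-only log: turns are pushed, earlier entries are never rewritten.
const log: Message[] = [];
const stableHashes: string[] = [];
for (const turn of ["hello", "list files", "fix the bug"]) {
  log.push({ role: "user", content: turn });
  stableHashes.push(prefixHash(stablePrefix));
  log.push({ role: "assistant", content: `reply to: ${turn}` });
}

// Cache-hostile variant: the system prompt embeds a per-turn counter.
const hostileHashes = [1, 2, 3].map((n) =>
  prefixHash({
    systemPrompt: `You are a coding assistant. Turn ${n}.`,
    toolSpecs: stablePrefix.toolSpecs,
  }),
);

console.log(new Set(stableHashes).size, new Set(hostileHashes).size); // 1 3
```

Since prefix caching can only reuse tokens up to the first differing byte, the second pattern gives up most of the cacheable prefix on every call.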
481
+
482
+ Full 48-run report: [`benchmarks/tau-bench/report.md`](./benchmarks/tau-bench/report.md).
483
+ Reproduce with your own API key: `npx tsx benchmarks/tau-bench/runner.ts --repeats 3`.
284
484
 
285
- Re-run `npx reasonix setup` any time to add/remove MCP servers or switch
286
- preset; your existing selections are pre-checked.
485
+ MCP reference runs (one single prefix hash across all 5 turns even
486
+ with two concurrent MCP subprocesses):
487
+
488
+ | server | turns | cache hit | cost | vs Claude |
489
+ |---|---:|---:|---:|---:|
490
+ | bundled demo (`add` / `echo` / `get_time`) | 2 | **96.6%** (turn 2) | $0.000254 | −94.0% |
491
+ | official `server-filesystem` | 5 | **96.7%** | $0.001235 | −97.0% |
492
+ | **both concurrently** | 5 | **81.1%** | $0.001852 | −95.9% |
287
493
 
288
494
  ---
289
495
 
290
496
  ## Non-goals
291
497
 
292
498
  - Multi-agent orchestration (use LangGraph).
293
- - RAG / vector stores (use LlamaIndex or do it yourself).
499
+ - RAG / vector stores (use LlamaIndex).
294
500
  - Multi-provider abstraction (use LiteLLM).
295
501
  - Web UI / SaaS.
296
502
 
@@ -306,13 +512,11 @@ cd reasonix
306
512
  npm install
307
513
  npm run dev chat # run CLI from source via tsx
308
514
  npm run build # tsup to dist/
309
- npm test # vitest (279 tests)
515
+ npm test # vitest (444 tests)
310
516
  npm run lint # biome
311
517
  npm run typecheck # tsc --noEmit
312
518
  ```
313
519
 
314
- See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for internals.
315
-
316
520
  ---
317
521
 
318
522
  ## License