reasonix 0.18.1 → 0.19.0

package/README.md CHANGED
@@ -3,7 +3,7 @@
3
3
  </p>
4
4
 
5
5
  <p align="center">
6
- <em>Cache-first agent loop for DeepSeek V4 (flash + pro), Ink TUI, MCP first-class, no LangChain.</em>
6
+ <em>Cache-first agent loop for DeepSeek V4 — terminal-native, MCP first-class, no LangChain.</em>
7
7
  </p>
8
8
 
9
9
  <p align="center">
@@ -18,39 +18,25 @@
18
18
  [![downloads](https://img.shields.io/npm/dm/reasonix.svg)](https://www.npmjs.com/package/reasonix)
19
19
  [![node](https://img.shields.io/node/v/reasonix.svg)](./package.json)
20
20
 
21
- **A DeepSeek-native AI coding agent in your terminal.** ~30× cheaper
22
- per task than Claude Code, with a cache-first loop engineered for
23
- DeepSeek's pricing model. Edits as reviewable SEARCH/REPLACE blocks.
24
- MIT-licensed. No IDE lock-in. MCP first-class.
21
+ **A DeepSeek-native AI coding agent for your terminal.** ~30× cheaper per task than Claude Code, engineered around DeepSeek's prefix-cache so the savings are real (94% live cache hit, not theoretical). MIT-licensed, no IDE lock-in, MCP first-class.
25
22
 
26
23
  ---
27
24
 
28
- ## Quick start (60 seconds)
29
-
30
- **1. Get a DeepSeek API key.** Free credit on signup:
31
- <https://platform.deepseek.com/api_keys>
32
-
33
- **2. Point it at a project.** No install needed.
25
+ ## Quick start
34
26
 
35
27
  ```bash
36
28
  cd my-project
37
29
  npx reasonix code
38
30
  ```
39
31
 
40
- First run walks you through a 30-second wizard (paste API key → pick
41
- preset → multi-select MCP servers). Every run after that drops you
42
- straight in.
43
-
44
- **3. Ask it to edit.** The model proposes edits as SEARCH/REPLACE
45
- blocks — nothing hits disk until you `/apply`.
32
+ First run: paste a [DeepSeek API key](https://platform.deepseek.com/api_keys), pick a preset, optionally select MCP servers. Every run after drops you straight in.
46
33
 
47
34
  ```
48
- reasonix code › findByEmail in users.ts is case-sensitive and breaks login, please fix it
35
+ reasonix code › fix the case-sensitivity bug in findByEmail
49
36
 
50
37
  assistant
51
38
  ▸ tool<search_files> → src/users.ts, src/users.test.ts
52
- ▸ tool<read_file> → (src/users.ts, 412 chars)
53
- ▸ Found it. findByEmail compares with === directly. Switching to lowercase normalization and adding a test.
39
+ ▸ tool<read_file> → src/users.ts (412 chars)
54
40
 
55
41
  src/users.ts
56
42
  <<<<<<< SEARCH
@@ -60,918 +46,124 @@ src/users.ts
60
46
  return users.find(u => u.email.toLowerCase() === needle);
61
47
  >>>>>>> REPLACE
62
48
 
63
- ▸ 1 pending edit across 1 file — /apply to write · /discard to drop
64
-
65
- reasonix code › /apply
66
- ▸ ✓ applied src/users.ts
49
+ ▸ 1 pending edit · /apply to write, /discard to drop
67
50
  ```
68
51
 
69
- Requires Node ≥ 20.10. macOS, Linux, Windows (PowerShell / Git Bash /
70
- Windows Terminal). Press `Esc` anytime to abort; `/help` for the full
71
- command list.
52
+ Edits stay in memory until you type `/apply` — nothing hits disk by default. Requires Node ≥ 20.10. Tested on macOS, Linux, and Windows (PowerShell, Git Bash, Windows Terminal).
72
53
 
73
54
  ---
74
55
 
75
- ## At a glance
76
-
77
- | | Reasonix | Claude Code | Cursor | Aider |
78
- |------------------------------------|----------------|----------------|----------------|----------------|
79
- | Backend | DeepSeek V4 | Anthropic | OpenAI / Anthropic | any |
80
- | Cost / typical task | **~$0.001–0.005** | ~$0.05–0.50 | $20/mo + usage | varies |
81
- | Where it runs | terminal | terminal + IDE | IDE (Electron) | terminal |
82
- | License | **MIT** | closed | closed | Apache 2 |
83
- | DeepSeek prefix-cache hit rate | **90.2%** | n/a | n/a | ~33% |
84
- | Reviewable edits (no auto-write) | **yes** (`/apply`) | yes | partial | yes |
85
- | MCP servers | **first-class**| first-class | — | — |
86
-
87
- Numbers from `benchmarks/tau-bench-lite` (8 multi-turn coding tasks ×
88
- 3 repeats, live `deepseek-chat`). Same workload, sole variable is
89
- prefix stability — committed transcripts in [`benchmarks/`](./benchmarks/).
90
- The full feature comparison [is below](#why-reasonix-vs-cursor--claude-code--cline--aider).
56
+ ## How it compares
91
57
 
92
- ---
93
-
94
- ## Web dashboard
58
+ | | Reasonix | Claude Code | Cursor | Aider |
59
+ |----------------------------------|------------------|-----------------|--------------------|------------------|
60
+ | Backend | DeepSeek V4 | Anthropic | OpenAI / Anthropic | any (OpenRouter) |
61
+ | **Cost / typical task** | **~¥0.01–0.04** | ~¥0.40–4 | ¥150/mo + usage | varies |
62
+ | Surface | terminal | terminal + IDE | IDE (Electron) | terminal |
63
+ | License | **MIT** | closed | closed | Apache 2 |
64
+ | **DeepSeek prefix-cache hit** | **94%** (live) | n/a | n/a | ~33% (baseline) |
65
+ | Plan mode (read-only audit gate) | yes | yes | — | yes |
66
+ | Edit review (`/apply`, no auto-write) | yes | yes | partial | yes |
67
+ | MCP servers | first-class | first-class | — | — |
68
+ | User-authored skills | yes | yes | — | — |
69
+ | Embedded web dashboard | yes | — | n/a (IDE) | — |
70
+ | Hooks (`PreToolUse`, etc.) | yes | yes | — | — |
71
+ | Sandbox boundary | strict | yes | partial | yes |
72
+ | Persistent per-workspace sessions | yes | partial | n/a | — |
95
73
 
96
- Type `/dashboard` inside any session and Reasonix prints a localhost
97
- URL with a one-time token. Open it for a 13-tab control surface that
98
- mirrors the running TUI — chat (with live streaming), the editor (file
99
- tree + CodeMirror, syntax highlighting + autocomplete + side-by-side
100
- diff for pending edits), Usage / Sessions / Plans / Tools /
101
- Permissions / System / MCP / Skills / Memory / Hooks / Settings.
74
+ Numbers from `benchmarks/tau-bench-lite` (8 multi-turn tasks × 3 repeats, live `deepseek-chat`). [Committed transcripts →](./benchmarks/)
102
75
 
103
- ```
104
- reasonix code › /dashboard
105
- ▸ http://127.0.0.1:54219/?token=… (open in browser)
106
- ```
76
+ <details>
77
+ <summary><strong>Why DeepSeek-only? The cache economics</strong></summary>
107
78
 
108
- 127.0.0.1 only, ephemeral token expires when the session ends, every
109
- mutation is CSRF-checked. The TUI keeps working — modals (shell
110
- confirms, plan reviews, edit gates) mirror to whichever surface you
111
- look at first. No build step, no Electron, no separate process to
112
- keep alive.
79
+ Cheap tokens alone are only half the story. DeepSeek's prefix-cache is **byte-stable**: the cache fingerprints the prompt from byte 0. Reasonix's loop is engineered around that (append-only growth, no re-ordering, no marker-based compaction), so the cache prefix survives every tool call.
113
80
 
114
- ---
81
+ By comparison, Claude Code is built around Anthropic's `cache_control` markers (a fundamentally different mechanic). Pointing it at DeepSeek's Anthropic-compat endpoint keeps the cheap tokens but loses the cache hits — markers are ignored, and the underlying prefix isn't byte-stable. Generic-backend tools (Aider / Cline / Continue) hit the same wall from the other direction: their compaction patterns destroy byte stability.
115
82
 
116
- ## Why Reasonix? (vs Cursor / Claude Code / Cline / Aider)
117
-
118
- Three things you'd come to Reasonix for, that nothing else combines:
119
-
120
- - **Cost economics that land in your bill.** DeepSeek V4 is ~30×
121
- cheaper than Claude Sonnet per token. Cheap tokens alone isn't the
122
- win — *cheap tokens with a 90%+ prefix-cache hit* is. Reasonix's
123
- loop is engineered around append-only prompt growth so the
124
- cache-stable prefix survives every tool call. The benchmarks
125
- section verifies this end-to-end: 90.2% live cache hit, versus
126
- 32.8% for a generic harness on the same workload. The `/stats`
127
- panel surfaces "vs Claude Sonnet 4.6" savings on every turn.
128
-
129
- - **It lives in your terminal.** Pure CLI — no Electron, no VS Code
130
- extension, no IDE plugin to wedge into your editor. Sits next to
131
- git, tmux, and your shell history. macOS / Linux / Windows
132
- (PowerShell, Git Bash, Windows Terminal all tested). The only
133
- network call is to the DeepSeek API itself; no vendor server in
134
- the middle.
135
-
136
- - **Open source and hackable, end to end.** MIT-licensed TypeScript.
137
- The entire loop, tool registry, cache-stable prefix, TUI, MCP
138
- bridge — all in `src/` under 30k lines. Fork it, ship a private
139
- build, drop it into CI. No SaaS layer, no enterprise tier, no
140
- feature gates.
141
-
142
- | | Reasonix | Claude Code | Cursor | Cline | Aider |
143
- |---|---|---|---|---|---|
144
- | Backend | DeepSeek V4 only | Anthropic only | OpenAI / Anthropic | any (OpenRouter) | any (OpenRouter) |
145
- | Cost / typical task | **~$0.001–$0.005** | ~$0.05–$0.50 | $20/mo + usage | varies | varies |
146
- | Where it runs | terminal | terminal + IDE | IDE (Electron) | VS Code only | terminal |
147
- | License | **MIT** | closed | closed | Apache 2 | Apache 2 |
148
- | Cache-first prefix loop | **engineered (94% hit)** | basic | n/a | n/a | basic |
149
- | MCP servers | **first-class** | first-class | — | beta | — |
150
- | Plan mode (read-only audit gate) | **yes** | yes | — | yes | — |
151
- | User-authored skills | **yes** | yes | — | — | — |
152
- | Edit review (no auto-write) | **yes** (`/apply`) | yes | partial | yes | yes |
153
- | Workspace switch (`/cwd`, `change_workspace`) | **yes** | — | n/a (per-window) | — | — |
154
- | Cross-session cost dashboard | **yes** (`/stats`) | — | — | — | — |
155
- | Sandbox boundary enforcement | **strict** (refuses `..` escape) | yes | partial | yes | partial |
83
+ At DeepSeek's pricing ($0.07/Mtok uncached, $0.014/Mtok cached), **the difference between 50% and 94% hit is roughly 2.5× on input cost alone.** Same model, same API; the loop's invariants are what changed.
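
The arithmetic behind that ratio can be checked with a short sketch. It assumes input cost is a simple linear blend of the cached and uncached prices quoted above; the function and constant names are illustrative and not part of Reasonix.

```typescript
// Prices quoted in the README text (USD per million input tokens).
const UNCACHED_USD_PER_MTOK = 0.07;
const CACHED_USD_PER_MTOK = 0.014;

// Blended input price for a given cache-hit rate (assumes a linear blend).
function blendedInputPrice(hitRate: number): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be between 0 and 1");
  }
  return hitRate * CACHED_USD_PER_MTOK + (1 - hitRate) * UNCACHED_USD_PER_MTOK;
}

const priceAt50 = blendedInputPrice(0.5);  // 0.042 USD/Mtok
const priceAt94 = blendedInputPrice(0.94); // 0.01736 USD/Mtok
const costRatio = priceAt50 / priceAt94;   // about 2.4

console.log(`50% hit costs ${costRatio.toFixed(2)}x the 94% hit price`);
```

Under these assumptions the ratio comes out at about 2.4, in line with the "roughly 2.5×" figure.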
156
84
 
157
- <details>
158
- <summary><strong>When reasonix is the wrong choice · DeepSeek/Anthropic-compat caveats · vs Aider/Cline/Continue</strong></summary>
159
-
160
- ### Pick something else when
161
-
162
- - **You want multi-provider flexibility** (mix Claude / GPT / Gemini /
163
- local Llama in one tool). Try [Aider](https://aider.chat) or
164
- [Cline](https://cline.bot). Reasonix is DeepSeek-only on purpose —
165
- every layer (cache-first loop, R1 harvesting, JSON-mode tool repair,
166
- reasoning-effort cap) is tuned against DeepSeek-specific behavior
167
- and economics. Coupling to one backend is the feature, not a
168
- limitation we'll grow out of.
169
- - **You want IDE integration** (inline diff in your gutter,
170
- multi-cursor, ghost text, refactor previews). Try
171
- [Cursor](https://cursor.com) or Claude Code's IDE mode. Reasonix
172
- is terminal-first; the diff lives in `git diff`, the file tree
173
- lives in `ls`, the chat lives in your shell.
174
- - **You're chasing the hardest reasoning benchmarks.** Claude Opus
175
- 4.6 still wins some leaderboards. DeepSeek V4-pro is competitive
176
- on most coding tasks but doesn't lead every benchmark. If your
177
- task is "solve this PhD-level proof" rather than "fix this auth
178
- bug," start with Claude.
179
- - **You need fully-local / fully-free**. DeepSeek's API has free
180
- credit on signup, but isn't free forever. For air-gapped or
181
- always-free, look at Aider + Ollama or [Continue](https://continue.dev).
182
-
183
- ### "But DeepSeek now has an Anthropic-compatible API — can't I just point Claude Code at it?"
184
-
185
- You can. DeepSeek ships an official Anthropic-compatible endpoint at
186
- `https://api.deepseek.com/anthropic`, and Claude Code (or any Anthropic
187
- SDK client) talks to it without modification. The protocol works. The
188
- **caching economics** don't transfer, and that's the whole point.
189
-
190
- Look at DeepSeek's [own compatibility table](https://api-docs.deepseek.com/guides/anthropic_api):
191
-
192
- | Field | Status on DeepSeek's compat endpoint |
193
- |---|---|
194
- | `cache_control` markers | **Ignored** |
195
- | `mcp_servers` (API-level) | Ignored |
196
- | `thinking.budget_tokens` | Ignored |
197
- | Images / documents / citations | Not supported |
198
-
199
- `cache_control: Ignored` is the load-bearing line. Two completely
200
- different cache mechanics are colliding here:
201
-
202
- | | Anthropic native | DeepSeek auto-cache |
203
- |---|---|---|
204
- | Model | **Marker-based.** You put `cache_control` on a message; Anthropic caches "everything up to this marker" as a content-addressed unit. Multiple markers = multiple independent breakpoints. | **Byte-stable prefix.** The cache fingerprints the literal byte stream from byte 0. |
205
- | Claude Code's design | Built around this. Markers on system prompt + tool defs let the loop reorder, compact, or insert metadata after the markers without losing the cache. | n/a — Claude Code wasn't designed for byte-stable prefixes. |
206
- | What happens when Claude Code → DeepSeek compat | Markers stripped (ignored). Claude Code's main caching strategy disappears. | Falls back to auto-cache. But Claude Code's prefix isn't byte-stable (markers were the *substitute* for byte-stability), so auto-cache misses too. |
207
-
208
- Net effect: **Claude Code's loop, redirected at DeepSeek, gets the
209
- cheap tokens and loses the cache hit it depended on.** A loop running
210
- at 80%+ cache hit on Anthropic's marker cache lands somewhere in the
211
- 40-60% range on DeepSeek's auto-cache (matches the generic-harness
212
- baseline in our benchmarks). Same model, same API, same workload —
213
- the loop's invariants don't fit the cache mechanic it's now talking
214
- to.
215
-
216
- Reasonix's loop was designed around byte-stable prefix from line one.
217
- No markers, no breakpoints — append-only is the invariant. That's why
218
- the same τ-bench workload lands at **90.2% cache hit** on Reasonix
219
- and **32.8%** on a cache-hostile baseline (committed transcripts;
220
- benchmarks section below). At DeepSeek's pricing — $0.07/Mtok
221
- uncached, ~$0.014/Mtok cached — the difference between 50% and 94%
222
- hit is **roughly 2.5× on input cost alone**.
223
-
224
- ### "What about Aider / Cline / Continue?"
225
-
226
- They support DeepSeek natively (no compat layer needed) and you do
227
- get the cheap token price. What you don't get is the DeepSeek-
228
- specific loop work — those tools' loops support every backend
229
- generically (OpenAI / Anthropic / local Llama / ...) and use
230
- compaction + summarization patterns that destroy byte-stability. They
231
- land in the same 40-60% cache-hit range as the baseline. Plus a
232
- handful of DeepSeek-specific quirks generic loops don't handle:
85
+ A few DeepSeek-specific fixes generic loops miss:
233
86
 
234
87
  | Generic loops assume | DeepSeek actually does | Reasonix's fix |
235
88
  |---|---|---|
236
- | Reasoning emitted as a structured `thinking` block | R1 sometimes leaks tool-call JSON inside `<think>` tags | a `scavenge` pass that pulls escaped tool calls back out, otherwise the model thinks it called and waits for output that never comes |
237
- | Tool schemas validated strictly | DeepSeek silently drops deeply-nested object/array params | auto-flatten — nested params get rewritten to single-level prefixed names so the model sees them at all |
238
- | Tool-call args are well-formed JSON | DeepSeek occasionally produces `string="false"` and other malformed fragments | dedicated `ToolCallRepair` heals the common shapes before they hit dispatch |
239
- | Reasoning depth tuned via system-level switches | V4 exposes a `reasoning_effort` knob (`max` / `high`) | `/effort` slash + `--effort` flag, so users can step down for cheap turns |
240
- | Old tool results kept in full forever | 1M context — don't compact pre-emptively, but most agents do | call-storm breaker + result token cap, but the prefix is *never* rewritten; compaction lands as new turns at the tail |
89
+ | Reasoning emitted as a structured `thinking` block | R1 sometimes leaks tool-call JSON inside `<think>` tags | a `scavenge` pass that pulls escaped tool calls back out |
90
+ | Tool schemas validated strictly | DeepSeek silently drops deeply-nested object/array params | auto-flatten — nested params get rewritten to single-level prefixed names |
91
+ | Tool-call args are well-formed JSON | DeepSeek occasionally produces `string="false"` and other malformed fragments | dedicated `ToolCallRepair` heals the common shapes before dispatch |
92
+ | Reasoning depth tuned via system-level switches | V4 exposes a `reasoning_effort` knob (`max` / `high`) | `/effort` slash + `--effort` flag for cheap turns |
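
As an illustration of the auto-flatten row, the sketch below rewrites nested object parameters into single-level prefixed names. The `Schema` type, the `_` separator, and the function name are assumptions made for this example; the README does not show how Reasonix implements the rewrite.

```typescript
// Hypothetical minimal schema shape; real JSON Schema carries many more fields.
type Schema =
  | { type: "string" | "number" | "boolean" }
  | { type: "object"; properties: Record<string, Schema> };

// Flatten nested object params into single-level names joined by `sep`.
function flattenSchema(
  schema: Schema,
  prefix = "",
  sep = "_",
): Record<string, Schema> {
  if (schema.type !== "object") return { [prefix]: schema };
  const flat: Record<string, Schema> = {};
  for (const [key, child] of Object.entries(schema.properties)) {
    const name = prefix ? `${prefix}${sep}${key}` : key;
    Object.assign(flat, flattenSchema(child, name, sep));
  }
  return flat;
}

// Illustrative tool parameters with one nested object.
const flatParams = flattenSchema({
  type: "object",
  properties: {
    path: { type: "string" },
    filter: {
      type: "object",
      properties: { name: { type: "string" }, limit: { type: "number" } },
    },
  },
});

console.log(Object.keys(flatParams)); // path, filter_name, filter_limit
```

A real implementation would also need the inverse mapping, so that flat arguments from the model are rebuilt into the nested shape the tool expects.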
241
93
 
242
- > Cache-stability isn't a feature you turn on; it's an invariant
243
- > the loop is designed around. Reasonix isn't yet-another agent
244
- > CLI — it's an agent CLI built around DeepSeek's specific cache
245
- > mechanic and pricing model.
94
+ Cache stability isn't a feature you turn on; it's an invariant the loop is designed around. That's the entire reason Reasonix is DeepSeek-only.
246
95
 
247
96
  </details>
248
97
 
249
98
  ---
250
99
 
251
- ## `reasonix code` — pair programmer in your terminal
252
-
253
- Scoped to the directory you launch from. The model has native
254
- `read_file` / `write_file` / `edit_file` / `list_directory` /
255
- `search_files` / `directory_tree` / `get_file_info` /
256
- `create_directory` / `move_file` tools, all sandboxed — any path that
257
- resolves outside the launch root (including `..` and symlink escapes)
258
- is refused. Plus `run_command` with a read-only allowlist; anything
259
- state-mutating (`npm install`, `git commit`, …) is gated behind a
260
- confirmation picker.
261
-
262
- ### Walkthrough: explore before editing
263
-
264
- For "what does this code do?" questions the model uses the read-side
265
- tools and replies in prose — no SEARCH/REPLACE blocks, no file
266
- writes. Ask to change something only when you mean it:
267
-
268
- ```
269
- reasonix code › how is routing organized in this project?
270
- assistant
271
- ▸ tool<directory_tree> → (src/ tree, 47 entries)
272
- ▸ tool<read_file> → (src/router.ts, 1.2 KB)
273
- ▸ Routing has three layers: the top-level AppRouter registers tabs, each tab uses React Router's
274
- nested routes for its sub-paths, and finally …
275
- ```
100
+ ## What's in the box
276
101
 
277
- If an `edit_file` SEARCH block doesn't match the file byte-for-byte,
278
- the edit is refused loudly rather than fuzzy-matched. The model sees
279
- the error and retries — silent wrong edits are worse than visible
280
- rejections.
102
+ ### Cache-first agent loop
103
+ The loop preserves prefix stability across tool dispatches. R1-style reasoning is supported, with a scavenge pass that pulls escaped tool calls back out of `<think>` blocks. Tool-call repair handles malformed args before they hit dispatch. `/effort` lets you step reasoning depth down for cheap turns.
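
A minimal sketch of what such a scavenge pass could look like. It assumes leaked tool calls appear as JSON objects with `name` and `arguments` keys inside `<think>` tags; that shape and every identifier here are hypothetical, since the README does not show the wire format.

```typescript
// Hypothetical tool-call shape; the real format is not documented in this README.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

function isToolCall(value: unknown): value is ToolCall {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.name === "string" &&
    typeof record.arguments === "object" &&
    record.arguments !== null
  );
}

// Collect every balanced top-level {...} span, ignoring braces inside JSON strings.
function topLevelObjects(text: string): string[] {
  const spans: string[] = [];
  let depth = 0;
  let start = -1;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"' && depth > 0) {
      inString = true;
    } else if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}" && depth > 0) {
      depth--;
      if (depth === 0) spans.push(text.slice(start, i + 1));
    }
  }
  return spans;
}

// Pull tool calls out of <think> blocks; text outside the tags is left alone.
function scavengeToolCalls(modelOutput: string): ToolCall[] {
  const calls: ToolCall[] = [];
  const thinkBlock = /<think>([\s\S]*?)<\/think>/g;
  let match: RegExpExecArray | null;
  while ((match = thinkBlock.exec(modelOutput)) !== null) {
    for (const candidate of topLevelObjects(match[1])) {
      try {
        const parsed: unknown = JSON.parse(candidate);
        if (isToolCall(parsed)) calls.push(parsed);
      } catch (err) {
        // Not valid JSON: ordinary reasoning prose, skip it.
      }
    }
  }
  return calls;
}
```

Brace-like fragments that fail to parse are treated as ordinary reasoning text, so the pass only recovers calls it can validate.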
281
104
 
282
- ### Plan mode — review before executing
105
+ ### Tool registry
106
+ Native: `read_file`, `write_file`, `edit_file` (SEARCH/REPLACE), `list_directory`, `search_files`, `grep_files`, `run_command`, `run_background`, `web_search`, `web_fetch`. All sandboxed to the launch directory. **MCP first-class** — `--mcp 'name=cmd args'` adds external servers (stdio / Streamable HTTP / SSE), tools merge into the registry under a prefix.
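
The sandbox rule can be illustrated with a lexical path check. This is a sketch under stated assumptions, not Reasonix's implementation: `resolveInSandbox` is a hypothetical name, and it catches `..` and absolute-path escapes only. Rejecting symlink escapes would additionally require resolving real paths (for example with `fs.realpathSync`) before comparing.

```typescript
import * as path from "node:path";

// Resolve `requested` against `root` and refuse anything that lands outside it.
// Lexical check only: symlinks are not followed here.
function resolveInSandbox(root: string, requested: string): string {
  const absRoot = path.resolve(root);
  const target = path.resolve(absRoot, requested);
  const rel = path.relative(absRoot, target);
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`path escapes sandbox root: ${requested}`);
  }
  return target;
}
```

For example, `resolveInSandbox("/work/proj", "src/users.ts")` returns the resolved path inside the root, while `resolveInSandbox("/work/proj", "../secrets")` throws.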
283
107
 
284
- For anything bigger than a typo, the model is encouraged to propose a
285
- markdown plan first. You'll see a picker with **Approve / Refine /
286
- Cancel**:
108
+ ### Plan mode + edit review
109
+ `/plan` enters a read-only audit gate where the model can't dispatch edits until you approve a written plan. Edits emerge as SEARCH/REPLACE blocks; nothing hits disk until `/apply`. `/walk` steps through pending edits one at a time. `/discard` drops them all.
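
The 0.18.1 text in this diff states that a SEARCH block that does not match the file byte-for-byte is refused rather than fuzzy-matched. A minimal sketch of that rule follows; the `PendingEdit` shape and the refusal of ambiguous (multiple) matches are assumptions made for this example.

```typescript
// Hypothetical in-memory representation of one SEARCH/REPLACE block.
interface PendingEdit {
  path: string;
  search: string;
  replace: string;
}

// Apply an edit only when the SEARCH text matches exactly once; never fuzzy-match.
function applyEdit(content: string, edit: PendingEdit): string {
  if (edit.search.length === 0) {
    throw new Error(`empty SEARCH block for ${edit.path}`);
  }
  const first = content.indexOf(edit.search);
  if (first === -1) {
    throw new Error(`SEARCH block not found in ${edit.path}`);
  }
  // Assumption: more than one match is refused as ambiguous.
  if (content.indexOf(edit.search, first + 1) !== -1) {
    throw new Error(`SEARCH block matches more than once in ${edit.path}`);
  }
  return (
    content.slice(0, first) +
    edit.replace +
    content.slice(first + edit.search.length)
  );
}
```

Because the function returns a new string and throws on any mismatch, the caller can keep edits pending in memory and write to disk only after an explicit `/apply`.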
287
110
 
288
- ```
289
- reasonix code › migrate auth from JWT to session cookies
290
-
291
- ▸ plan submitted — awaiting your review
292
- ────────────────────────────────────────
293
- ## Summary
294
- Swap JWT middleware for session cookies, keep user table intact.
295
-
296
- ## Files
297
- - src/auth/middleware.ts — replace `verifyJwt` with `readSession`
298
- - src/auth/session.ts — new file, in-memory store + signed cookie
299
- - src/routes/login.ts — return Set-Cookie instead of a token
300
- - tests/auth/*.test.ts — update fixtures
301
-
302
- ## Risks
303
- - Existing logged-in users get logged out (no migration).
304
- - Session store is in-memory; restart clears sessions.
305
- ────────────────────────────────────────
306
- ▸ Approve and implement
307
- Refine — explore more
308
- Cancel
309
- ```
111
+ ### Sessions, scoped per workspace
112
+ Sessions persist in `~/.reasonix/sessions/` and are filtered by launch directory. `--new` preserves the previous session under a timestamped name; `--resume` finds the latest. `/sessions` switches mid-chat without quitting.
310
113
 
311
- **Force it** with `/plan` — enters an explicit read-only phase where
312
- the model *must* submit a plan before any edit or non-allowlisted
313
- shell call will execute. Use for high-stakes changes you want to
314
- audit before the model touches disk. `/plan off` or picker
315
- Approve/Cancel exits.
114
+ ### Embedded web dashboard
115
+ `/dashboard` opens a localhost SPA mirroring the running TUI: chat (with a full composer fallback for when the TUI's renderer breaks down on legacy PowerShell), editor (file tree + CodeMirror), and Sessions / Plans / Usage / Tools / MCP / Memory / Hooks / Settings tabs. Token-gated, CSRF-checked, ephemeral. [Design mockup →](./design/agent-dashboard.html)
316
116
 
317
- ### Prompt prefixes — `!cmd` and `@path`
117
+ ### Hooks
118
+ Configurable shell scripts that fire on `PreToolUse`, `PostToolUse`, `UserPromptSubmit`, `Stop`, `Notification`, `SessionEnd`. They live in `.reasonix/settings.json` (per-project) or `~/.reasonix/settings.json` (per-user). The harness executes them, not the model.
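
The 0.18.1 text in this diff describes an exit-code contract for hooks: 0 passes, 2 blocks (only on `PreToolUse` and `UserPromptSubmit`), and anything else warns. The sketch below encodes that contract. The type and function names are hypothetical, and treating exit code 2 on a non-blocking event as a warning is an assumption drawn from the "anything else" rule.

```typescript
type HookEvent =
  | "PreToolUse"
  | "PostToolUse"
  | "UserPromptSubmit"
  | "Stop"
  | "Notification"
  | "SessionEnd";

type HookOutcome =
  | { kind: "pass" }
  | { kind: "block"; reason: string }
  | { kind: "warn"; message: string };

// Only these events may block, per the exit-code description in the older README.
const BLOCKING_EVENTS: ReadonlySet<HookEvent> = new Set<HookEvent>([
  "PreToolUse",
  "UserPromptSubmit",
]);

function interpretHookExit(
  event: HookEvent,
  exitCode: number,
  stderr: string,
): HookOutcome {
  if (exitCode === 0) return { kind: "pass" };
  if (exitCode === 2 && BLOCKING_EVENTS.has(event)) {
    // stderr becomes the reason surfaced to the model (or the prompt is dropped).
    return { kind: "block", reason: stderr };
  }
  // Assumption: every other non-zero exit, including 2 on other events, only warns.
  return { kind: "warn", message: stderr };
}
```

Keeping the decision in a pure function separates policy from process spawning, which makes the contract easy to unit-test.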
318
119
 
319
- Two inline shortcuts that don't need a slash:
120
+ ### Memory + skills
121
+ Two layers: project-scoped `REASONIX.md` (committed, repo conventions) and user-scoped `~/.reasonix/memory/` (per-user, the model can write to it via the `remember` tool). Skills are user-authored prompt packs with optional sub-agent execution.
320
122
 
321
- **`!<cmd>` — run a shell command in the sandbox and feed it to the
322
- model.** Typed at the prompt, like bash. Output lands in the visible
323
- log AND in the session so the model's next turn reasons about it:
123
+ ### Permissions
124
+ `allow` / `ask` / `deny` patterns on commands and tools. `npm publish` defaults to `ask`; `rm -rf *` and `git push --force *` default to `deny`. Approved-once decisions can be remembered for a prefix.
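
One way to picture the pattern matching is the sketch below. It assumes `*` is a wildcard matching any run of characters, that the strictest matching rule wins (`deny` over `ask` over `allow`), and that unmatched commands fall back to `ask`; none of these details are confirmed by the README, and the `git status` rule is purely illustrative.

```typescript
type Decision = "allow" | "ask" | "deny";

interface Rule {
  pattern: string;
  decision: Decision;
}

// Assumed precedence: a higher number overrides a lower one.
const SEVERITY: Record<Decision, number> = { allow: 0, ask: 1, deny: 2 };

// Turn a pattern with `*` wildcards into an anchored regular expression.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

function decide(
  command: string,
  rules: Rule[],
  fallback: Decision = "ask",
): Decision {
  let result: Decision | undefined;
  for (const rule of rules) {
    if (!patternToRegExp(rule.pattern).test(command)) continue;
    if (result === undefined || SEVERITY[rule.decision] > SEVERITY[result]) {
      result = rule.decision;
    }
  }
  return result ?? fallback;
}

// The three defaults named above, plus one illustrative allow rule.
const exampleRules: Rule[] = [
  { pattern: "npm publish", decision: "ask" },
  { pattern: "rm -rf *", decision: "deny" },
  { pattern: "git push --force *", decision: "deny" },
  { pattern: "git status", decision: "allow" }, // hypothetical
];

console.log(decide("rm -rf build", exampleRules)); // deny
```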
324
125
 
325
- ```
326
- reasonix code › !git status --short
327
- ▸ M src/users.ts
328
- ▸ M src/users.test.ts
126
+ [Full feature docs on the website →](https://esengine.github.io/reasonix/) · [Architecture →](./docs/ARCHITECTURE.md) · [TUI design mockup →](./design/agent-tui-terminal.html)
329
127
 
330
- reasonix code › explain the changes in these two files
331
- assistant
332
- ▸ tool<read_file> → src/users.ts, src/users.test.ts
333
- ▸ …
334
- ```
335
-
336
- No allowlist gate — user-typed shell = explicit consent. 60s timeout,
337
- 32k char cap, survives session resume.
338
-
339
- **`@path/to/file` — inline a file under "Referenced files."** Start
340
- typing `@` and a picker appears (↑/↓ navigate, Tab/Enter to insert).
341
- Good for "what does @src/users.ts do?" without making the model
342
- `read_file` it first. Sandboxed: relative paths only, no `..` escape,
343
- 64KB per-file cap. Recent files rank higher.
344
-
345
- ### `/commit` — stage + commit in one step
346
-
347
- ```
348
- reasonix code › /commit "fix: findByEmail case-insensitive"
349
- ▸ git add -A && git commit -m "fix: findByEmail case-insensitive"
350
- [main a1b2c3d] fix: findByEmail case-insensitive
351
- ```
352
-
353
- ### Things to try
354
-
355
- - `/tool 1` — dump the last tool call's full output (when the 400-char
356
- inline clip isn't enough).
357
- - `/think` — see the model's full reasoning for the last turn
358
- (thinking-mode models: v4-flash / v4-pro / reasoner alias).
359
- - `/undo` — roll back the last applied edit batch.
360
- - `/new` — start fresh in the same directory without losing the
361
- session file.
362
- - `/effort high` — step down from the default `max` agent-class
363
- reasoning_effort for cheaper/faster turns on simple tasks.
364
- - `npx reasonix code --preset pro` — v4-pro for the whole session,
365
- no auto-downgrade to flash. Pair with `--branch 3` if you want
366
- 3-way self-consistency on gnarly refactors.
367
- - `npx reasonix code src/` — narrower sandbox (only `src/` is
368
- writable).
369
- - `npx reasonix code --no-session` — ephemeral; nothing saved.
370
-
371
- ### `reasonix stats` — how much did you actually save?
372
-
373
- Every turn `reasonix chat|code|run` runs appends a compact record
374
- (tokens + cost + what Claude Sonnet 4.6 would have charged) to
375
- `~/.reasonix/usage.jsonl`. `reasonix stats` with no args rolls that
376
- log into today / week / month / all-time windows:
377
-
378
- ```
379
- Reasonix usage — /Users/you/.reasonix/usage.jsonl
380
-
381
- turns cache hit cost (USD) vs Claude saved
382
- ----------------------------------------------------------------------
383
- today 8 95.1% $0.004821 $0.1348 96.4%
384
- week 34 93.8% $0.023104 $0.6081 96.2%
385
- month 127 94.2% $0.081530 $2.1452 96.2%
386
- all-time 342 94.0% $0.210881 $5.8934 96.4%
387
- ```
388
-
389
- Privacy: only tokens, costs, and the session name you chose land
390
- in the file. No prompts, no completions, no tool arguments.
391
- `reasonix stats <transcript>` keeps the old per-file summary
392
- (assistant turns + tool calls) for scripts that already use it.
393
-
394
- ### Staying current
395
-
396
- The panel header shows the running version next to `Reasonix` (e.g.
397
- `Reasonix 0.12.6 · v4-flash · AUTO · max …`, the trailing `max` is
398
- the reasoning-effort badge — `/effort high` to step down).
399
- A quiet 24-hour background check against
400
- the npm registry surfaces a yellow `update: X.Y.Z` on the right side
401
- of the same row when a newer version has been published. No blocking,
402
- no nagging — the check runs once per day max and is silent on failure
403
- (offline, firewall, etc.).
404
-
405
- ```bash
406
- reasonix update # print current vs latest, run `npm i -g reasonix@latest`
407
- reasonix update --dry-run # print the plan without running anything
408
- ```
409
-
410
- Running via `npx`? The command detects that and prints a
411
- cache-refresh hint instead — npx picks up the newest version on
412
- its next invocation automatically.
413
-
414
- ### Project conventions — `REASONIX.md`
415
-
416
- Drop a `REASONIX.md` in the project root and its contents are pinned
417
- into the system prompt every launch. Committable team memory — house
418
- conventions, domain glossary, things the model keeps forgetting:
419
-
420
- ```bash
421
- cat > REASONIX.md <<'EOF'
422
- # Notes for Reasonix
423
- - Use snake_case for new Python modules; legacy camelCase modules keep their style.
424
- - `cargo check` is in the auto-run allowlist; full `cargo test` needs confirmation.
425
- - The `api/` dir mirrors `backend/` — keep schemas in sync.
426
- EOF
427
- ```
428
-
429
- Re-launch (or `/new`) to pick it up; the prefix is hashed once per
430
- session to keep the DeepSeek cache warm. `/memory` prints what's
431
- currently pinned. `REASONIX_MEMORY=off` disables every memory source
432
- for CI / offline repro.
433
-
434
- ### User memory — `~/.reasonix/memory/`
435
-
436
- A second, **private per-user** memory layer lives under your home
437
- directory. Unlike `REASONIX.md` it's never committed, and the model
438
- can write to it itself via the `remember` tool. Two scopes:
439
-
440
- - `~/.reasonix/memory/global/` — cross-project (your preferences,
441
- tooling).
442
- - `~/.reasonix/memory/<project-hash>/` — scoped to one sandbox root
443
- in `reasonix code` (decisions, local facts, per-repo shortcuts).
444
-
445
- Each scope keeps an always-loaded `MEMORY.md` index of one-liners
446
- plus zero or more `<name>.md` detail files (loaded on demand via
447
- `recall_memory`). Writes land immediately; pinning into the system
448
- prompt takes effect on next `/new` or launch so the cache prefix
449
- stays stable for the current session.
450
-
451
- ```
452
- reasonix code › I use bun instead of npm; from now on, run builds with bun
453
-
454
- assistant
455
- ▸ tool<remember> → project/bun_build saved
456
- "Build command on this machine is `bun run build`"
457
- ```
458
-
459
- **Slash**: `/memory` · `/memory list` · `/memory show <name>` ·
460
- `/memory forget <name>` · `/memory clear <scope> confirm`.
461
- **Model tools**: `remember(type, scope, name, description, content)` ·
462
- `forget(scope, name)` · `recall_memory(scope, name)`.
463
-
464
- Project scope is only available inside `reasonix code` (needs a real
465
- sandbox root to hash); plain `reasonix` gets the global scope only.
466
-
467
- ### Skills — user-authored prompt packs
468
-
469
- Skills are prose instruction blocks you drop on disk. Reasonix pins
470
- their names + one-line descriptions into the system prompt; the
471
- model can call `run_skill({name: "..."})` on its own when a match
472
- fits, or you can type `/skill <name> [args]` to run one manually.
473
-
474
- Two scopes, same layout as user memory:
475
-
476
- - `<project>/.reasonix/skills/` — per-project skills (commit them to
477
- share with your team, or add to `.gitignore` for personal drafts).
478
- - `~/.reasonix/skills/` — global skills available everywhere.
479
-
480
- Either layout works: `<name>/SKILL.md` (preferred — can bundle
481
- additional assets alongside) or flat `<name>.md`.
482
-
483
- ```markdown
484
- ---
485
- name: review
486
- description: Review uncommitted changes and flag risks
487
128
  ---
488
129
 
489
- Run `git diff` on staged and unstaged changes. Summarize what each
490
- hunk does, call out potential regressions, and list files that might
491
- need additional tests. Don't propose edits unless I ask.
492
- ```
493
-
494
- Use it:
495
-
496
- ```
497
- reasonix code › /skill review
498
- ▸ running skill: review
499
- assistant
500
- ▸ tool<run_command> → git diff --cached
501
- ▸ 3 changes, 1 needs a regression test …
502
- ```
503
-
504
- Or let the model pick autonomously — because the skill's name +
505
- description are pinned in the prefix, asking "check whether my uncommitted
506
- changes carry any risk" triggers `run_skill({name: "review"})` without you typing the
507
- slash command.
508
-
509
- **Slash**: `/skill` (list) · `/skill show <name>` · `/skill <name>
510
- [args]` (inject body as user turn).
511
-
512
- **Deliberately not tied** to any other client's directory convention
513
- (`.claude/skills`, etc.) — Reasonix is model-agnostic at the
514
- conversation layer. Any SKILL.md you author works; the body is
515
- prose, so skills authored for other tools usually port over unchanged
516
- (Reasonix's tool names differ — `filesystem` / `shell` / `web` — but
517
- the model reads the instructions and picks our equivalents).
518
-
519
- ### Hooks — automate around tool calls and turns
520
-
521
- Drop a `settings.json` under `.reasonix/` (project or `~/`) and
522
- Reasonix will fire shell commands at four well-known points in
523
- the loop: before a tool runs, after a tool returns, before your
524
- prompt reaches the model, and after the turn ends.
525
-
526
- ```json
527
- // <project>/.reasonix/settings.json ← committable
528
- // ~/.reasonix/settings.json ← per-user
529
- {
530
- "hooks": {
531
- "PreToolUse": [{ "match": "edit_file|write_file", "command": "bun scripts/guard.ts" }],
532
- "PostToolUse": [{ "match": "edit_file", "command": "biome format --write" }],
533
- "UserPromptSubmit": [{ "command": "echo $(date +%s) >> ~/.reasonix/prompts.log" }],
534
- "Stop": [{ "command": "bun test --run", "timeout": 60000 }]
535
- }
536
- }
537
- ```
538
-
539
- Each hook is a shell command. Reasonix invokes it with stdin = a
540
- JSON envelope describing the event:
130
+ ## Contributing
541
131
 
542
- ```json
543
- { "event": "PreToolUse", "cwd": "/path/to/project",
544
- "toolName": "edit_file", "toolArgs": { "path": "src/x.ts", "..." } }
545
- ```
546
-
547
- Exit code drives the decision:
548
-
549
- - **0** — pass; loop continues normally
550
- - **2** — block (only on `PreToolUse` / `UserPromptSubmit`); the
551
- hook's stderr becomes the synthetic tool result the model sees,
552
- or the prompt is dropped entirely
553
- - **anything else** — warn; loop continues, stderr renders as a
554
- yellow row inline
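
A minimal sketch of the decision logic a `PreToolUse` guard hook might use, assuming the envelope fields shown in the example above (`event`, `cwd`, `toolName`, `toolArgs.path`). The protected-path policy and the names `decide`, `decideFromStdin`, and `isProtected` are hypothetical and are not part of Reasonix.

```typescript
// Shape of the stdin envelope, taken from the PreToolUse example above.
// Only the fields this sketch reads are typed; everything else stays unknown.
interface HookEnvelope {
  event: string;
  cwd: string;
  toolName?: string;
  toolArgs?: Record<string, unknown>;
}

interface HookDecision {
  exitCode: 0 | 2; // 0 = pass, 2 = block (per the exit-code list above)
  stderr: string;  // on a block, this text is what the model would see
}

// Hypothetical policy: refuse edits under secrets/ or to a top-level .env file.
function isProtected(path: string): boolean {
  return path === ".env" || path.startsWith("secrets/");
}

function decide(envelope: HookEnvelope): HookDecision {
  const path = envelope.toolArgs?.["path"];
  if (
    envelope.event === "PreToolUse" &&
    typeof path === "string" &&
    isProtected(path)
  ) {
    return { exitCode: 2, stderr: `guard: ${path} is protected; edit refused` };
  }
  return { exitCode: 0, stderr: "" };
}

// Parse the raw JSON text received on stdin and decide.
function decideFromStdin(raw: string): HookDecision {
  return decide(JSON.parse(raw) as HookEnvelope);
}
```

In an actual hook script, the wrapper would read all of stdin into a string, call `decideFromStdin`, write `stderr` to the process's standard error, and exit with `exitCode`. Keeping the decision in a pure function makes the policy testable without spawning a process.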
132
+ Reasonix is solo-maintained but designed to grow. Scoped starter issues:
555
133
 
556
- `match` is anchored regex on the tool name; `*` or omitted matches
557
- every tool. Project hooks fire before global hooks. Default
558
- timeouts: 5s for blocking events, 30s for logging events; per-hook
559
- `timeout` overrides.
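
The matching rule above can be sketched as follows. This assumes "anchored" means the pattern must match the whole tool name (anchored at both ends); the helper name `matchesTool` is illustrative and not taken from the Reasonix source.

```typescript
// Returns true when a hook with the given `match` should fire for `toolName`.
// Assumption: an omitted match or "*" matches every tool, and any other value
// is treated as a regex that must match the full tool name.
function matchesTool(match: string | undefined, toolName: string): boolean {
  if (match === undefined || match === "*") return true;
  // Wrap in a non-capturing group so alternations like "a|b" stay anchored.
  return new RegExp(`^(?:${match})$`).test(toolName);
}
```

Under this reading, `"edit_file|write_file"` fires for `write_file` but `"edit_file"` does not fire for a tool named `edit_file_v2`, because the pattern has to cover the entire name.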
134
+ - [#15 `reasonix doctor --json` flag](https://github.com/esengine/reasonix/issues/15) · CLI · 2-3h
135
+ - [#16 `web_search` / `web_fetch` actionable error messages](https://github.com/esengine/reasonix/issues/16) · tools · 2-3h
136
+ - [#17 Slash command "did you mean?" suggestion](https://github.com/esengine/reasonix/issues/17) · TUI · 2-3h
137
+ - [#18 Unit tests for `clipboard.ts`](https://github.com/esengine/reasonix/issues/18) · tests · 2-3h
560
138
 
561
- **Slash**: `/hooks` (list active hooks) · `/hooks reload` (re-read
562
- `settings.json` from disk without losing your session).
139
+ Each has background, code pointers, acceptance criteria, hints. Browse all [`good first issue`](https://github.com/esengine/reasonix/labels/good%20first%20issue)s.
563
140
 
564
- ### Staying current from inside the TUI
565
-
566
- `/update` inside a running session shows your current version, the
567
- last-resolved latest version (from the quiet 24h background check),
568
- and the shell command to run. The slash does *not* spawn
569
- `npm install` — stdio:inherit into a running Ink renderer corrupts
570
- the display. Exit the session and run `reasonix update` in a
571
- fresh shell when you actually want to install.
572
-
573
- ---
141
+ **Open Discussions** (opinions wanted):
142
+ - [#20 · CLI / TUI design](https://github.com/esengine/reasonix/discussions/20): what's broken, what's missing, what would you change?
143
+ - [#21 · Dashboard design](https://github.com/esengine/reasonix/discussions/21): react against the [proposed mockup](./design/agent-dashboard.html)
144
+ - [#22 · Future feature wishlist](https://github.com/esengine/reasonix/discussions/22): what would you build into Reasonix next?
574
145
 
575
- ## `reasonix` also works as general chat
576
-
577
- Same TUI, no filesystem tools unless you opt in via MCP. Good for
578
- drafting, Q&A, schema design, architecture discussions, or driving
579
- your own MCP servers. Sessions persist per name under
580
- `~/.reasonix/sessions/`.
581
-
582
- ```bash
583
- npx reasonix # uses saved config + wizard-selected MCP
584
- npx reasonix --preset pro # pin v4-pro for the whole run (no auto-downgrade)
585
- npx reasonix --session design # named session — resume later with --session design
586
- ```
587
-
588
- Bridge your own MCP servers on the fly:
146
+ **Before your first PR**: read [`CONTRIBUTING.md`](./CONTRIBUTING.md). Short, strict project rules (comments, errors, libraries-over-hand-rolled); `tests/comment-policy.test.ts` enforces the comment ones and `npm run verify` is the pre-push gate.
589
147
 
590
148
  ```bash
591
- npx reasonix \
592
- --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/safe" \
593
- --mcp "kb=https://mcp.example.com/sse"
594
- ```
595
-
596
- MCP tools go through the same Cache-First + repair + context-safety
597
- plumbing as native tools — 32k result cap, live progress-notification
598
- rendering, retries.
599
-
600
- ---
601
-
602
- ## Commands inside the session
603
-
604
- <details>
605
- <summary><strong>Slash command reference</strong> (click to expand)</summary>
606
-
607
- **Core**
608
-
609
- | command | what it does |
610
- |---|---|
611
- | `/help` · `/?` | full command reference with hints |
612
- | `/status` | current model · flags · context · session |
613
- | `/new` · `/reset` | fresh conversation in the same session |
614
- | `/clear` | clear visible scrollback only (log kept) |
615
- | `/retry` | truncate and resend your last message (fresh sample) |
616
- | `/exit` · `/quit` | quit |
617
-
618
- **Model**
619
-
620
- | command | what it does |
621
- |---|---|
622
- | `/preset <auto\|flash\|pro>` | model commitment — `auto` = flash with escalation, `flash` = locked flash, `pro` = locked pro |
623
- | `/model <id>` | switch DeepSeek model (`deepseek-v4-flash`, `deepseek-v4-pro`, plus `deepseek-chat` / `deepseek-reasoner` compat aliases) |
624
- | `/models` | list live models from DeepSeek `/models` endpoint |
625
- | `/harvest [on\|off]` | toggle R1 plan-state extraction |
626
- | `/branch <N\|off>` | run N parallel samples per turn, pick best (N ≥ 2) |
627
- | `/effort <high\|max>` | reasoning_effort cap — `max` is the agent default, `high` is cheaper/faster |
628
- | `/think` | dump the last turn's full thinking-mode reasoning |
629
-
630
- **Context & tools**
631
-
632
- | command | what it does |
633
- |---|---|
634
- | `/mcp` | list attached MCP servers and their tools / resources / prompts |
635
- | `/resource [uri]` | browse + read MCP resources (no arg → list URIs; `<uri>` → fetch) |
636
- | `/prompt [name]` | browse + fetch MCP prompts |
637
- | `/tool [N]` | dump the Nth tool call's full output (1 = latest) |
638
- | `/compact [tokens]` | shrink oversized tool results in the log (default 4000 tokens/result) |
639
- | `/context` | break down where context tokens are going (system / tools / log) |
640
- | `/stats` | cross-session cost dashboard (today / week / month / all-time) |
641
- | `/keys` | keyboard shortcuts + prompt prefixes (`!` / `@` / `/`) cheatsheet |
642
-
643
- **Memory & skills**
644
-
645
- | command | what it does |
646
- |---|---|
647
- | `/memory` | show pinned memory (REASONIX.md + ~/.reasonix/memory) |
648
- | `/memory list` · `show <name>` · `forget <name>` · `clear <scope> confirm` | manage the store |
649
- | `/skill` · `/skill list` | list discovered skills (project + global) |
650
- | `/skill show <name>` | dump one skill's body |
651
- | `/skill <name> [args]` | run a skill (inject body as user turn) |
652
-
653
- **Sessions**
654
-
655
- | command | what it does |
656
- |---|---|
657
- | `/sessions` | list saved sessions (current marked with `▸`) |
658
- | `/forget` | delete the current session from disk |
659
- | `/setup` | reconfigure (exit and run `reasonix setup`) |
660
-
661
- **Code mode only** (`reasonix code`)
662
-
663
- | command | what it does |
664
- |---|---|
665
- | `/apply` | commit the pending SEARCH/REPLACE blocks to disk |
666
- | `/discard` | drop the pending edit blocks without writing |
667
- | `/undo` | roll back the last applied edit batch |
668
- | `/commit "msg"` | `git add -A && git commit -m "msg"` |
669
- | `/plan [on\|off]` | toggle read-only plan mode |
670
- | `/apply-plan` | force-approve a pending plan |
671
-
672
- **Keyboard**
673
-
674
- - `Enter` — submit
675
- - `Shift+Enter` / `Ctrl+J` — newline (multi-line paste also supported;
676
- `\` + Enter as a portable fallback)
677
- - `↑` / `↓` — walk prompt history while idle; navigate slash-autocomplete
678
- - `Tab` / `Enter` on a `/foo` prefix — accept the highlighted suggestion
679
- - `Esc` — abort the current turn (stops the API call, cancels any
680
- in-flight tool, rejects pending MCP requests)
681
- - `y` / `n` on confirm prompts — hotkey accept / reject
682
-
683
- </details>
684
-
685
- ---
686
-
687
- ## Sessions and safety nets
688
-
689
- - Sessions live as JSONL under `~/.reasonix/sessions/<name>.jsonl`
690
- (per directory for `reasonix code`). Every message appended
691
- atomically; `Ctrl+C` never loses context.
692
- - Tool results are capped at 32k chars per call. Oversized sessions
693
- self-heal on load (shrinks + rewrites the file).
694
- - Malformed `assistant.tool_calls` / `tool` pairing is validated on
695
- every outgoing API call so a corrupted session can't keep 400ing.
696
- - Context gauge turns yellow at 50%, red at 80% with a `/compact`
697
- nudge. Approaching the 1M-token window (V4 flash + pro) triggers an
698
- automatic compaction attempt before falling back to a forced summary.
699
- - The `reasonix code` sandbox refuses any path that resolves outside
700
- the launch directory, including symlink escape and `..` traversal.
701
-
702
- ### Troubleshooting: duplicate rows / ghost rendering
703
-
704
- Some Windows terminals (Git Bash / MINTTY / winpty-wrapped shells)
705
- don't fully implement the ANSI cursor-up escapes Ink uses to repaint
706
- the live spinner region. Symptom: spinners, streaming previews, or
707
- tool-result rows print multiple copies into scrollback instead of
708
- overwriting in place.
709
-
710
- If you hit this, run with plain mode:
711
-
712
- ```bash
713
- REASONIX_UI=plain npx reasonix code
714
- ```
715
-
716
- Plain mode suppresses live/animated rows and disables the internal
717
- tick timer. You lose the streaming preview and spinners but gain
718
- stable scrollback. Windows Terminal, PowerShell 7 in Windows
719
- Terminal, and WezTerm don't need this opt-out.
720
-
721
- ---
722
-
723
- ## Web search — on by default
724
-
725
- The model has two web tools the moment you launch: `web_search` and
726
- `web_fetch`. No flag, no API key, no signup. When you ask about
727
- something the model wasn't trained on (new releases, current events,
728
- obscure APIs), it decides to call `web_search` on its own; if a
729
- snippet isn't enough it follows up with `web_fetch`.
730
-
731
- Backed by **Mojeek**'s public search page — an independent web
732
- index, bot-friendly, no cookies/sessions. Coverage on niche or very
733
- recent queries can be thinner than Google/Bing, but it's reliable
734
- from scripts. (DDG was the original backend but started serving
735
- anti-bot pages in 2026.)
736
-
737
- **Turn it off** (offline mode / privacy / CI):
738
-
739
- ```json
740
- // ~/.reasonix/config.json
741
- { "apiKey": "sk-…", "search": false }
742
- ```
743
-
744
- ```bash
745
- REASONIX_SEARCH=off npx reasonix code
746
- ```
747
-
748
- **Bring your own** (Kagi, SearXNG, internal caches): implement the
749
- `WebSearchProvider` interface and call
750
- `registerWebTools(registry, { provider })` yourself, or bridge an
751
- existing MCP search server via `--mcp`.
752
-
753
- ---
754
-
755
- ## MCP — bring your own tools
756
-
757
- Any [MCP](https://spec.modelcontextprotocol.io/) server works. The
758
- wizard lets you pick from a catalog, or drive it by flag:
759
-
760
- ```bash
761
- # stdio (local subprocess)
762
- npx reasonix --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/safe"
763
-
764
- # multiple at once
765
- npx reasonix \
766
- --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/safe" \
767
- --mcp "demo=npx tsx examples/mcp-server-demo.ts"
768
-
769
- # HTTP+SSE (remote / hosted)
770
- npx reasonix --mcp "kb=https://mcp.example.com/sse"
771
- ```
772
-
773
- `reasonix mcp list` shows the curated catalog. `reasonix mcp inspect
774
- <spec>` connects once and dumps the server's tools / resources /
775
- prompts without starting a chat. Progress notifications from
776
- long-running tools (2025-03-26 spec) render live as a progress bar
777
- in the spinner.
778
-
779
- Supported transports: **stdio** (local command) and **HTTP+SSE**
780
- (remote, MCP 2024-11-05 spec).
781
-
782
- ---
783
-
784
- ## CLI reference
785
-
786
- <details>
787
- <summary><strong>Commands, flags, env vars</strong> (click to expand)</summary>
788
-
789
- ```bash
790
- npx reasonix code [path] # coding mode scoped to path (default: cwd)
791
- npx reasonix # chat (uses saved config)
792
- npx reasonix setup # reconfigure the wizard
793
- npx reasonix chat --session work # named session
794
- npx reasonix chat --no-session # ephemeral
795
- npx reasonix run "ask anything" # one-shot, streams to stdout
796
- npx reasonix stats session.jsonl # summarize a transcript
797
- npx reasonix replay chat.jsonl # rebuild cost/cache from a transcript
798
- npx reasonix diff a.jsonl b.jsonl --md # compare two transcripts
799
- npx reasonix mcp list # curated MCP catalog
800
- npx reasonix mcp inspect <spec> # probe a single MCP server
801
- npx reasonix sessions # list saved sessions
802
- ```
803
-
804
- Common flags:
805
-
806
- ```bash
807
- --preset <auto|flash|pro> # model commitment (auto / locked-flash / locked-pro)
808
- --model <id> # explicit model id
809
- --harvest / --no-harvest # R1 plan-state extraction
810
- --branch <N> # self-consistency budget
811
- --mcp "name=cmd args…" # attach an MCP server (repeatable)
812
- --transcript path.jsonl # write a JSONL transcript on the side
813
- --session <name> # named session (default: per-dir for code mode)
814
- --no-session # ephemeral
815
- --no-config # ignore ~/.reasonix/config.json (CI-friendly)
816
- ```
817
-
818
- Env vars (win over config):
819
-
820
- ```bash
821
- export DEEPSEEK_API_KEY=sk-...
822
- export DEEPSEEK_BASE_URL=https://... # optional alternate endpoint
823
- export REASONIX_MEMORY=off # disable REASONIX.md + user memory
824
- export REASONIX_SEARCH=off # disable web_search / web_fetch
825
- export REASONIX_UI=plain # disable live rows (ghosting workaround)
826
- ```
827
-
828
- </details>
829
-
830
- ---
831
-
832
- ## Library usage
833
-
834
- <details>
835
- <summary><strong>Programmatic API — embed reasonix in your own Node project</strong> (click to expand)</summary>
836
-
837
-
838
- ```ts
839
- import {
840
- CacheFirstLoop,
841
- DeepSeekClient,
842
- ImmutablePrefix,
843
- ToolRegistry,
844
- } from "reasonix";
845
-
846
- const client = new DeepSeekClient(); // reads DEEPSEEK_API_KEY from env
847
- const tools = new ToolRegistry();
848
-
849
- tools.register({
850
- name: "add",
851
- description: "Add two integers",
852
- parameters: {
853
- type: "object",
854
- properties: { a: { type: "integer" }, b: { type: "integer" } },
855
- required: ["a", "b"],
856
- },
857
- fn: ({ a, b }: { a: number; b: number }) => a + b,
858
- });
859
-
860
- const loop = new CacheFirstLoop({
861
- client,
862
- tools,
863
- prefix: new ImmutablePrefix({
864
- system: "You are a math helper.",
865
- toolSpecs: tools.specs(),
866
- }),
867
- harvest: true,
868
- branch: 3,
869
- });
870
-
871
- for await (const ev of loop.step("What is 17 + 25?")) {
872
- if (ev.role === "assistant_final") console.log(ev.content);
873
- }
874
- console.log(loop.stats.summary());
875
- ```
876
-
877
- `ChatOptions.seedTools` accepts a pre-built `ToolRegistry` for
878
- callers who want the `reasonix code` loop wiring without the CLI
879
- wrapper. See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for
880
- internals.
881
-
882
- </details>
883
-
884
- ---
885
-
886
- ## Benchmarks — verify the cache-hit claim yourself
887
-
888
- Every abstraction here earns its weight against a DeepSeek-specific
889
- property — dirt-cheap tokens, R1 reasoning traces, automatic prefix
890
- caching, JSON mode. Generic wrappers leave these on the table.
891
-
892
- | | Reasonix default | generic frameworks |
893
- |---|---|---|
894
- | Prefix-stable loop (→ 85–95% cache hit) | yes | no (prompts rebuilt each turn) |
895
- | Auto-flatten deep tool schemas | yes | no (DeepSeek drops args) |
896
- | Retry with jittered backoff (429/503) | yes | no (custom callbacks) |
897
- | Scavenge tool calls leaked into `<think>` | yes | no |
898
- | Call-storm breaker on identical-arg repeats | yes | no |
899
- | Live cache-hit / cost / vs-Claude panel | yes | no |
900
-
901
- On the same τ-bench-lite workload — 8 multi-turn tool-use tasks × 3
902
- repeats = 24 runs per side (48 total), live DeepSeek `deepseek-chat`, sole
903
- variable prefix stability:
904
-
905
- | metric | baseline (cache-hostile) | Reasonix | delta |
906
- |---|---:|---:|---:|
907
- | cache hit | 32.8% | **90.2%** | +57.4 pp |
908
- | cost / task | $0.000992 | $0.000593 | **−40%** |
909
- | pass rate | 100% (24/24) | **100% (24/24)** | — |
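
As a quick check on the delta column, the arithmetic can be reproduced from the two measured columns. The snippet below is only a sketch using the figures from the table; the helper names are illustrative.

```typescript
// Figures copied from the benchmark table above.
const baseline = { cacheHitPct: 32.8, costPerTask: 0.000992 };
const reasonix = { cacheHitPct: 90.2, costPerTask: 0.000593 };

// Cache hit is compared in percentage points (a plain difference).
function percentagePointDelta(before: number, after: number): number {
  return after - before;
}

// Cost is compared as a relative change against the baseline.
function relativeChangePct(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

const cacheDelta = percentagePointDelta(baseline.cacheHitPct, reasonix.cacheHitPct);
const costDelta = relativeChangePct(baseline.costPerTask, reasonix.costPerTask);

console.log(`cache hit: +${cacheDelta.toFixed(1)} pp`);
console.log(`cost / task: ${costDelta.toFixed(1)}%`);
```

The cost change works out to roughly −40.2%, which the table rounds to −40%.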
910
-
911
- **Reproduce without spending an API credit:**
912
-
913
- ```bash
914
- git clone https://github.com/esengine/reasonix.git && cd reasonix && npm install
915
- npx reasonix replay benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
916
- npx reasonix diff \
917
- benchmarks/tau-bench/transcripts/t01_address_happy.baseline.r1.jsonl \
918
- benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
149
+ git clone https://github.com/esengine/reasonix.git
150
+ cd reasonix
151
+ npm install
152
+ npm run dev code # run from source via tsx
153
+ npm run verify # lint + typecheck + 1665 tests
919
154
  ```
920
155
 
921
- The committed JSONL transcripts carry per-turn `usage`, `cost`, and
922
- `prefixHash`. Reasonix's prefix hash stays byte-stable across every
923
- model call; baseline's churns on every turn. The cache delta is
924
- *mechanically* attributable to log stability, not to a different
925
- system prompt.
926
-
927
- Full 48-run report:
928
- [`benchmarks/tau-bench/report.md`](./benchmarks/tau-bench/report.md).
929
- Reproduce with your own API key: `npx tsx
930
- benchmarks/tau-bench/runner.ts --repeats 3`.
931
-
932
- MCP reference runs (one single prefix hash across all 5 turns even
933
- with two concurrent MCP subprocesses):
934
-
935
- | server | turns | cache hit | cost | vs Claude |
936
- |---|---:|---:|---:|---:|
937
- | bundled demo (`add` / `echo` / `get_time`) | 2 | **96.6%** (turn 2) | $0.000254 | −94.0% |
938
- | official `server-filesystem` | 5 | **96.7%** | $0.001235 | −97.0% |
939
- | **both concurrently** | 5 | **81.1%** | $0.001852 | −95.9% |
940
-
941
156
  ---
942
157
 
943
158
  ## Non-goals
944
159
 
945
- - **Multi-agent orchestration / sub-agents** (use LangGraph).
946
- - **Workflow DSL / DAG scheduler / parallel-branch engine.** Skills
947
- are prose; the model sequences via the normal tool-use loop.
948
- Keeps single-loop + append-only + cache-first invariants intact.
949
- - **Multi-provider abstraction** (use LiteLLM). Reasonix is
950
- DeepSeek-only on purpose — every pillar (cache-first loop, R1
951
- harvesting, tool-call repair) is tuned against DeepSeek-specific
952
- behavior and economics. Coupling to one backend is the feature.
953
- - **RAG / vector stores** (use LlamaIndex).
954
- - **Web UI / SaaS.**
955
-
956
- Reasonix does DeepSeek, deeply.
957
-
958
- ---
959
-
960
- ## Development
961
-
962
- ```bash
963
- git clone https://github.com/esengine/reasonix.git
964
- cd reasonix
965
- npm install
966
- npm run dev code # run CLI from source via tsx
967
- npm run build # tsup to dist/
968
- npm test # vitest (1482 tests)
969
- npm run lint # biome
970
- npm run typecheck # tsc --noEmit
971
- ```
160
+ - **Multi-provider flexibility.** DeepSeek-only on purpose — every layer is tuned around DeepSeek's specific cache mechanic and pricing. Coupling to one backend is the feature.
161
+ - **IDE integration.** Terminal-first; the diff lives in `git diff`, the file tree in `ls`. The dashboard is a companion, not a Cursor replacement.
162
+ - **Hardest-leaderboard reasoning.** Claude Opus still wins some benchmarks. DeepSeek V4 is competitive on coding; if your work is "solve this PhD proof" rather than "fix this auth bug," start with Claude.
163
+ - **Air-gapped / fully-free.** DeepSeek's API has free credit on signup but isn't free forever. For air-gapped, see Aider + Ollama or [Continue](https://continue.dev).
972
164
 
973
165
  ---
974
166
 
975
167
  ## License
976
168
 
977
- MIT
169
+ MIT — see [LICENSE](./LICENSE).