reasonix 0.18.1 → 0.20.0

package/README.md CHANGED
@@ -3,975 +3,121 @@
3
3
  </p>
4
4
 
5
5
  <p align="center">
6
- <em>Cache-first agent loop for DeepSeek V4 (flash + pro) — Ink TUI, MCP first-class, no LangChain.</em>
6
+ <strong>English</strong> · <a href="./README.zh-CN.md">简体中文</a> · <a href="https://esengine.github.io/reasonix/">Website</a>
7
7
  </p>
8
8
 
9
9
  <p align="center">
10
- <strong>English</strong> · <a href="./README.zh-CN.md">简体中文</a> · <a href="https://esengine.github.io/reasonix/">Website</a>
10
+ <a href="https://www.npmjs.com/package/reasonix"><img src="https://img.shields.io/npm/v/reasonix.svg" alt="npm version"/></a>
11
+ <a href="https://github.com/esengine/reasonix/actions/workflows/ci.yml"><img src="https://github.com/esengine/reasonix/actions/workflows/ci.yml/badge.svg" alt="CI"/></a>
12
+ <a href="./LICENSE"><img src="https://img.shields.io/npm/l/reasonix.svg" alt="license"/></a>
13
+ <a href="https://www.npmjs.com/package/reasonix"><img src="https://img.shields.io/npm/dm/reasonix.svg" alt="downloads"/></a>
14
+ <a href="./package.json"><img src="https://img.shields.io/node/v/reasonix.svg" alt="node"/></a>
15
+ <a href="https://github.com/esengine/reasonix/stargazers"><img src="https://img.shields.io/github/stars/esengine/reasonix.svg?style=flat&logo=github&label=stars" alt="GitHub stars"/></a>
16
+ <a href="https://github.com/esengine/reasonix/discussions"><img src="https://img.shields.io/github/discussions/esengine/reasonix.svg?logo=github&label=discussions" alt="Discussions"/></a>
11
17
  </p>
12
18
 
13
- # Reasonix
14
-
15
- [![npm version](https://img.shields.io/npm/v/reasonix.svg)](https://www.npmjs.com/package/reasonix)
16
- [![CI](https://github.com/esengine/reasonix/actions/workflows/ci.yml/badge.svg)](https://github.com/esengine/reasonix/actions/workflows/ci.yml)
17
- [![license](https://img.shields.io/npm/l/reasonix.svg)](./LICENSE)
18
- [![downloads](https://img.shields.io/npm/dm/reasonix.svg)](https://www.npmjs.com/package/reasonix)
19
- [![node](https://img.shields.io/node/v/reasonix.svg)](./package.json)
19
+ <p align="center">
20
+ <strong>A DeepSeek-native AI coding agent for your terminal.</strong> Engineered around DeepSeek's prefix-cache, so the savings are real and the loop stays cheap enough to leave on.
21
+ </p>
20
22
 
21
- **A DeepSeek-native AI coding agent in your terminal.** ~30× cheaper
22
- per task than Claude Code, with a cache-first loop engineered for
23
- DeepSeek's pricing model. Edits as reviewable SEARCH/REPLACE blocks.
24
- MIT-licensed. No IDE lock-in. MCP first-class.
23
+ <p align="center">
24
+ <img src="docs/assets/hero-stats.svg" alt="94% live prefix-cache hit · ~30× cheaper per task vs Claude Code · MIT terminal-native" width="860"/>
25
+ </p>
25
26
 
26
27
  ---
27
28
 
28
- ## Quick start (60 seconds)
29
-
30
- **1. Get a DeepSeek API key.** Free credit on signup:
31
- <https://platform.deepseek.com/api_keys>
32
-
33
- **2. Point it at a project.** No install needed.
29
+ ## Quick start
34
30
 
35
31
  ```bash
36
32
  cd my-project
37
- npx reasonix code
38
- ```
39
-
40
- First run walks you through a 30-second wizard (paste API key → pick
41
- preset → multi-select MCP servers). Every run after that drops you
42
- straight in.
43
-
44
- **3. Ask it to edit.** The model proposes edits as SEARCH/REPLACE
45
- blocks — nothing hits disk until you `/apply`.
46
-
33
+ npx reasonix code # paste a DeepSeek API key on first run; persists after
47
34
  ```
48
- reasonix code › findByEmail in users.ts is case-sensitive, which makes logins fail; please fix it
49
35
 
50
- assistant
51
- ▸ tool<search_files> src/users.ts, src/users.test.ts
52
- ▸ tool<read_file> → (src/users.ts, 412 chars)
53
- ▸ Found it. findByEmail compares with === directly. Switching to lowercase normalization and adding a test.
54
-
55
- src/users.ts
56
- <<<<<<< SEARCH
57
- return users.find(u => u.email === email);
58
- =======
59
- const needle = email.toLowerCase();
60
- return users.find(u => u.email.toLowerCase() === needle);
61
- >>>>>>> REPLACE
62
-
63
- ▸ 1 pending edit across 1 file — /apply to write · /discard to drop
64
-
65
- reasonix code › /apply
66
- ▸ ✓ applied src/users.ts
67
- ```
36
+ <p align="center">
37
+ <img src="docs/assets/hero-terminal.svg" alt="Reasonix code mode — assistant proposes a SEARCH/REPLACE edit; nothing on disk until /apply" width="860"/>
38
+ </p>
68
39
 
69
- Requires Node ≥ 20.10. macOS, Linux, Windows (PowerShell / Git Bash /
70
- Windows Terminal). Press `Esc` anytime to abort; `/help` for the full
71
- command list.
40
+ Requires Node ≥ 22. Tested on macOS, Linux, and Windows (PowerShell, Git Bash, Windows Terminal). Get a [DeepSeek API key →](https://platform.deepseek.com/api_keys) · `reasonix code --help` for flags.
72
41
 
73
42
  ---
74
43
 
75
- ## At a glance
44
+ ## How it compares
76
45
 
77
- | | Reasonix | Claude Code | Cursor | Aider |
78
- |------------------------------------|----------------|----------------|----------------|----------------|
79
- | Backend | DeepSeek V4 | Anthropic | OpenAI / Anthropic | any |
80
- | Cost / typical task | **~$0.001–0.005** | ~$0.05–0.50 | $20/mo + usage | varies |
81
- | Where it runs | terminal | terminal + IDE | IDE (Electron) | terminal |
82
- | License | **MIT** | closed | closed | Apache 2 |
83
- | DeepSeek prefix-cache hit rate | **90.2%** | n/a | n/a | ~33% |
84
- | Reviewable edits (no auto-write) | **yes** (`/apply`) | yes | partial | yes |
85
- | MCP servers | **first-class**| first-class | — | — |
46
+ | | Reasonix | Claude Code | Cursor | Aider |
47
+ |-----------------------------------|------------------|-----------------|--------------------|------------------|
48
+ | Backend | DeepSeek V4 | Anthropic | OpenAI / Anthropic | any (OpenRouter) |
49
+ | **Cost / typical task** | **~¥0.01–0.04** | ~¥0.40–4 | ¥150/mo + usage | varies |
50
+ | License | **MIT** | closed | closed | Apache 2 |
51
+ | **DeepSeek prefix-cache hit** | **94%** (live) | n/a | n/a | ~33% (baseline) |
52
+ | Embedded web dashboard | yes | | n/a (IDE) | |
53
+ | Persistent per-workspace sessions | yes | partial | n/a | — |
86
54
 
87
- Numbers from `benchmarks/tau-bench-lite` (8 multi-turn coding tasks ×
88
- 3 repeats, live `deepseek-chat`). Same workload, sole variable is
89
- prefix stability — committed transcripts in [`benchmarks/`](./benchmarks/).
90
- The full feature comparison [is below](#why-reasonix-vs-cursor--claude-code--cline--aider).
55
+ Plan mode, edit review, MCP, skills, hooks, and sandboxing are all `yes` for Reasonix and most peers — see the feature grid below for what they actually do here.
91
56
 
92
- ---
93
-
94
- ## Web dashboard
95
-
96
- Type `/dashboard` inside any session and Reasonix prints a localhost
97
- URL with a one-time token. Open it for a 13-tab control surface that
98
- mirrors the running TUI — chat (with live streaming), the editor (file
99
- tree + CodeMirror, syntax highlighting + autocomplete + side-by-side
100
- diff for pending edits), Usage / Sessions / Plans / Tools /
101
- Permissions / System / MCP / Skills / Memory / Hooks / Settings.
57
+ Numbers from `benchmarks/tau-bench-lite` (8 multi-turn tasks × 3 repeats, live `deepseek-chat`). [Committed transcripts →](./benchmarks/)
102
58
 
103
- ```
104
- reasonix code › /dashboard
105
- ▸ http://127.0.0.1:54219/?token=… (open in browser)
106
- ```
59
+ <details>
60
+ <summary><strong>Why DeepSeek-only? The cache economics</strong></summary>
107
61
 
108
- 127.0.0.1 only, ephemeral token expires when the session ends, every
109
- mutation is CSRF-checked. The TUI keeps working — modals (shell
110
- confirms, plan reviews, edit gates) mirror to whichever surface you
111
- look at first. No build step, no Electron, no separate process to
112
- keep alive.
62
+ Cheap tokens alone is half the story. DeepSeek's prefix-cache is **byte-stable**: the cache fingerprints from byte 0 of the prompt. Reasonix's loop is engineered around that — append-only growth, no re-ordering, no marker-based compaction — so the cache prefix survives every tool call.
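As a rough illustration of why append-only growth matters for a cache keyed from byte 0, the sketch below measures how much of one request's prompt survives as a prefix of the next. The `sharedPrefixLength` helper and the sample prompts are hypothetical and are not part of Reasonix or the DeepSeek API; the comparison also works on string characters rather than raw bytes, which is equivalent only for ASCII samples like these.

```typescript
// Hypothetical helper: length of the common leading run of two prompts.
// A cache that fingerprints from the start of the prompt can reuse at most this much.
function sharedPrefixLength(previous: string, next: string): number {
  const limit = Math.min(previous.length, next.length);
  let i = 0;
  while (i < limit && previous[i] === next[i]) i++;
  return i;
}

const turn1 = "SYSTEM: you are a coding agent\nUSER: fix the login bug\n";

// Append-only growth: the whole previous prompt survives as a prefix.
const appended = turn1 + "TOOL: read_file src/users.ts\n";

// Rewriting early content (a stand-in for compaction): the prefix breaks
// at the first changed character, right after "SYSTEM: ".
const compacted =
  turn1.replace("you are a coding agent", "summary of earlier turns") +
  "TOOL: read_file src/users.ts\n";

console.log(sharedPrefixLength(turn1, appended) === turn1.length); // true
console.log(sharedPrefixLength(turn1, compacted)); // 8
```

In the append-only case every character of the earlier prompt is reusable; in the rewritten case only the first 8 characters are, no matter how much of the tail is identical.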
113
63
 
114
- ---
64
+ By comparison, Claude Code is built around Anthropic's `cache_control` markers (a fundamentally different mechanic). Pointing it at DeepSeek's Anthropic-compat endpoint keeps the cheap tokens but loses the cache hits — markers are ignored, and the underlying prefix isn't byte-stable. Generic-backend tools (Aider / Cline / Continue) hit the same wall from the other direction: their compaction patterns destroy byte stability.
115
65
 
116
- ## Why Reasonix? (vs Cursor / Claude Code / Cline / Aider)
117
-
118
- Three things you'd come to Reasonix for, that nothing else combines:
119
-
120
- - **Cost economics that land in your bill.** DeepSeek V4 is ~30×
121
- cheaper than Claude Sonnet per token. Cheap tokens alone isn't the
122
- win — *cheap tokens with a 90%+ prefix-cache hit* is. Reasonix's
123
- loop is engineered around append-only prompt growth so the
124
- cache-stable prefix survives every tool call. The benchmarks
125
- section verifies this end-to-end: 90.2% live cache hit, versus
126
- 32.8% for a generic harness on the same workload. The `/stats`
127
- panel surfaces "vs Claude Sonnet 4.6" savings on every turn.
128
-
129
- - **It lives in your terminal.** Pure CLI — no Electron, no VS Code
130
- extension, no IDE plugin to wedge into your editor. Sits next to
131
- git, tmux, and your shell history. macOS / Linux / Windows
132
- (PowerShell, Git Bash, Windows Terminal all tested). The only
133
- network call is to the DeepSeek API itself; no vendor server in
134
- the middle.
135
-
136
- - **Open source and hackable, end to end.** MIT-licensed TypeScript.
137
- The entire loop, tool registry, cache-stable prefix, TUI, MCP
138
- bridge — all in `src/` under 30k lines. Fork it, ship a private
139
- build, drop it into CI. No SaaS layer, no enterprise tier, no
140
- feature gates.
141
-
142
- | | Reasonix | Claude Code | Cursor | Cline | Aider |
143
- |---|---|---|---|---|---|
144
- | Backend | DeepSeek V4 only | Anthropic only | OpenAI / Anthropic | any (OpenRouter) | any (OpenRouter) |
145
- | Cost / typical task | **~$0.001–$0.005** | ~$0.05–$0.50 | $20/mo + usage | varies | varies |
146
- | Where it runs | terminal | terminal + IDE | IDE (Electron) | VS Code only | terminal |
147
- | License | **MIT** | closed | closed | Apache 2 | Apache 2 |
148
- | Cache-first prefix loop | **engineered (94% hit)** | basic | n/a | n/a | basic |
149
- | MCP servers | **first-class** | first-class | — | beta | — |
150
- | Plan mode (read-only audit gate) | **yes** | yes | — | yes | — |
151
- | User-authored skills | **yes** | yes | — | — | — |
152
- | Edit review (no auto-write) | **yes** (`/apply`) | yes | partial | yes | yes |
153
- | Workspace switch (`/cwd`, `change_workspace`) | **yes** | — | n/a (per-window) | — | — |
154
- | Cross-session cost dashboard | **yes** (`/stats`) | — | — | — | — |
155
- | Sandbox boundary enforcement | **strict** (refuses `..` escape) | yes | partial | yes | partial |
66
+ At DeepSeek's pricing ($0.07/Mtok uncached, $0.014/Mtok cached), **the difference between 50% and 94% hit is roughly 2.5× on input cost alone.** Same model, same API; the loop's invariants are what changed.
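The ratio can be checked with a few lines of arithmetic. This is a minimal sketch using only the two prices quoted above; the function name `blendedInputPrice` is made up for illustration, and it ignores output tokens entirely.

```typescript
// Prices quoted above, in USD per million input tokens.
const UNCACHED_PER_MTOK = 0.07;
const CACHED_PER_MTOK = 0.014;

// Blended input price at a given cache-hit rate (0..1).
function blendedInputPrice(hitRate: number): number {
  return hitRate * CACHED_PER_MTOK + (1 - hitRate) * UNCACHED_PER_MTOK;
}

const at50 = blendedInputPrice(0.5);  // 0.042 USD/Mtok
const at94 = blendedInputPrice(0.94); // about 0.0174 USD/Mtok
const ratio = at50 / at94;            // about 2.4

console.log(at50.toFixed(4), at94.toFixed(4), ratio.toFixed(2));
```

With these inputs the ratio comes out near 2.4, consistent with the "roughly 2.5×" figure.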
156
67
 
157
- <details>
158
- <summary><strong>When reasonix is the wrong choice · DeepSeek/Anthropic-compat caveats · vs Aider/Cline/Continue</strong></summary>
159
-
160
- ### Pick something else when
161
-
162
- - **You want multi-provider flexibility** (mix Claude / GPT / Gemini /
163
- local Llama in one tool). Try [Aider](https://aider.chat) or
164
- [Cline](https://cline.bot). Reasonix is DeepSeek-only on purpose —
165
- every layer (cache-first loop, R1 harvesting, JSON-mode tool repair,
166
- reasoning-effort cap) is tuned against DeepSeek-specific behavior
167
- and economics. Coupling to one backend is the feature, not a
168
- limitation we'll grow out of.
169
- - **You want IDE integration** (inline diff in your gutter,
170
- multi-cursor, ghost text, refactor previews). Try
171
- [Cursor](https://cursor.com) or Claude Code's IDE mode. Reasonix
172
- is terminal-first; the diff lives in `git diff`, the file tree
173
- lives in `ls`, the chat lives in your shell.
174
- - **You're chasing the hardest reasoning benchmarks.** Claude Opus
175
- 4.6 still wins some leaderboards. DeepSeek V4-pro is competitive
176
- on most coding tasks but doesn't lead every benchmark. If your
177
- task is "solve this PhD-level proof" rather than "fix this auth
178
- bug," start with Claude.
179
- - **You need fully-local / fully-free**. DeepSeek's API has free
180
- credit on signup, but isn't free forever. For air-gapped or
181
- always-free, look at Aider + Ollama or [Continue](https://continue.dev).
182
-
183
- ### "But DeepSeek now has an Anthropic-compatible API — can't I just point Claude Code at it?"
184
-
185
- You can. DeepSeek ships an official Anthropic-compatible endpoint at
186
- `https://api.deepseek.com/anthropic`, and Claude Code (or any Anthropic
187
- SDK client) talks to it without modification. The protocol works. The
188
- **caching economics** don't transfer, and that's the whole point.
189
-
190
- Look at DeepSeek's [own compatibility table](https://api-docs.deepseek.com/guides/anthropic_api):
191
-
192
- | Field | Status on DeepSeek's compat endpoint |
193
- |---|---|
194
- | `cache_control` markers | **Ignored** |
195
- | `mcp_servers` (API-level) | Ignored |
196
- | `thinking.budget_tokens` | Ignored |
197
- | Images / documents / citations | Not supported |
198
-
199
- `cache_control: Ignored` is the load-bearing line. Two completely
200
- different cache mechanics are colliding here:
201
-
202
- | | Anthropic native | DeepSeek auto-cache |
203
- |---|---|---|
204
- | Model | **Marker-based.** You put `cache_control` on a message; Anthropic caches "everything up to this marker" as a content-addressed unit. Multiple markers = multiple independent breakpoints. | **Byte-stable prefix.** The cache fingerprints the literal byte stream from byte 0. |
205
- | Claude Code's design | Built around this. Markers on system prompt + tool defs let the loop reorder, compact, or insert metadata after the markers without losing the cache. | n/a — Claude Code wasn't designed for byte-stable prefixes. |
206
- | What happens when Claude Code → DeepSeek compat | Markers stripped (ignored). Claude Code's main caching strategy disappears. | Falls back to auto-cache. But Claude Code's prefix isn't byte-stable (markers were the *substitute* for byte-stability), so auto-cache misses too. |
207
-
208
- Net effect: **Claude Code's loop, redirected at DeepSeek, gets the
209
- cheap tokens and loses the cache hit it depended on.** A loop running
210
- at 80%+ cache hit on Anthropic's marker cache lands somewhere in the
211
- 40-60% range on DeepSeek's auto-cache (matches the generic-harness
212
- baseline in our benchmarks). Same model, same API, same workload —
213
- the loop's invariants don't fit the cache mechanic it's now talking
214
- to.
215
-
216
- Reasonix's loop was designed around byte-stable prefix from line one.
217
- No markers, no breakpoints — append-only is the invariant. That's why
218
- the same τ-bench workload lands at **90.2% cache hit** on Reasonix
219
- and **32.8%** on a cache-hostile baseline (committed transcripts;
220
- benchmarks section below). At DeepSeek's pricing — $0.07/Mtok
221
- uncached, ~$0.014/Mtok cached — the difference between 50% and 94%
222
- hit is **roughly 2.5× on input cost alone**.
223
-
224
- ### "What about Aider / Cline / Continue?"
225
-
226
- They support DeepSeek natively (no compat layer needed) and you do
227
- get the cheap token price. What you don't get is the DeepSeek-
228
- specific loop work — those tools' loops support every backend
229
- generically (OpenAI / Anthropic / local Llama / ...) and use
230
- compaction + summarization patterns that destroy byte-stability. They
231
- land in the same 40-60% cache-hit range as the baseline. Plus a
232
- handful of DeepSeek-specific quirks generic loops don't handle:
68
+ A few DeepSeek-specific fixes generic loops miss:
233
69
 
234
70
  | Generic loops assume | DeepSeek actually does | Reasonix's fix |
235
71
  |---|---|---|
236
- | Reasoning emitted as a structured `thinking` block | R1 sometimes leaks tool-call JSON inside `<think>` tags | a `scavenge` pass that pulls escaped tool calls back out, otherwise the model thinks it called and waits for output that never comes |
237
- | Tool schemas validated strictly | DeepSeek silently drops deeply-nested object/array params | auto-flatten — nested params get rewritten to single-level prefixed names so the model sees them at all |
238
- | Tool-call args are well-formed JSON | DeepSeek occasionally produces `string="false"` and other malformed fragments | dedicated `ToolCallRepair` heals the common shapes before they hit dispatch |
239
- | Reasoning depth tuned via system-level switches | V4 exposes a `reasoning_effort` knob (`max` / `high`) | `/effort` slash + `--effort` flag, so users can step down for cheap turns |
240
- | Old tool results kept in full forever | 1M context — don't compact pre-emptively, but most agents do | call-storm breaker + result token cap, but the prefix is *never* rewritten; compaction lands as new turns at the tail |
241
-
242
- > Cache-stability isn't a feature you turn on; it's an invariant
243
- > the loop is designed around. Reasonix isn't yet-another agent
244
- > CLI — it's an agent CLI built around DeepSeek's specific cache
245
- > mechanic and pricing model.
246
-
247
- </details>
248
-
249
- ---
250
-
251
- ## `reasonix code` — pair programmer in your terminal
252
-
253
- Scoped to the directory you launch from. The model has native
254
- `read_file` / `write_file` / `edit_file` / `list_directory` /
255
- `search_files` / `directory_tree` / `get_file_info` /
256
- `create_directory` / `move_file` tools, all sandboxed — any path that
257
- resolves outside the launch root (including `..` and symlink escapes)
258
- is refused. Plus `run_command` with a read-only allowlist; anything
259
- state-mutating (`npm install`, `git commit`, …) is gated behind a
260
- confirmation picker.
261
-
262
- ### Walkthrough: explore before editing
263
-
264
- For "what does this code do?" questions the model uses the read-side
265
- tools and replies in prose — no SEARCH/REPLACE blocks, no file
266
- writes. Ask to change something only when you mean it:
267
-
268
- ```
269
- reasonix code › How is routing organized in this project?
270
- assistant
271
- ▸ tool<directory_tree> → (src/ tree, 47 entries)
272
- ▸ tool<read_file> → (src/router.ts, 1.2 KB)
273
- ▸ Routing has three layers: the top-level AppRouter registers the tabs, each tab uses React Router's
274
- nested routes for its sub-paths, and finally …
275
- ```
276
-
277
- If an `edit_file` SEARCH block doesn't match the file byte-for-byte,
278
- the edit is refused loudly rather than fuzzy-matched. The model sees
279
- the error and retries — silent wrong edits are worse than visible
280
- rejections.
281
-
282
- ### Plan mode — review before executing
283
-
284
- For anything bigger than a typo, the model is encouraged to propose a
285
- markdown plan first. You'll see a picker with **Approve / Refine /
286
- Cancel**:
287
-
288
- ```
289
- reasonix code › Migrate auth from JWT to session cookies
290
-
291
- ▸ plan submitted — awaiting your review
292
- ────────────────────────────────────────
293
- ## Summary
294
- Swap JWT middleware for session cookies, keep user table intact.
295
-
296
- ## Files
297
- - src/auth/middleware.ts — replace `verifyJwt` with `readSession`
298
- - src/auth/session.ts — new file, in-memory store + signed cookie
299
- - src/routes/login.ts — return Set-Cookie instead of a token
300
- - tests/auth/*.test.ts — update fixtures
301
-
302
- ## Risks
303
- - Existing logged-in users get logged out (no migration).
304
- - Session store is in-memory; restart clears sessions.
305
- ────────────────────────────────────────
306
- ▸ Approve and implement
307
- Refine — explore more
308
- Cancel
309
- ```
310
-
311
- **Force it** with `/plan` — enters an explicit read-only phase where
312
- the model *must* submit a plan before any edit or non-allowlisted
313
- shell call will execute. Use for high-stakes changes you want to
314
- audit before the model touches disk. `/plan off` or picker
315
- Approve/Cancel exits.
316
-
317
- ### Prompt prefixes — `!cmd` and `@path`
318
-
319
- Two inline shortcuts that don't need a slash:
320
-
321
- **`!<cmd>` — run a shell command in the sandbox and feed it to the
322
- model.** Typed at the prompt, like bash. Output lands in the visible
323
- log AND in the session so the model's next turn reasons about it:
324
-
325
- ```
326
- reasonix code › !git status --short
327
- ▸ M src/users.ts
328
- ▸ M src/users.test.ts
329
-
330
- reasonix code › Explain the changes in these two files
331
- assistant
332
- ▸ tool<read_file> → src/users.ts, src/users.test.ts
333
- ▸ …
334
- ```
335
-
336
- No allowlist gate — user-typed shell = explicit consent. 60s timeout,
337
- 32k char cap, survives session resume.
338
-
339
- **`@path/to/file` — inline a file under "Referenced files."** Start
340
- typing `@` and a picker appears (↑/↓ navigate, Tab/Enter to insert).
341
- Good for "what does @src/users.ts do?" without making the model
342
- `read_file` it first. Sandboxed: relative paths only, no `..` escape,
343
- 64KB per-file cap. Recent files rank higher.
344
-
345
- ### `/commit` — stage + commit in one step
346
-
347
- ```
348
- reasonix code › /commit "fix: findByEmail case-insensitive"
349
- ▸ git add -A && git commit -m "fix: findByEmail case-insensitive"
350
- [main a1b2c3d] fix: findByEmail case-insensitive
351
- ```
352
-
353
- ### Things to try
354
-
355
- - `/tool 1` — dump the last tool call's full output (when the 400-char
356
- inline clip isn't enough).
357
- - `/think` — see the model's full reasoning for the last turn
358
- (thinking-mode models: v4-flash / v4-pro / reasoner alias).
359
- - `/undo` — roll back the last applied edit batch.
360
- - `/new` — start fresh in the same directory without losing the
361
- session file.
362
- - `/effort high` — step down from the default `max` agent-class
363
- reasoning_effort for cheaper/faster turns on simple tasks.
364
- - `npx reasonix code --preset pro` — v4-pro for the whole session,
365
- no auto-downgrade to flash. Pair with `--branch 3` if you want
366
- 3-way self-consistency on gnarly refactors.
367
- - `npx reasonix code src/` — narrower sandbox (only `src/` is
368
- writable).
369
- - `npx reasonix code --no-session` — ephemeral; nothing saved.
370
-
371
- ### `reasonix stats` — how much did you actually save?
372
-
373
- Every turn that `reasonix chat|code|run` runs appends a compact record
374
- (tokens + cost + what Claude Sonnet 4.6 would have charged) to
375
- `~/.reasonix/usage.jsonl`. `reasonix stats` with no args rolls that
376
- log into today / week / month / all-time windows:
377
-
378
- ```
379
- Reasonix usage — /Users/you/.reasonix/usage.jsonl
380
-
381
- turns cache hit cost (USD) vs Claude saved
382
- ----------------------------------------------------------------------
383
- today 8 95.1% $0.004821 $0.1348 96.4%
384
- week 34 93.8% $0.023104 $0.6081 96.2%
385
- month 127 94.2% $0.081530 $2.1452 96.2%
386
- all-time 342 94.0% $0.210881 $5.8934 96.4%
387
- ```
388
-
389
- Privacy: only tokens, costs, and the session name you chose land
390
- in the file. No prompts, no completions, no tool arguments.
391
- `reasonix stats <transcript>` keeps the old per-file summary
392
- (assistant turns + tool calls) for scripts that already use it.
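The `saved` column in the sample output above appears to follow from the two cost columns. The sketch below assumes the formula `saved = 1 - cost / vsClaude`; the README does not state it, and `savedPercent` is a hypothetical name, so treat this as a plausibility check against the sample rows rather than a description of the implementation.

```typescript
// Assumed formula (not stated in the README): saved = 1 - cost / claudeEquivalent.
function savedPercent(costUsd: number, claudeUsd: number): string {
  return ((1 - costUsd / claudeUsd) * 100).toFixed(1) + "%";
}

// Inputs copied from the "today" and "week" rows of the sample output.
console.log(savedPercent(0.004821, 0.1348)); // 96.4%
console.log(savedPercent(0.023104, 0.6081)); // 96.2%
```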
393
-
394
- ### Staying current
395
-
396
- The panel header shows the running version next to `Reasonix` (e.g.
397
- `Reasonix 0.12.6 · v4-flash · AUTO · max …`, the trailing `max` is
398
- the reasoning-effort badge — `/effort high` to step down).
399
- A quiet 24-hour background check against
400
- the npm registry surfaces a yellow `update: X.Y.Z` on the right side
401
- of the same row when a newer version has been published. No blocking,
402
- no nagging — the check runs once per day max and is silent on failure
403
- (offline, firewall, etc.).
404
-
405
- ```bash
406
- reasonix update # print current vs latest, run `npm i -g reasonix@latest`
407
- reasonix update --dry-run # print the plan without running anything
408
- ```
409
-
410
- Running via `npx`? The command detects that and prints a
411
- cache-refresh hint instead — npx picks up the newest version on
412
- its next invocation automatically.
413
-
414
- ### Project conventions — `REASONIX.md`
415
-
416
- Drop a `REASONIX.md` in the project root and its contents are pinned
417
- into the system prompt every launch. Committable team memory — house
418
- conventions, domain glossary, things the model keeps forgetting:
419
-
420
- ```bash
421
- cat > REASONIX.md <<'EOF'
422
- # Notes for Reasonix
423
- - Use snake_case for new Python modules; legacy camelCase modules keep their style.
424
- - `cargo check` is in the auto-run allowlist; full `cargo test` needs confirmation.
425
- - The `api/` dir mirrors `backend/` — keep schemas in sync.
426
- EOF
427
- ```
428
-
429
- Re-launch (or `/new`) to pick it up; the prefix is hashed once per
430
- session to keep the DeepSeek cache warm. `/memory` prints what's
431
- currently pinned. `REASONIX_MEMORY=off` disables every memory source
432
- for CI / offline repro.
433
-
434
- ### User memory — `~/.reasonix/memory/`
435
-
436
- A second, **private per-user** memory layer lives under your home
437
- directory. Unlike `REASONIX.md` it's never committed, and the model
438
- can write to it itself via the `remember` tool. Two scopes:
439
-
440
- - `~/.reasonix/memory/global/` — cross-project (your preferences,
441
- tooling).
442
- - `~/.reasonix/memory/<project-hash>/` — scoped to one sandbox root
443
- in `reasonix code` (decisions, local facts, per-repo shortcuts).
444
-
445
- Each scope keeps an always-loaded `MEMORY.md` index of one-liners
446
- plus zero or more `<name>.md` detail files (loaded on demand via
447
- `recall_memory`). Writes land immediately; pinning into the system
448
- prompt takes effect on next `/new` or launch so the cache prefix
449
- stays stable for the current session.
450
-
451
- ```
452
- reasonix code › I use bun instead of npm; from now on please run builds with bun
453
-
454
- assistant
455
- ▸ tool<remember> → project/bun_build saved
456
- "Build command on this machine is `bun run build`"
457
- ```
458
-
459
- **Slash**: `/memory` · `/memory list` · `/memory show <name>` ·
460
- `/memory forget <name>` · `/memory clear <scope> confirm`.
461
- **Model tools**: `remember(type, scope, name, description, content)` ·
462
- `forget(scope, name)` · `recall_memory(scope, name)`.
463
-
464
- Project scope is only available inside `reasonix code` (needs a real
465
- sandbox root to hash); plain `reasonix` gets the global scope only.
466
-
467
- ### Skills — user-authored prompt packs
468
-
469
- Skills are prose instruction blocks you drop on disk. Reasonix pins
470
- their names + one-line descriptions into the system prompt; the
471
- model can call `run_skill({name: "..."})` on its own when a match
472
- fits, or you can type `/skill <name> [args]` to run one manually.
473
-
474
- Two scopes, same layout as user memory:
475
-
476
- - `<project>/.reasonix/skills/` — per-project skills (commit them to
477
- share with your team, or add to `.gitignore` for personal drafts).
478
- - `~/.reasonix/skills/` — global skills available everywhere.
479
-
480
- Either layout works: `<name>/SKILL.md` (preferred — can bundle
481
- additional assets alongside) or flat `<name>.md`.
482
-
483
- ```markdown
484
- ---
485
- name: review
486
- description: Review uncommitted changes and flag risks
487
- ---
488
-
489
- Run `git diff` on staged and unstaged changes. Summarize what each
490
- hunk does, call out potential regressions, and list files that might
491
- need additional tests. Don't propose edits unless I ask.
492
- ```
493
-
494
- Use it:
495
-
496
- ```
497
- reasonix code › /skill review
498
- ▸ running skill: review
499
- assistant
500
- ▸ tool<run_command> → git diff --cached
501
- ▸ 3 changes, 1 needs a regression test …
502
- ```
72
+ | Reasoning emitted as a structured `thinking` block | R1 sometimes leaks tool-call JSON inside `<think>` tags | a `scavenge` pass that pulls escaped tool calls back out |
73
+ | Tool schemas validated strictly | DeepSeek silently drops deeply-nested object/array params | auto-flatten — nested params get rewritten to single-level prefixed names |
74
+ | Tool-call args are well-formed JSON | DeepSeek occasionally produces `string="false"` and other malformed fragments | dedicated `ToolCallRepair` heals the common shapes before dispatch |
75
+ | Reasoning depth tuned via system-level switches | V4 exposes a `reasoning_effort` knob (`max` / `high`) | `/effort` slash + `--effort` flag for cheap turns |
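To make the auto-flatten row concrete, here is a minimal sketch of rewriting nested object params into single-level prefixed names. Everything in it is an assumption for illustration: the `flattenParams` name, the `_` separator, and the choice to keep arrays as leaf values are not taken from Reasonix's source, which the README does not show.

```typescript
type Params = { [key: string]: unknown };

// Hypothetical sketch: rewrite nested object params to single-level prefixed names.
// The "_" separator is an assumption; the README does not specify the real scheme.
function flattenParams(input: Params, prefix = ""): Params {
  const out: Params = {};
  for (const [key, value] of Object.entries(input)) {
    const name = prefix ? `${prefix}_${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      // Recurse into nested objects, carrying the accumulated prefix.
      Object.assign(out, flattenParams(value as Params, name));
    } else {
      out[name] = value; // primitives and arrays are kept as leaf values
    }
  }
  return out;
}

const flat = flattenParams({
  path: "src/x.ts",
  options: { encoding: "utf8", range: { start: 1, end: 20 } },
});
console.log(flat);
// { path: 'src/x.ts', options_encoding: 'utf8', options_range_start: 1, options_range_end: 20 }
```

A real implementation would also need the inverse mapping so the flattened arguments can be reassembled before the tool is dispatched.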
503
76
 
504
- Or let the model pick autonomously: because the skill's name +
505
- description are pinned in the prefix, asking "Can you check whether my uncommitted
506
- changes are risky?" triggers `run_skill({name: "review"})` without you typing the
507
- slash command.
508
-
509
- **Slash**: `/skill` (list) · `/skill show <name>` · `/skill <name>
510
- [args]` (inject body as user turn).
511
-
512
- **Deliberately not tied** to any other client's directory convention
513
- (`.claude/skills`, etc.) — Reasonix is model-agnostic at the
514
- conversation layer. Any SKILL.md you author works; the body is
515
- prose, so skills authored for other tools usually port over unchanged
516
- (Reasonix's tool names differ — `filesystem` / `shell` / `web` — but
517
- the model reads the instructions and picks our equivalents).
518
-
519
- ### Hooks — automate around tool calls and turns
520
-
521
- Drop a `settings.json` under `.reasonix/` (project or `~/`) and
522
- Reasonix will fire shell commands at four well-known points in
523
- the loop: before a tool runs, after a tool returns, before your
524
- prompt reaches the model, and after the turn ends.
525
-
526
- ```json
527
- // <project>/.reasonix/settings.json ← committable
528
- // ~/.reasonix/settings.json ← per-user
529
- {
530
- "hooks": {
531
- "PreToolUse": [{ "match": "edit_file|write_file", "command": "bun scripts/guard.ts" }],
532
- "PostToolUse": [{ "match": "edit_file", "command": "biome format --write" }],
533
- "UserPromptSubmit": [{ "command": "echo $(date +%s) >> ~/.reasonix/prompts.log" }],
534
- "Stop": [{ "command": "bun test --run", "timeout": 60000 }]
535
- }
536
- }
537
- ```
538
-
539
- Each hook is a shell command. Reasonix invokes it with stdin = a
540
- JSON envelope describing the event:
541
-
542
- ```json
543
- { "event": "PreToolUse", "cwd": "/path/to/project",
544
- "toolName": "edit_file", "toolArgs": { "path": "src/x.ts", "..." } }
545
- ```
546
-
547
- Exit code drives the decision:
548
-
549
- - **0** — pass; loop continues normally
550
- - **2** — block (only on `PreToolUse` / `UserPromptSubmit`); the
551
- hook's stderr becomes the synthetic tool result the model sees,
552
- or the prompt is dropped entirely
553
- - **anything else** — warn; loop continues, stderr renders as a
554
- yellow row inline
555
-
556
- `match` is an anchored regex on the tool name; `*` or omitted matches
557
- every tool. Project hooks fire before global hooks. Default
558
- timeouts: 5s for blocking events, 30s for logging events; per-hook
559
- `timeout` overrides.
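The exit-code contract above can be sketched as a pure decision function that a `PreToolUse` hook script might wrap. The envelope fields (`event`, `cwd`, `toolName`, `toolArgs.path`) come from the example envelope; the `decide` function, the `HookDecision` type, and the protected-directory policy are hypothetical and only illustrate how exit codes 0, 2, and "anything else" map to pass, block, and warn.

```typescript
// Envelope shape taken from the example above; other fields are elided in the README.
interface HookEnvelope {
  event: string;
  cwd: string;
  toolName?: string;
  toolArgs?: { path?: string };
}

interface HookDecision {
  exitCode: number; // 0 = pass, 2 = block, anything else = warn
  stderr: string;   // shown to the model on block, or rendered as a warning row
}

// Hypothetical policy: block edits under a protected directory, warn on lockfiles.
function decide(envelope: HookEnvelope, protectedDir = "migrations/"): HookDecision {
  const path = envelope.toolArgs?.path ?? "";
  if (envelope.event === "PreToolUse" && path.startsWith(protectedDir)) {
    return { exitCode: 2, stderr: `edits under ${protectedDir} are not allowed` };
  }
  if (path.endsWith(".lock")) {
    return { exitCode: 1, stderr: `touching lockfile ${path}` };
  }
  return { exitCode: 0, stderr: "" };
}

const blocked = decide({
  event: "PreToolUse",
  cwd: "/path/to/project",
  toolName: "edit_file",
  toolArgs: { path: "migrations/001.sql" },
});
console.log(blocked.exitCode); // 2
```

An actual hook script would parse the envelope from stdin, write `stderr` to standard error, and exit with `exitCode`; that wiring is left out here to keep the sketch independent of any runtime.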
560
-
561
- **Slash**: `/hooks` (list active hooks) · `/hooks reload` (re-read
562
- `settings.json` from disk without losing your session).
563
-
564
- ### Staying current from inside the TUI
565
-
566
- `/update` inside a running session shows your current version, the
567
- last-resolved latest version (from the quiet 24h background check),
568
- and the shell command to run. The slash does *not* spawn
569
- `npm install` — stdio:inherit into a running Ink renderer corrupts
570
- the display. Exit the session and run `reasonix update` in a
571
- fresh shell when you actually want to install.
572
-
573
- ---
574
-
575
- ## `reasonix` — also works as general chat
576
-
577
- Same TUI, no filesystem tools unless you opt in via MCP. Good for
578
- drafting, Q&A, schema design, architecture discussions, or driving
579
- your own MCP servers. Sessions persist per name under
580
- `~/.reasonix/sessions/`.
581
-
582
- ```bash
583
- npx reasonix # uses saved config + wizard-selected MCP
584
- npx reasonix --preset pro # pin v4-pro for the whole run (no auto-downgrade)
585
- npx reasonix --session design # named session — resume later with --session design
586
- ```
587
-
588
- Bridge your own MCP servers on the fly:
589
-
590
- ```bash
591
- npx reasonix \
592
- --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/safe" \
593
- --mcp "kb=https://mcp.example.com/sse"
594
- ```
595
-
596
- MCP tools go through the same Cache-First + repair + context-safety
597
- plumbing as native tools — 32k result cap, live progress-notification
598
- rendering, retries.
599
-
600
- ---
601
-
602
- ## Commands inside the session
603
-
604
- <details>
605
- <summary><strong>Slash command reference</strong> (click to expand)</summary>
606
-
607
- **Core**
608
-
609
- | command | what it does |
610
- |---|---|
611
- | `/help` · `/?` | full command reference with hints |
612
- | `/status` | current model · flags · context · session |
613
- | `/new` · `/reset` | fresh conversation in the same session |
614
- | `/clear` | clear visible scrollback only (log kept) |
615
- | `/retry` | truncate and resend your last message (fresh sample) |
616
- | `/exit` · `/quit` | quit |
617
-
618
- **Model**
619
-
620
- | command | what it does |
621
- |---|---|
622
- | `/preset <auto\|flash\|pro>` | model commitment — `auto` = flash with escalation, `flash` = locked flash, `pro` = locked pro |
623
- | `/model <id>` | switch DeepSeek model (`deepseek-v4-flash`, `deepseek-v4-pro`, plus `deepseek-chat` / `deepseek-reasoner` compat aliases) |
624
- | `/models` | list live models from DeepSeek `/models` endpoint |
625
- | `/harvest [on\|off]` | toggle R1 plan-state extraction |
626
- | `/branch <N\|off>` | run N parallel samples per turn, pick best (N ≥ 2) |
627
- | `/effort <high\|max>` | reasoning_effort cap — `max` is the agent default, `high` is cheaper/faster |
628
- | `/think` | dump the last turn's full thinking-mode reasoning |
629
-
630
- **Context & tools**
631
-
632
- | command | what it does |
633
- |---|---|
634
- | `/mcp` | list attached MCP servers and their tools / resources / prompts |
635
- | `/resource [uri]` | browse + read MCP resources (no arg → list URIs; `<uri>` → fetch) |
636
- | `/prompt [name]` | browse + fetch MCP prompts |
637
- | `/tool [N]` | dump the Nth tool call's full output (1 = latest) |
638
- | `/compact [tokens]` | shrink oversized tool results in the log (default 4000 tokens/result) |
639
- | `/context` | break down where context tokens are going (system / tools / log) |
640
- | `/stats` | cross-session cost dashboard (today / week / month / all-time) |
641
- | `/keys` | keyboard shortcuts + prompt prefixes (`!` / `@` / `/`) cheatsheet |
642
-
643
- **Memory & skills**
644
-
645
- | command | what it does |
646
- |---|---|
647
- | `/memory` | show pinned memory (REASONIX.md + ~/.reasonix/memory) |
648
- | `/memory list` · `show <name>` · `forget <name>` · `clear <scope> confirm` | manage the store |
649
- | `/skill` · `/skill list` | list discovered skills (project + global) |
650
- | `/skill show <name>` | dump one skill's body |
651
- | `/skill <name> [args]` | run a skill (inject body as user turn) |
652
-
653
- **Sessions**
654
-
655
- | command | what it does |
656
- |---|---|
657
- | `/sessions` | list saved sessions (current marked with `▸`) |
658
- | `/forget` | delete the current session from disk |
659
- | `/setup` | reconfigure (exit and run `reasonix setup`) |
660
-
661
- **Code mode only** (`reasonix code`)
662
-
663
- | command | what it does |
664
- |---|---|
665
- | `/apply` | commit the pending SEARCH/REPLACE blocks to disk |
666
- | `/discard` | drop the pending edit blocks without writing |
667
- | `/undo` | roll back the last applied edit batch |
668
- | `/commit "msg"` | `git add -A && git commit -m "msg"` |
669
- | `/plan [on\|off]` | toggle read-only plan mode |
670
- | `/apply-plan` | force-approve a pending plan |
671
-
672
- **Keyboard**
673
-
674
- - `Enter` — submit
675
- - `Shift+Enter` / `Ctrl+J` — newline (multi-line paste also supported;
676
- `\` + Enter as a portable fallback)
677
- - `↑` / `↓` — walk prompt history while idle; navigate slash-autocomplete
678
- - `Tab` / `Enter` on a `/foo` prefix — accept the highlighted suggestion
679
- - `Esc` — abort the current turn (stops the API call, cancels any
680
- in-flight tool, rejects pending MCP requests)
681
- - `y` / `n` on confirm prompts — hotkey accept / reject
682
-
683
- </details>
684
-
685
- ---
686
-
687
- ## Sessions and safety nets
688
-
689
- - Sessions live as JSONL under `~/.reasonix/sessions/<name>.jsonl`
690
- (per directory for `reasonix code`). Every message appended
691
- atomically; `Ctrl+C` never loses context.
692
- - Tool results are capped at 32k chars per call. Oversized sessions
693
- self-heal on load (shrinks + rewrites the file).
694
- - Malformed `assistant.tool_calls` / `tool` pairing is validated on
695
- every outgoing API call so a corrupted session can't keep 400ing.
696
- - Context gauge turns yellow at 50%, red at 80% with a `/compact`
697
- nudge. Approaching the 1M-token window (V4 flash + pro) triggers an
698
- automatic compaction attempt before falling back to a forced summary.
699
- - The `reasonix code` sandbox refuses any path that resolves outside
700
- the launch directory, including symlink escape and `..` traversal.
701
-
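Two of the safety nets above are simple threshold rules. The sketch below shows one way they could look; it is not the actual implementation. `gaugeColor` and `capToolResult` are hypothetical names, "32k" is read as 32,000 characters, and the truncation marker text is invented for illustration.

```typescript
type GaugeColor = "normal" | "yellow" | "red";

// Yellow at 50% of the context window, red at 80%.
function gaugeColor(usedTokens: number, windowTokens: number): GaugeColor {
  const ratio = usedTokens / windowTokens;
  if (ratio >= 0.8) return "red";
  if (ratio >= 0.5) return "yellow";
  return "normal";
}

// Assumption: "32k" means 32,000 characters.
const RESULT_CAP_CHARS = 32000;

// Keep the first `cap` characters and append a short marker
// (so the returned string is slightly longer than `cap`).
function capToolResult(text: string, cap: number = RESULT_CAP_CHARS): string {
  if (text.length <= cap) return text;
  const dropped = text.length - cap;
  return `${text.slice(0, cap)}\n[truncated ${dropped} chars]`;
}
```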
702
- ### Troubleshooting: duplicate rows / ghost rendering
703
-
704
- Some Windows terminals (Git Bash / MINTTY / winpty-wrapped shells)
705
- don't fully implement the ANSI cursor-up escapes Ink uses to repaint
706
- the live spinner region. Symptom: spinners, streaming previews, or
707
- tool-result rows print multiple copies into scrollback instead of
708
- overwriting in place.
709
-
710
- If you hit this, run with plain mode:
711
-
712
- ```bash
713
- REASONIX_UI=plain npx reasonix code
714
- ```
715
-
716
- Plain mode suppresses live/animated rows and disables the internal
717
- tick timer. You lose the streaming preview and spinners but gain
718
- stable scrollback. Windows Terminal, PowerShell 7 in Windows
719
- Terminal, and WezTerm don't need this opt-out.
720
-
721
- ---
722
-
723
- ## Web search — on by default
724
-
725
- The model has two web tools the moment you launch: `web_search` and
726
- `web_fetch`. No flag, no API key, no signup. When you ask about
727
- something the model wasn't trained on (new releases, current events,
728
- obscure APIs), it decides to call `web_search` on its own; if a
729
- snippet isn't enough it follows up with `web_fetch`.
730
-
731
- Backed by **Mojeek**'s public search page — an independent web
732
- index, bot-friendly, no cookies/sessions. Coverage on niche or very
733
- recent queries can be thinner than Google/Bing, but it's reliable
734
- from scripts. (DDG was the original backend but started serving
735
- anti-bot pages in 2026.)
736
-
737
- **Turn it off** (offline mode / privacy / CI):
738
-
739
- ```json
740
- // ~/.reasonix/config.json
741
- { "apiKey": "sk-…", "search": false }
742
- ```
743
-
744
- ```bash
745
- REASONIX_SEARCH=off npx reasonix code
746
- ```
747
-
748
- **Bring your own** (Kagi, SearXNG, internal caches): implement the
749
- `WebSearchProvider` interface and call
750
- `registerWebTools(registry, { provider })` yourself, or bridge an
751
- existing MCP search server via `--mcp`.
752
-
753
- ---
754
-
755
- ## MCP — bring your own tools
756
-
757
- Any [MCP](https://spec.modelcontextprotocol.io/) server works. The
758
- wizard lets you pick from a catalog, or drive it by flag:
759
-
760
- ```bash
761
- # stdio (local subprocess)
762
- npx reasonix --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/safe"
763
-
764
- # multiple at once
765
- npx reasonix \
766
- --mcp "fs=npx -y @modelcontextprotocol/server-filesystem /tmp/safe" \
767
- --mcp "demo=npx tsx examples/mcp-server-demo.ts"
768
-
769
- # HTTP+SSE (remote / hosted)
770
- npx reasonix --mcp "kb=https://mcp.example.com/sse"
771
- ```
772
-
773
- `reasonix mcp list` shows the curated catalog. `reasonix mcp inspect
774
- <spec>` connects once and dumps the server's tools / resources /
775
- prompts without starting a chat. Progress notifications from
776
- long-running tools (2025-03-26 spec) render live as a progress bar
777
- in the spinner.
778
-
779
- Supported transports: **stdio** (local command) and **HTTP+SSE**
780
- (remote, MCP 2024-11-05 spec).
781
-
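The `--mcp "name=target"` form used above can be read as: everything before the first `=` is the server name, and the rest is either a URL (HTTP+SSE) or a local command (stdio). The following is a rough sketch of that reading, not the CLI's real parser; `parseMcpSpec` and the `McpSpec` type are hypothetical, and the whitespace split ignores shell quoting.

```typescript
type McpSpec =
  | { name: string; transport: "stdio"; command: string; args: string[] }
  | { name: string; transport: "sse"; url: string };

function parseMcpSpec(spec: string): McpSpec {
  const eq = spec.indexOf("=");
  if (eq <= 0) throw new Error(`expected "name=target", got: ${spec}`);
  const name = spec.slice(0, eq);
  const target = spec.slice(eq + 1).trim();
  if (target === "") throw new Error(`missing target for MCP server "${name}"`);
  // An http(s) URL selects the remote transport.
  if (/^https?:\/\//.test(target)) {
    return { name, transport: "sse", url: target };
  }
  // Otherwise treat it as a local command line (naive split, no quoting support).
  const parts = target.split(/\s+/);
  return { name, transport: "stdio", command: parts[0] ?? "", args: parts.slice(1) };
}
```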
782
- ---
783
-
784
- ## CLI reference
785
-
786
- <details>
787
- <summary><strong>Commands, flags, env vars</strong> (click to expand)</summary>
788
-
789
- ```bash
790
- npx reasonix code [path] # coding mode scoped to path (default: cwd)
791
- npx reasonix # chat (uses saved config)
792
- npx reasonix setup # reconfigure the wizard
793
- npx reasonix chat --session work # named session
794
- npx reasonix chat --no-session # ephemeral
795
- npx reasonix run "ask anything" # one-shot, streams to stdout
796
- npx reasonix stats session.jsonl # summarize a transcript
797
- npx reasonix replay chat.jsonl # rebuild cost/cache from a transcript
798
- npx reasonix diff a.jsonl b.jsonl --md # compare two transcripts
799
- npx reasonix mcp list # curated MCP catalog
800
- npx reasonix mcp inspect <spec> # probe a single MCP server
801
- npx reasonix sessions # list saved sessions
802
- ```
803
-
804
- Common flags:
805
-
806
- ```bash
807
- --preset <auto|flash|pro> # model commitment (auto / locked-flash / locked-pro)
808
- --model <id> # explicit model id
809
- --harvest / --no-harvest # R1 plan-state extraction
810
- --branch <N> # self-consistency budget
811
- --mcp "name=cmd args…" # attach an MCP server (repeatable)
812
- --transcript path.jsonl # write a JSONL transcript on the side
813
- --session <name> # named session (default: per-dir for code mode)
814
- --no-session # ephemeral
815
- --no-config # ignore ~/.reasonix/config.json (CI-friendly)
816
- ```
817
-
818
- Env vars (win over config):
819
-
820
- ```bash
821
- export DEEPSEEK_API_KEY=sk-...
822
- export DEEPSEEK_BASE_URL=https://... # optional alternate endpoint
823
- export REASONIX_MEMORY=off # disable REASONIX.md + user memory
824
- export REASONIX_SEARCH=off # disable web_search / web_fetch
825
- export REASONIX_UI=plain # disable live rows (ghosting workaround)
826
- ```
77
+ Cache stability isn't a feature you turn on; it's an invariant the loop is designed around. That's the entire reason Reasonix is DeepSeek-only.
827
78
 
828
79
  </details>
829
80
 
830
81
  ---
831
82
 
832
- ## Library usage
833
-
834
- <details>
835
- <summary><strong>Programmatic API — embed reasonix in your own Node project</strong> (click to expand)</summary>
836
-
837
-
838
- ```ts
839
- import {
840
- CacheFirstLoop,
841
- DeepSeekClient,
842
- ImmutablePrefix,
843
- ToolRegistry,
844
- } from "reasonix";
845
-
846
- const client = new DeepSeekClient(); // reads DEEPSEEK_API_KEY from env
847
- const tools = new ToolRegistry();
848
-
849
- tools.register({
850
- name: "add",
851
- description: "Add two integers",
852
- parameters: {
853
- type: "object",
854
- properties: { a: { type: "integer" }, b: { type: "integer" } },
855
- required: ["a", "b"],
856
- },
857
- fn: ({ a, b }: { a: number; b: number }) => a + b,
858
- });
859
-
860
- const loop = new CacheFirstLoop({
861
- client,
862
- tools,
863
- prefix: new ImmutablePrefix({
864
- system: "You are a math helper.",
865
- toolSpecs: tools.specs(),
866
- }),
867
- harvest: true,
868
- branch: 3,
869
- });
870
-
871
- for await (const ev of loop.step("What is 17 + 25?")) {
872
- if (ev.role === "assistant_final") console.log(ev.content);
873
- }
874
- console.log(loop.stats.summary());
875
- ```
83
+ ## What's in the box
876
84
 
877
- `ChatOptions.seedTools` accepts a pre-built `ToolRegistry` for
878
- callers who want the `reasonix code` loop wiring without the CLI
879
- wrapper. See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for
880
- internals.
85
+ <p align="center">
86
+ <img src="docs/assets/feature-grid.svg" alt="Feature grid: cache-first loop, plan mode, MCP first-class, sessions and dashboard, hooks, memory and skills" width="860"/>

87
+ </p>
881
88
 
882
- </details>
89
+ Permissions (`allow` / `ask` / `deny`), tool-call repair (flatten · scavenge · truncation · storm), and `/effort` for cheap turns round out the loop. [Architecture →](./docs/ARCHITECTURE.md) · [Dashboard mockup →](https://esengine.github.io/reasonix/design/agent-dashboard.html) · [TUI mockup →](https://esengine.github.io/reasonix/design/agent-tui-terminal.html) · [Website →](https://esengine.github.io/reasonix/)
883
90
 
884
91
  ---
885
92
 
886
- ## Benchmarks — verify the cache-hit claim yourself
93
+ ## Contributing
887
94
 
888
- Every abstraction here earns its weight against a DeepSeek-specific
889
- property — dirt-cheap tokens, R1 reasoning traces, automatic prefix
890
- caching, JSON mode. Generic wrappers leave these on the table.
95
+ Reasonix is solo-maintained but designed to grow. Scoped starter tickets — each with background, code pointers, acceptance criteria, and hints — live under the [`good first issue`](https://github.com/esengine/reasonix/labels/good%20first%20issue) label. Pick anything open.
891
96
 
892
- | | Reasonix default | generic frameworks |
893
- |---|---|---|
894
- | Prefix-stable loop (→ 85–95% cache hit) | yes | no (prompts rebuilt each turn) |
895
- | Auto-flatten deep tool schemas | yes | no (DeepSeek drops args) |
896
- | Retry with jittered backoff (429/503) | yes | no (custom callbacks) |
897
- | Scavenge tool calls leaked into `<think>` | yes | no |
898
- | Call-storm breaker on identical-arg repeats | yes | no |
899
- | Live cache-hit / cost / vs-Claude panel | yes | no |
97
+ **Open Discussions** (opinions wanted):
98
+ - [#20 · CLI / TUI design](https://github.com/esengine/reasonix/discussions/20): what's broken, what's missing, what would you change?
99
+ - [#21 · Dashboard design](https://github.com/esengine/reasonix/discussions/21): react against the [proposed mockup](https://esengine.github.io/reasonix/design/agent-dashboard.html)
100
+ - [#22 · Future feature wishlist](https://github.com/esengine/reasonix/discussions/22): what would you build into Reasonix next?
900
101
 
901
- On the same τ-bench-lite workload (8 multi-turn tool-use tasks × 3
902
- repeats = 24 runs per side, 48 in total), live DeepSeek `deepseek-chat`,
903
- with prefix stability as the sole variable:
102
+ **Before your first PR**: read [`CONTRIBUTING.md`](./CONTRIBUTING.md) for the short, strict project rules (comments, errors, libraries-over-hand-rolled). `tests/comment-policy.test.ts` enforces the comment ones; `npm run verify` is the pre-push gate. By participating you agree to the [Code of Conduct](./CODE_OF_CONDUCT.md). Security issues → [SECURITY.md](./SECURITY.md).
904
103
 
905
- | metric | baseline (cache-hostile) | Reasonix | delta |
906
- |---|---:|---:|---:|
907
- | cache hit | 32.8% | **90.2%** | +57.4 pp |
908
- | cost / task | $0.000992 | $0.000593 | **−40%** |
909
- | pass rate | 100% (24/24) | **100% (24/24)** | — |
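
The delta column can be re-derived from the other two columns. The helpers below are a small worked check, with hypothetical function names; cache hit is compared in percentage points and cost as a relative change.

```typescript
// Difference between two percentages, in percentage points (1 decimal).
function percentagePointDelta(baselinePct: number, candidatePct: number): number {
  return Math.round((candidatePct - baselinePct) * 10) / 10;
}

// Relative change from baseline to candidate, in percent (1 decimal).
function relativeChangePct(baseline: number, candidate: number): number {
  return Math.round(((candidate - baseline) / baseline) * 1000) / 10;
}

const cacheDelta = percentagePointDelta(32.8, 90.2);      // 57.4
const costDelta = relativeChangePct(0.000992, 0.000593);  // -40.2
```

The cost figure rounds to the −40% shown in the table.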
104
+ ### Contributors
910
105
 
911
- **Reproduce without spending an API credit:**
912
-
913
- ```bash
914
- git clone https://github.com/esengine/reasonix.git && cd reasonix && npm install
915
- npx reasonix replay benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
916
- npx reasonix diff \
917
- benchmarks/tau-bench/transcripts/t01_address_happy.baseline.r1.jsonl \
918
- benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
919
- ```
920
-
921
- The committed JSONL transcripts carry per-turn `usage`, `cost`, and
922
- `prefixHash`. Reasonix's prefix hash stays byte-stable across every
923
- model call; baseline's churns on every turn. The cache delta is
924
- *mechanically* attributable to log stability, not to a different
925
- system prompt.
926
-
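To illustrate why a stable prefix hash matters, here is a toy sketch. It is not how Reasonix computes `prefixHash`: the FNV-1a hash is a stand-in chosen to avoid dependencies, and `stablePrefixHash` and `churningPrefix` are hypothetical names. The "churning" baseline simply injects the turn number into the system prompt, which is one assumed way a prefix can change every turn.

```typescript
// 32-bit FNV-1a over UTF-16 code units; a stand-in, not a cryptographic hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

interface PrefixParts {
  system: string;
  toolSpecs: unknown[];
}

// The hash depends only on the prefix, never on the growing message log.
function stablePrefixHash(prefix: PrefixParts): string {
  return fnv1a(JSON.stringify([prefix.system, prefix.toolSpecs]));
}

// Hypothetical cache-hostile baseline: volatile data leaks into the prefix.
function churningPrefix(base: PrefixParts, turn: number): PrefixParts {
  return { ...base, system: `${base.system} (turn ${turn})` };
}

const base: PrefixParts = { system: "You are a math helper.", toolSpecs: [] };
const sameAcrossTurns = stablePrefixHash(base) === stablePrefixHash(base);
const differsAcrossTurns =
  stablePrefixHash(churningPrefix(base, 1)) !== stablePrefixHash(churningPrefix(base, 2));
```

With the stable prefix every call hashes identically, so a provider-side prefix cache can keep hitting; with the churning one each turn looks like a new prompt.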
927
- Full 48-run report:
928
- [`benchmarks/tau-bench/report.md`](./benchmarks/tau-bench/report.md).
929
- Reproduce with your own API key: `npx tsx
930
- benchmarks/tau-bench/runner.ts --repeats 3`.
931
-
932
- MCP reference runs (one single prefix hash across all 5 turns even
933
- with two concurrent MCP subprocesses):
934
-
935
- | server | turns | cache hit | cost | vs Claude |
936
- |---|---:|---:|---:|---:|
937
- | bundled demo (`add` / `echo` / `get_time`) | 2 | **96.6%** (turn 2) | $0.000254 | −94.0% |
938
- | official `server-filesystem` | 5 | **96.7%** | $0.001235 | −97.0% |
939
- | **both concurrently** | 5 | **81.1%** | $0.001852 | −95.9% |
106
+ <a href="https://github.com/esengine/reasonix/graphs/contributors">
107
+ <img src="https://contrib.rocks/image?repo=esengine/reasonix" alt="Contributors to esengine/reasonix"/>
108
+ </a>
940
109
 
941
110
  ---
942
111
 
943
112
  ## Non-goals
944
113
 
945
- - **Multi-agent orchestration / sub-agents** (use LangGraph).
946
- - **Workflow DSL / DAG scheduler / parallel-branch engine.** Skills
947
- are prose; the model sequences via the normal tool-use loop.
948
- Keeps single-loop + append-only + cache-first invariants intact.
949
- - **Multi-provider abstraction** (use LiteLLM). Reasonix is
950
- DeepSeek-only on purpose — every pillar (cache-first loop, R1
951
- harvesting, tool-call repair) is tuned against DeepSeek-specific
952
- behavior and economics. Coupling to one backend is the feature.
953
- - **RAG / vector stores** (use LlamaIndex).
954
- - **Web UI / SaaS.**
955
-
956
- Reasonix does DeepSeek, deeply.
957
-
958
- ---
959
-
960
- ## Development
961
-
962
- ```bash
963
- git clone https://github.com/esengine/reasonix.git
964
- cd reasonix
965
- npm install
966
- npm run dev code # run CLI from source via tsx
967
- npm run build # tsup to dist/
968
- npm test # vitest (1482 tests)
969
- npm run lint # biome
970
- npm run typecheck # tsc --noEmit
971
- ```
114
+ - **Multi-provider flexibility.** DeepSeek-only on purpose — every layer is tuned around DeepSeek's specific cache mechanic and pricing. Coupling to one backend is the feature.
115
+ - **IDE integration.** Terminal-first; the diff lives in `git diff`, the file tree in `ls`. The dashboard is a companion, not a Cursor replacement.
116
+ - **Hardest-leaderboard reasoning.** Claude Opus still wins some benchmarks. DeepSeek V4 is competitive on coding; if your work is "solve this PhD proof" rather than "fix this auth bug," start with Claude.
117
+ - **Air-gapped / fully-free.** Reasonix needs a paid DeepSeek API key. For air-gapped or zero-cost runs see Aider + Ollama or [Continue](https://continue.dev).
972
118
 
973
119
  ---
974
120
 
975
121
  ## License
976
122
 
977
- MIT
123
+ MIT — see [LICENSE](./LICENSE).