reasonix 0.11.1 → 0.11.3

package/README.md CHANGED
@@ -18,9 +18,10 @@
  [![downloads](https://img.shields.io/npm/dm/reasonix.svg)](https://www.npmjs.com/package/reasonix)
  [![node](https://img.shields.io/node/v/reasonix.svg)](./package.json)
 
- **A DeepSeek-native AI coding agent in your terminal.** Edits files as
- reviewable SEARCH/REPLACE blocks. Ink TUI. MCP first-class. No
- LangChain.
+ **A DeepSeek-native AI coding agent in your terminal.** ~30× cheaper
+ per task than Claude Code, with a cache-first loop engineered for
+ DeepSeek's pricing model. Edits as reviewable SEARCH/REPLACE blocks.
+ MIT-licensed. No IDE lock-in. MCP first-class.
 
  ---
 
@@ -71,6 +72,137 @@ command list.
 
  ---
 
+ ## Why Reasonix? (vs Cursor / Claude Code / Cline / Aider)
+
+ Three things you'd come to Reasonix for that nothing else combines:
+
+ - **The cost economics actually land in your bill.** DeepSeek V4 is
+ ~30× cheaper than Claude Sonnet per token. Cheap tokens alone
+ aren't the win — *cheap tokens with a 90%+ prefix-cache hit rate*
+ are. Reasonix's loop is engineered around append-only prompt growth
+ so the cache-stable prefix survives every tool call, which the
+ benchmarks section below verifies end-to-end (94.4% live, vs 46.6%
+ for a generic harness on the same workload). The `/stats` panel
+ tracks "vs Claude Sonnet 4.6" savings every turn so you can watch
+ your bill not happen.
+
+ - **It lives in your terminal.** Pure CLI — no Electron, no VS Code
+ extension, no IDE plugin to wedge into your editor. Sits next to
+ git, tmux, and your shell history. macOS / Linux / Windows
+ (PowerShell, Git Bash, Windows Terminal all tested). The only
+ network call is to the DeepSeek API itself; no vendor server in
+ the middle.
+
+ - **Open source and hackable, end to end.** MIT-licensed TypeScript.
+ The entire loop, tool registry, cache-stable prefix, TUI, MCP
+ bridge — all in `src/` under 30k lines. Fork it, ship a private
+ build, drop it into CI. No SaaS layer, no enterprise tier, no
+ feature gates.
+
+ | | Reasonix | Claude Code | Cursor | Cline | Aider |
+ |---|---|---|---|---|---|
+ | Backend | DeepSeek V4 only | Anthropic only | OpenAI / Anthropic | any (OpenRouter) | any (OpenRouter) |
+ | Cost / typical task | **~$0.001–$0.005** | ~$0.05–$0.50 | $20/mo + usage | varies | varies |
+ | Where it runs | terminal | terminal + IDE | IDE (Electron) | VS Code only | terminal |
+ | License | **MIT** | closed | closed | Apache 2 | Apache 2 |
+ | Cache-first prefix loop | **engineered (94% hit)** | basic | n/a | n/a | basic |
+ | MCP servers | **first-class** | first-class | — | beta | — |
+ | Plan mode (read-only audit gate) | **yes** | yes | — | yes | — |
+ | User-authored skills | **yes** | yes | — | — | — |
+ | Edit review (no auto-write) | **yes** (`/apply`) | yes | partial | yes | yes |
+ | Workspace switch (`/cwd`, `change_workspace`) | **yes** | — | n/a (per-window) | — | — |
+ | Cross-session cost dashboard | **yes** (`/stats`) | — | — | — | — |
+ | Sandbox boundary enforcement | **strict** (refuses `..` escape) | yes | partial | yes | partial |
+
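The sandbox row in the table is the kind of check that is easy to get wrong. As a minimal sketch of strict boundary enforcement (hypothetical helper, not Reasonix's actual implementation): resolve the path first, then test containment, so `..` segments and absolute paths can't escape the workspace:

```typescript
import * as path from "node:path";

// Sketch: strict sandbox boundary check. Resolve the requested path
// against the workspace root, then test containment on the resolved
// form, so "../" segments and absolute paths can't escape.
// (Hypothetical helper; not Reasonix's actual code.)
function insideWorkspace(workspace: string, requested: string): boolean {
  const root = path.resolve(workspace);
  const target = path.resolve(root, requested);
  return target === root || target.startsWith(root + path.sep);
}

console.log(insideWorkspace("/repo", "src/auth.ts"));   // true
console.log(insideWorkspace("/repo", "../etc/passwd")); // false
console.log(insideWorkspace("/repo", "/etc/passwd"));   // false
```

Checking the substring only after `path.resolve` is the important part; a naive `requested.includes("..")` check rejects legitimate names and misses absolute-path escapes.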
+ ### Pick something else when
+
+ - **You want multi-provider flexibility** (mixing Claude / GPT / Gemini /
+ local Llama in one tool). Try [Aider](https://aider.chat) or
+ [Cline](https://cline.bot). Reasonix is DeepSeek-only on purpose —
+ every layer (cache-first loop, R1 harvesting, JSON-mode tool repair,
+ reasoning-effort cap) is tuned against DeepSeek-specific behavior
+ and economics. Coupling to one backend is the feature, not a
+ limitation we'll grow out of.
+ - **You want IDE integration** (inline diff in your gutter,
+ multi-cursor, ghost text, refactor previews). Try
+ [Cursor](https://cursor.com) or Claude Code's IDE mode. Reasonix
+ is terminal-first; the diff lives in `git diff`, the file tree
+ lives in `ls`, the chat lives in your shell.
+ - **You're chasing the hardest reasoning benchmarks.** Claude Opus
+ 4.6 still wins some leaderboards. DeepSeek V4-pro is competitive
+ on most coding tasks but doesn't lead every benchmark. If your
+ task is "solve this PhD-level proof" rather than "fix this auth
+ bug," start with Claude.
+ - **You need fully-local / fully-free.** DeepSeek's API comes with
+ free credit on signup, but isn't free forever. For air-gapped or
+ always-free setups, look at Aider + Ollama or [Continue](https://continue.dev).
+
+ ### "But DeepSeek now has an Anthropic-compatible API — can't I just point Claude Code at it?"
+
+ You can. DeepSeek ships an official Anthropic-compatible endpoint at
+ `https://api.deepseek.com/anthropic`, and Claude Code (or any Anthropic
+ SDK client) talks to it without modification. The protocol works. The
+ **caching economics** don't transfer, and that's the whole point.
+
+ Look at DeepSeek's [own compatibility table](https://api-docs.deepseek.com/guides/anthropic_api):
+
+ | Field | Status on DeepSeek's compat endpoint |
+ |---|---|
+ | `cache_control` markers | **Ignored** |
+ | `mcp_servers` (API-level) | Ignored |
+ | `thinking.budget_tokens` | Ignored |
+ | Images / documents / citations | Not supported |
+
+ `cache_control: Ignored` is the load-bearing line. Two completely
+ different cache mechanics are colliding here:
+
+ | | Anthropic native | DeepSeek auto-cache |
+ |---|---|---|
+ | Model | **Marker-based.** You put `cache_control` on a message; Anthropic caches "everything up to this marker" as a content-addressed unit. Multiple markers = multiple independent breakpoints. | **Byte-stable prefix.** The cache fingerprints the literal byte stream from byte 0. |
+ | Claude Code's design | Built around this. Markers on system prompt + tool defs let the loop reorder, compact, or insert metadata after the markers without losing the cache. | n/a — Claude Code wasn't designed for byte-stable prefixes. |
+ | What happens when Claude Code → DeepSeek compat | Markers stripped (ignored). Claude Code's main caching strategy disappears. | Falls back to auto-cache. But Claude Code's prefix isn't byte-stable (markers were the *substitute* for byte-stability), so auto-cache misses too. |
+
165
+ Net effect: **Claude Code's loop, redirected at DeepSeek, gets the
+ cheap tokens and loses the cache hit it depended on.** A loop running
+ at 80%+ cache hit on Anthropic's marker cache lands somewhere in the
+ 40–60% range on DeepSeek's auto-cache (matching the generic-harness
+ baseline in our benchmarks). Same model, same API, same workload —
+ the loop's invariants don't fit the cache mechanic it's now talking
+ to.
+
+ Reasonix's loop was designed around a byte-stable prefix from line one.
+ No markers, no breakpoints — append-only is the invariant. That's why
+ the same τ-bench workload lands at **94.4% cache hit** on Reasonix
+ and **46.6%** on a cache-hostile baseline (committed transcripts;
+ benchmarks section below). At DeepSeek's pricing — $0.07/Mtok
+ uncached, ~$0.014/Mtok cached — the difference between 50% and 94%
+ hit is **roughly 2.5× on input cost alone**.
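The 2.5× figure can be checked directly: blended input price is just a weighted average of the cached and uncached rates. A quick sketch using the per-Mtok prices quoted above:

```typescript
// Blended input price per Mtok at a given prefix-cache hit rate,
// using the rates quoted above: $0.07 uncached, $0.014 cached.
const UNCACHED = 0.07;
const CACHED = 0.014;

function blendedPricePerMtok(hitRate: number): number {
  return (1 - hitRate) * UNCACHED + hitRate * CACHED;
}

const atFifty = blendedPricePerMtok(0.5); // 0.042 $/Mtok
const atNinetyFour = blendedPricePerMtok(0.944); // ≈0.0171 $/Mtok

console.log((atFifty / atNinetyFour).toFixed(2)); // "2.45" — roughly 2.5×
```

The same function also shows why the last few points of hit rate matter disproportionately: each point moves 1% of input tokens from the expensive rate to one five times cheaper.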
+
+ ### "What about Aider / Cline / Continue?"
+
+ They support DeepSeek natively (no compat layer needed), and you do
+ get the cheap token price. What you don't get is the DeepSeek-specific
+ loop work — those tools' loops support every backend generically
+ (OpenAI / Anthropic / local Llama / ...) and use compaction +
+ summarization patterns that destroy byte-stability, so they land in
+ the same 40–60% cache-hit range as the baseline. There's also a
+ handful of DeepSeek-specific quirks that generic loops don't handle:
+
+ | Generic loops assume | DeepSeek actually does | Reasonix's fix |
+ |---|---|---|
+ | Reasoning emitted as a structured `thinking` block | R1 sometimes leaks tool-call JSON inside `<think>` tags | a `scavenge` pass that pulls escaped tool calls back out; otherwise the model thinks it called and waits for output that never comes |
+ | Tool schemas validated strictly | DeepSeek silently drops deeply-nested object/array params | auto-flatten — nested params are rewritten to single-level prefixed names so the model sees them at all |
+ | Tool-call args are well-formed JSON | DeepSeek occasionally produces `string="false"` and other malformed fragments | a dedicated `ToolCallRepair` heals the common shapes before they hit dispatch |
+ | Reasoning depth tuned via system-level switches | V4 exposes a `reasoning_effort` knob (`max` / `high`) | `/effort` slash command + `--effort` flag, so users can step down for cheap turns |
+ | Old tool results kept in full forever | 1M context — don't compact pre-emptively, but most agents do | call-storm breaker + result token cap, but the prefix is *never* rewritten; compaction lands as new turns at the tail |
+
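As one concrete illustration of the first row, a scavenge pass can be approximated as a scan for tool-call-shaped JSON inside a `<think>` span. This is a simplified sketch under assumed shapes — the function name and the `ToolCall` interface here are hypothetical, not Reasonix's actual code:

```typescript
// Simplified sketch of a "scavenge" pass: recover a tool call that the
// model emitted as JSON inside <think>…</think> instead of as a proper
// tool-call message. (Hypothetical shapes; not Reasonix's actual code.)
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

function scavengeToolCall(reasoning: string): ToolCall | null {
  const think = reasoning.match(/<think>([\s\S]*?)<\/think>/)?.[1] ?? "";
  // Grab the widest object literal inside the think span, if any.
  const json = think.match(/\{[\s\S]*\}/)?.[0];
  if (!json) return null;
  try {
    const parsed = JSON.parse(json);
    if (typeof parsed.name === "string" && typeof parsed.arguments === "object") {
      return parsed as ToolCall; // tool-call shaped — hand it to dispatch
    }
  } catch {
    // malformed fragment — a repair pass would handle this case
  }
  return null;
}

const leaked =
  '<think>I should read it: {"name":"read_file","arguments":{"path":"src/auth.ts"}}</think>';
console.log(scavengeToolCall(leaked)?.name); // "read_file"
```

Without a pass like this, the leaked call is treated as plain reasoning text, and the loop stalls exactly as the table describes.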
+ > Cache-stability isn't a feature you turn on; it's an invariant
+ > the loop is designed around. Reasonix isn't yet-another agent
+ > CLI — it's an agent CLI built around DeepSeek's specific cache
+ > mechanic and pricing model.
+
+ ---
+
  ## `reasonix code` — pair programmer in your terminal
 
  Scoped to the directory you launch from. The model has native
@@ -771,7 +903,7 @@ cd reasonix
  npm install
  npm run dev code # run CLI from source via tsx
  npm run build # tsup to dist/
- npm test # vitest (1007 tests)
+ npm test # vitest (1482 tests)
  npm run lint # biome
  npm run typecheck # tsc --noEmit
  ```
package/README.zh-CN.md CHANGED
@@ -18,8 +18,9 @@
  [![downloads](https://img.shields.io/npm/dm/reasonix.svg)](https://www.npmjs.com/package/reasonix)
  [![node](https://img.shields.io/node/v/reasonix.svg)](./package.json)
 
- **DeepSeek 原生的终端 AI 编程代理。** 编辑以可审查的 SEARCH/REPLACE 块呈现,
- 落盘前必须确认。Ink TUI、原生 MCP、不依赖 LangChain。
+ **DeepSeek 原生的终端 AI 编程代理。** 单次任务成本约为 Claude Code 的
+ 1/30,缓存优先的循环是为 DeepSeek 的定价模型量身打造的。编辑以可审查的
+ SEARCH/REPLACE 块呈现,落盘前必须确认。MIT 许可、不绑 IDE、原生 MCP。
 
  ---
 
@@ -68,6 +69,120 @@ Windows Terminal)。任何时候按 `Esc` 中断;`/help` 查看完整命令
 
  ---
 
+ ## 为什么选 Reasonix?(vs Cursor / Claude Code / Cline / Aider)
+
+ 三件事,别家不会同时都给你:
+
+ - **成本节省落到账单上。** DeepSeek V4 的 token 单价大约是 Claude Sonnet
+ 的 1/30。光便宜还不够 —— *便宜的 token 配上 90%+ 的前缀缓存命中*才是关键。
+ Reasonix 的循环按 append-only 增长设计,缓存稳定的前缀在每次工具调用之间
+ 都活着,下面的 benchmark 章节端到端验证过:实测 94.4% 缓存命中,对照组通用
+ 框架只有 46.6%。`/stats` 面板每轮都跟踪 "vs Claude Sonnet 4.6" 的节省额,
+ 你可以亲眼看着账单不涨。
+
+ - **它住在终端里。** 纯 CLI —— 没有 Electron,没有 VS Code 插件,没有要
+ 塞进编辑器的 IDE 插件。和 git、tmux、shell 历史并排。macOS / Linux /
+ Windows(PowerShell、Git Bash、Windows Terminal 都测过)。唯一的网络
+ 请求就是 DeepSeek API 本身,中间没有厂商服务器。
+
+ - **开源且彻底可改。** MIT 许可的 TypeScript。整个循环、工具注册表、
+ 缓存稳定前缀、TUI、MCP 桥接 —— 全部在 `src/` 下,不到 3 万行。Fork
+ 它、做私有构建、塞进 CI 都可以。没有 SaaS 层,没有企业版,没有功能闸门。
+
+ | | Reasonix | Claude Code | Cursor | Cline | Aider |
+ |---|---|---|---|---|---|
+ | 后端 | 仅 DeepSeek V4 | 仅 Anthropic | OpenAI / Anthropic | 任意(OpenRouter)| 任意(OpenRouter)|
+ | 单次任务成本 | **~$0.001–$0.005** | ~$0.05–$0.50 | $20/月 + 用量 | 视情况 | 视情况 |
+ | 运行环境 | 终端 | 终端 + IDE | IDE(Electron)| 仅 VS Code | 终端 |
+ | 开源协议 | **MIT** | 闭源 | 闭源 | Apache 2 | Apache 2 |
+ | 缓存优先前缀循环 | **工程化(94% 命中)** | 基础 | n/a | n/a | 基础 |
+ | MCP 服务器 | **原生支持** | 原生支持 | — | 测试中 | — |
+ | 计划模式(只读审计闸门)| **支持** | 支持 | — | 支持 | — |
+ | 用户编写的 skills | **支持** | 支持 | — | — | — |
+ | 编辑审阅(不自动落盘)| **支持**(`/apply`)| 支持 | 部分 | 支持 | 支持 |
+ | 工作区切换(`/cwd`、`change_workspace`)| **支持** | — | n/a(每窗一项目)| — | — |
+ | 跨会话成本面板 | **支持**(`/stats`)| — | — | — | — |
+ | 沙箱边界强制 | **严格**(拒绝 `..` 逃逸)| 支持 | 部分 | 支持 | 部分 |
+
+ ### 这些情况下应该选别的
+
+ - **你想要多模型混用**(在一个工具里同时切 Claude / GPT / Gemini / 本地 Llama)。
+ 试试 [Aider](https://aider.chat) 或 [Cline](https://cline.bot)。Reasonix
+ 故意只绑 DeepSeek —— 每一层(缓存优先循环、R1 harvest、JSON 模式的工具
+ 调用修复、reasoning_effort 上限)都是为 DeepSeek 的具体行为和经济模型
+ 调出来的。绑死后端是设计选择,不是早晚要解决的限制。
+ - **你想要 IDE 集成**(编辑器侧边栏 inline diff、多光标、ghost text、重构
+ 预览)。试试 [Cursor](https://cursor.com) 或 Claude Code 的 IDE 模式。
+ Reasonix 是终端优先的:diff 在 `git diff` 里、文件树在 `ls` 里、对话
+ 在 shell 里。
+ - **你在追最难的推理 benchmark**。Claude Opus 4.6 还是赢一些榜单的。
+ DeepSeek V4-pro 在大多数编程任务上都很有竞争力,但不是每个 benchmark
+ 都领先。如果你的任务是"证明这个 PhD 级别的数学命题"而不是"修这个
+ auth bug",从 Claude 起步更合适。
+ - **你需要完全本地 / 永远免费**。DeepSeek API 注册送额度,但不是永久
+ 免费。要真正离线/永久免费,看看 Aider + Ollama 或者
+ [Continue](https://continue.dev)。
+
+ ### "DeepSeek 现在有 Anthropic 兼容 API 了,我直接拿 Claude Code 接上不就行?"
+
+ 可以接。DeepSeek 官方提供了 Anthropic 兼容端点
+ `https://api.deepseek.com/anthropic`,Claude Code(或任何 Anthropic SDK
+ 客户端)不改一行代码就能连上去。**协议跑得通,缓存经济学跑不通** ——
+ 而后者才是关键。
+
+ 看 [DeepSeek 自己的兼容性表](https://api-docs.deepseek.com/guides/anthropic_api):
+
+ | 字段 | 在 DeepSeek 兼容端点上的状态 |
+ |---|---|
+ | `cache_control` 标记 | **Ignored(被忽略)** |
+ | `mcp_servers`(API 层)| Ignored |
+ | `thinking.budget_tokens` | Ignored |
+ | 图像 / 文档 / 引用 | 不支持 |
+
+ `cache_control: Ignored` 就是杀手级的那一行。这里有**两套完全不同的缓存
+ 机制在打架**:
+
+ | | Anthropic 原生 | DeepSeek 自动缓存 |
+ |---|---|---|
+ | 模型 | **Marker 驱动。** 你在某条消息上打 `cache_control`,Anthropic 把"到此 marker 为止"的内容做内容寻址缓存。多个 marker = 多个独立断点。 | **Byte-stable prefix。** 缓存对字面字节流从第 0 字节起做指纹。 |
+ | Claude Code 的设计 | 围绕这个设计的。在 system prompt + tool 定义上插 marker,让 loop 在 marker 之后做重排、压缩、插元数据都不丢缓存。 | n/a —— Claude Code 不是为 byte-stable prefix 设计的。 |
+ | Claude Code 接 DeepSeek 兼容端点之后 | Marker 被 strip(忽略)。Claude Code 的主缓存策略消失。 | Fallback 到 auto-cache。但 Claude Code 的 prefix 不是 byte-stable 的(marker 本来就是 byte-stability 的*替代*),auto-cache 也命中不了。 |
+
+ 净效果:**Claude Code 的 loop 重定向到 DeepSeek 之后,便宜 token 拿到了,
+ 原本依赖的缓存命中没了**。一个在 Anthropic marker cache 上 80%+ 命中的
+ loop,到 DeepSeek 的 auto-cache 上大概率掉到 40-60%(跟我们 benchmark 里
+ 通用 harness 的 baseline 同区间)。同一个模型、同一个 API、同一个负载 ——
+ loop 的 invariant 跟它现在对话的缓存机制不匹配。
+
+ Reasonix 的 loop 从第一行起就是按 byte-stable prefix 的不变量设计的。没有
+ marker、没有断点 —— append-only 就是 invariant。这就是为什么同一份 τ-bench
+ 负载在 Reasonix 上是 **94.4% 缓存命中**、在 cache-hostile baseline 上是
+ **46.6%**(已 commit 的 transcript,见下面 benchmark 段)。按 DeepSeek 的
+ 单价 —— $0.07/Mtok 非缓存、约 $0.014/Mtok 缓存命中 —— 50% 和 94% 命中之间
+ **仅 input 这一侧就大约是 2.5× 的差距**。
+
+ ### "那 Aider / Cline / Continue 呢?"
+
+ 它们原生支持 DeepSeek(不需要兼容层),便宜 token 单价你确实拿到了。
+ 但你拿不到 DeepSeek-specific 的循环工程 —— 这些工具的 loop 是为**通用支持**
+ 所有后端(OpenAI / Anthropic / 本地 Llama / ...)设计的,用的是那种会破坏
+ byte-stability 的通用压缩 / 摘要模式。命中率落在和 baseline 同样的 40-60%
+ 区间。再加上一堆 DeepSeek 怪癖通用 loop 不处理:
+
+ | 通用 loop 假定 | DeepSeek 实际行为 | Reasonix 怎么处理 |
+ |---|---|---|
+ | 推理通过结构化 `thinking` 块产出 | R1 偶尔把 tool-call JSON 漏到 `<think>` 标签里 | `scavenge` 扫描把漏出的 tool call 拣回,否则模型以为自己已经调用,等不到结果 |
+ | tool schema 严格校验 | DeepSeek 静默丢弃深嵌套 object/array 参数 | auto-flatten:嵌套参数被重写成单层 prefixed name,模型才看得见 |
+ | tool-call args 是良构 JSON | DeepSeek 偶发 `string="false"` 之类的破碎片段 | 专用 `ToolCallRepair` 在 dispatch 之前修好常见形状 |
+ | 推理深度靠系统级开关 | V4 直接暴露 `reasoning_effort` 旋钮(`max` / `high`) | `/effort` slash + `--effort` flag,简单任务可以降到 high 省钱 |
+ | 老 tool result 永久保留 | 1M context 不需要主动 compact,但通用工具都会做 | call-storm breaker + 结果 token cap,但前缀**永不重写** —— 压缩作为新 turn 追加在尾部 |
+
+ > 缓存稳定性不是一个开关,是 loop 设计之初就要建立的不变量。Reasonix
+ > 不是"又一个 agent CLI",是**围绕 DeepSeek 具体的缓存机制和定价模型
+ > 设计的 agent CLI**。
+
+ ---
+
  ## `reasonix code` — 终端里的结对编程
 
  作用域为启动目录。模型自带 `read_file` / `write_file` / `edit_file` /
@@ -709,7 +824,7 @@ cd reasonix
  npm install
  npm run dev code # 用 tsx 直接从源码跑 CLI
  npm run build # tsup 打包到 dist/
- npm test # vitest(1007 个测试)
+ npm test # vitest(1482 个测试)
  npm run lint # biome
  npm run typecheck # tsc --noEmit
  ```