reasonix 0.0.6 → 0.2.2
- package/README.md +34 -7
- package/dist/cli/index.js +801 -24
- package/dist/cli/index.js.map +1 -1
- package/dist/index.d.ts +211 -2
- package/dist/index.js +446 -5
- package/dist/index.js.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -58,13 +58,38 @@ actually stays byte-stable.
 
 ## Validated numbers
 
-
+**τ-bench-lite** — 8 multi-turn tool-use tasks × 3 repeats = 48 runs per
+side. Same tools / same prompt / same client on both sides, sole variable
+is prefix stability. Measured on live DeepSeek `deepseek-chat`:
 
-
-
-
-
-
+| metric | baseline (cache-hostile) | Reasonix | delta |
+|---|---:|---:|---:|
+| runs | 24 | 24 | — |
+| **cache hit** | 46.6% | **94.4%** | **+47.7pp** |
+| cost / task | $0.002599 | $0.001579 | **−39% (×0.61)** |
+| vs Claude Sonnet 4.6 (token-count estimate) | — | — | **~96% cheaper** |
+| pass rate | 96% (23/24) | **100% (24/24)** | Reasonix held the guardrail on every run |
+
+**Verify it yourself — no API key, zero cost:**
+
+```bash
+git clone https://github.com/esengine/reasonix.git && cd reasonix && npm install
+npx reasonix replay benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
+npx reasonix diff \
+benchmarks/tau-bench/transcripts/t01_address_happy.baseline.r1.jsonl \
+benchmarks/tau-bench/transcripts/t01_address_happy.reasonix.r1.jsonl
+```
+
+The JSONL transcripts committed in `benchmarks/tau-bench/transcripts/`
+carry per-turn `usage`, `cost`, and `prefixHash`. Reasonix's prefix hash
+stays byte-stable across every model call; baseline's prefix churns on
+every turn. The cache delta is *mechanically* attributable to log
+stability, not to a different system prompt.
+
+Full 48-run report: [`benchmarks/tau-bench/report.md`][r]. Reproduce
+with your own API key: `npx tsx benchmarks/tau-bench/runner.ts --repeats 3`.
+
+[r]: ./benchmarks/tau-bench/report.md
 
 ---
 
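Reviewer note: the added README text says each transcript line carries per-turn `usage`, `cost`, and `prefixHash`, and that a stable run keeps one prefix hash for every model call. The sketch below shows how that claim could be checked by hand from a JSONL transcript. It is an illustration only, not the package's `replay` or `diff` implementation. The record shape (`prefixHash` as a string, `cost` as a number) and the names `TurnRecord`, `TranscriptSummary`, and `summarizeTranscript` are assumptions made for this example.

```typescript
// Hypothetical record shape. The README names `usage`, `cost`, and
// `prefixHash`; the field types used here are assumptions.
interface TurnRecord {
  prefixHash?: string;
  cost?: number;
}

interface TranscriptSummary {
  turns: number;                // records that carry a prefixHash
  distinctPrefixHashes: number; // 1 means the prefix never changed
  totalCost: number;            // sum of per-turn cost values
}

// Summarize a JSONL transcript passed in as a single string.
function summarizeTranscript(jsonl: string): TranscriptSummary {
  const hashes = new Set<string>();
  let turns = 0;
  let totalCost = 0;

  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines

    const record = JSON.parse(trimmed) as TurnRecord;
    // Records without a prefixHash are treated as non-model-call entries.
    if (typeof record.prefixHash !== "string") continue;

    turns += 1;
    hashes.add(record.prefixHash);
    if (typeof record.cost === "number") totalCost += record.cost;
  }

  return { turns, distinctPrefixHashes: hashes.size, totalCost };
}
```

Under these assumptions, a transcript whose summary reports `distinctPrefixHashes` equal to 1 matches the "byte-stable" description, while a value close to `turns` matches the "churns on every turn" description given for the baseline.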
@@ -77,7 +102,9 @@ npx reasonix chat # auto-saves to session 'default'; resumes next
 npx reasonix chat --session work # use a different named session
 npx reasonix chat --no-session # ephemeral — nothing persisted
 npx reasonix run "ask anything" # one-shot, streams to stdout
-npx reasonix stats session.jsonl #
+npx reasonix stats session.jsonl # quick summary of a transcript
+npx reasonix replay chat.jsonl # pretty-print a transcript + rebuild cost/cache offline
+npx reasonix diff a.jsonl b.jsonl --md diff.md # compare two transcripts: cache/cost delta + first divergence
 ```
 
 Sessions live as JSONL under `~/.reasonix/sessions/<name>.jsonl` — every