@bilalimamoglu/sift 0.3.2 → 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +191 -184
- package/dist/cli.js +2174 -339
- package/dist/index.d.ts +19 -1
- package/dist/index.js +2085 -310
- package/package.json +4 -2
package/README.md
CHANGED
@@ -1,247 +1,279 @@
 # sift

-
+[](https://www.npmjs.com/package/@bilalimamoglu/sift)
+[](LICENSE)
+[](https://github.com/bilalimamoglu/sift/actions/workflows/ci.yml)

-
+<img src="assets/brand/sift-logo-minimal-teal-default.svg" alt="sift logo" width="140" />

-
+Your AI agent should not be reading 13,000 lines of test output.

-
--
-
-- the output is already short
-
-## Install
-
-Requires Node.js 24 or later.
+On the largest real fixture in the benchmark:
+**Before:** 128 failures, 198K raw-output tokens, agent reconstructs the failure shape from scratch.
+**After:** 6 lines, 129 `standard` tokens, agent acts on a grouped diagnosis immediately.

 ```bash
-
+sift exec --preset test-status -- pytest -q
 ```

-
+```text
+- Tests did not pass.
+- 3 tests failed. 125 errors occurred.
+- Shared blocker: 125 errors share the same root cause - a missing test environment variable.
+Anchor: tests/conftest.py
+Fix: Set the required env var before rerunning DB-isolated tests.
+- Contract drift: 3 snapshot tests are out of sync with the current API or model state.
+Anchor: tests/contracts/test_feature_manifest_freeze.py
+Fix: Regenerate the snapshots if the changes are intentional.
+- Decision: stop and act.
+```
+
+If 125 tests fail for one reason, the agent should pay for that reason once.
+
+## What it is

-
+Developers using coding agents — Claude Code, Codex, Cursor, Windsurf, Copilot, or any LLM-driven workflow that runs shell commands and reads the output.
+
+`sift` sits between the command and the agent. It captures noisy output, groups repeated failures into root-cause buckets, and returns a short diagnosis with an anchor, a likely fix, and a decision signal. The agent gets a map instead of a wall of text.
+
+## Install

 ```bash
-
-sift doctor # verify it works
+npm install -g @bilalimamoglu/sift
 ```

-
+Requires Node.js 20+.

-
+## Try it in 60 seconds
+
+If you already have an API key, you can try `sift` without any setup wizard:

 ```bash
-# OpenAI
-export SIFT_PROVIDER=openai
-export SIFT_BASE_URL=https://api.openai.com/v1
-export SIFT_MODEL=gpt-5-nano
 export OPENAI_API_KEY=your_openai_api_key
+sift exec --preset test-status -- pytest -q
+```

-
-export SIFT_PROVIDER=openrouter
-export OPENROUTER_API_KEY=your_openrouter_api_key
+You can also use a freeform prompt for non-test output:

-
-
-export SIFT_BASE_URL=https://your-endpoint/v1
-export SIFT_PROVIDER_API_KEY=your_api_key
+```bash
+sift exec "what changed?" -- git diff
 ```

-
+## Set it up for daily use
+
+Guided setup writes a machine-wide config, verifies the provider, and makes the CLI easier to use day to day:

 ```bash
-sift config
-sift
+sift config setup
+sift doctor
 ```

-
+Config lives at `~/.config/sift/config.yaml`. A repo-local `sift.config.yaml` can override it later.

-
+If you want your coding agent to use `sift` automatically, install the managed instruction block too:

 ```bash
-sift
+sift agent install codex
+sift agent install claude
+```
+
+Then run noisy commands through `sift`:
+
+```bash
+sift exec --preset test-status -- <test command>
 sift exec "what changed?" -- git diff
 sift exec --preset audit-critical -- npm audit
 sift exec --preset infra-risk -- terraform plan
 ```

-`sift exec` runs the child command, captures its output, reduces it, and preserves the original exit code.
-
 Useful flags:
-- `--dry-run
-- `--show-raw
-
-## Test debugging workflow
-
-This is the most common use case and where `sift` adds the most value.
-
-Think of it like this:
-- `standard` = map
-- `focused` or `rerun --remaining` = zoom
-- raw traceback = last resort
+- `--dry-run` to preview the reduced input and prompt without calling a provider
+- `--show-raw` to print captured raw output to `stderr`
+- `--fail-on` to let reduced results fail CI for commands such as `npm audit` or `terraform plan`

-
+If you prefer environment variables instead of setup:

 ```bash
-
-
-
-
+# OpenAI
+export SIFT_PROVIDER=openai
+export SIFT_BASE_URL=https://api.openai.com/v1
+export SIFT_MODEL=gpt-5-nano
+export OPENAI_API_KEY=your_openai_api_key

-
+# OpenRouter
+export SIFT_PROVIDER=openrouter
+export OPENROUTER_API_KEY=your_openrouter_api_key

-
+# Any OpenAI-compatible endpoint
+export SIFT_PROVIDER=openai-compatible
+export SIFT_BASE_URL=https://your-endpoint/v1
+export SIFT_PROVIDER_API_KEY=your_api_key
+```

-
+## Why it helps

-
-- A named family such as import, timeout, network, migration, or assertion
-- `Anchor` — the first file, line window, or search term worth opening
-- `Fix` — the likely next move
-- `Decision` — whether to stop here or zoom one step deeper
-- `Next` — the smallest practical action
+The core abstraction is a **bucket**: one distinct root cause, no matter how many tests it affects.

-
+Instead of making an agent reason over 125 repeated tracebacks, `sift` compresses them into one actionable bucket with:
+- a label
+- an affected count
+- an anchor
+- a likely fix
+- a decision signal
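The bucket fields listed above can be sketched as a plain TypeScript shape. The names below are illustrative only, not sift's actual exported types:

```typescript
// Hypothetical shape for a root-cause bucket, mirroring the five fields the
// README lists: label, affected count, anchor, likely fix, decision signal.
// This is a sketch, not sift's real API.
interface Bucket {
  label: string;     // named failure family, e.g. a missing env var
  affected: number;  // how many tests share this root cause
  anchor: string;    // first file or search term worth opening
  fix: string;       // likely next move
  decision: "stop and act" | "zoom";
}

// 125 repeated setup errors collapse into a single bucket:
const blocker: Bucket = {
  label: "Shared blocker: missing test environment variable",
  affected: 125,
  anchor: "tests/conftest.py",
  fix: "Set the required env var before rerunning DB-isolated tests.",
  decision: "stop and act",
};

const line = `${blocker.affected} failures -> 1 bucket (${blocker.decision})`;
console.log(line); // "125 failures -> 1 bucket (stop and act)"
```

The point of the shape is that its size is constant: whether 5 or 500 tests share the root cause, the agent reads the same five fields.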

-
-- `focused` — groups failures by error type, shows a few representative tests
-- `verbose` — flat list of all visible failing tests with their normalized reason
+That changes the agent's job from "figure out what happened" to "act on the diagnosis."

-
+## How it works

-
-```text
-- Tests did not complete.
-- 114 errors occurred during collection.
-- Import/dependency blocker: repeated collection failures are caused by missing dependencies.
-- Anchor: path/to/failing_test.py
-- Fix: Install the missing dependencies and rerun the affected tests.
-- Decision: stop and act. Do not escalate unless you need exact traceback lines.
-- Next: Fix bucket 1 first, then rerun the full suite at standard.
-```
+`sift` follows a cheapest-first pipeline:

-
-
-
-
-
-Anchor: search <TEST_ENV_VAR> in path/to/test_setup.py
-Fix: Set the required test env var and rerun the suite.
-- Contract drift: snapshot expectations are out of sync with the current API or model state.
-Anchor: search <route-or-entity> in path/to/freeze_test.py
-Fix: Review the drift and regenerate the snapshots if the change is intentional.
-- Decision: stop and act.
-- Next: Fix bucket 1 first, then rerun the full suite at standard.
-```
+1. Capture command output.
+2. Sanitize sensitive-looking material.
+3. Apply local heuristics for known failure shapes.
+4. Escalate to a cheaper provider only if needed.
+5. Return a short diagnosis to the main agent.
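The cheapest-first pipeline above can be sketched as a small dispatcher. All names here (`sanitize`, `tryHeuristics`, `reduce`) are hypothetical stand-ins for illustration, not sift's internals:

```typescript
// Sketch of a cheapest-first reduction pipeline: sanitize first, try local
// heuristics, and only fall back to a provider when nothing matched.
// Hypothetical names; this is not sift's actual implementation.
type Diagnosis = { summary: string; viaProvider: boolean };

function sanitize(raw: string): string {
  // Redact anything that looks like a secret before it can leave the machine.
  return raw.replace(/(api[_-]?key\s*[:=]\s*)\S+/gi, "$1[redacted]");
}

function tryHeuristics(output: string): Diagnosis | null {
  // A recognized failure shape is handled locally; the provider is skipped.
  if (/^\d+ failed/m.test(output)) {
    return { summary: "Tests did not pass.", viaProvider: false };
  }
  return null; // unrecognized shape: escalate
}

function reduce(raw: string, callProvider: (s: string) => string): Diagnosis {
  const clean = sanitize(raw); // step 2: sanitize
  return (
    tryHeuristics(clean) ?? // step 3: local heuristics
    { summary: callProvider(clean), viaProvider: true } // step 4: escalate
  );
}

const local = reduce("3 failed, 125 errors", () => "provider summary");
console.log(local.viaProvider); // false: the heuristic handled it locally
```

The ordering is the design choice: the deterministic path is tried before any tokens are spent, so recognized output never incurs provider cost.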

-
+It also returns a decision signal:
+- `stop and act` when the diagnosis is already actionable
+- `zoom` when one deeper pass is justified
+- raw logs only as a last resort

-
-2. If `standard` already shows root cause, `Anchor`, and `Fix`, trust it and act.
-3. `sift escalate` — deeper render of the same cached output, without rerunning.
-4. `sift rerun` — after a fix, refresh the full-suite truth at `standard`.
-5. `sift rerun --remaining --detail focused` — zoom into what is still failing.
-6. `sift rerun --remaining --detail verbose`
-7. `sift rerun --remaining --detail verbose --show-raw`
-8. Raw test command only if exact traceback lines are still needed.
+For recognized formats, local heuristics can fully handle the output and skip the provider entirely.

-
+The deepest local coverage today is test debugging, especially `pytest`, with growing support for `vitest` and `jest`. Other presets cover typecheck walls, lint failures, build errors, audit output, and Terraform risk detection.

-
+## Built-in presets

-
-- `sift rerun` = rerun the cached command at `standard`, show what resolved or remained
-- `sift rerun --remaining` = rerun only the remaining failing test nodes
-- `Decision: stop and act` = trust the diagnosis and go fix code
-- `Decision: zoom` = one deeper sift pass is justified before raw
+Every preset runs local heuristics first. When the heuristic confidently handles the output, the provider is never called.

-
+| Preset | Heuristic | What it does |
+|--------|-----------|-------------|
+| `test-status` | Deep | Bucket/anchor/decision system for pytest, vitest, jest. 30+ failure patterns, confidence-gated stop/zoom decisions. |
+| `typecheck-summary` | Deterministic | Parses `tsc` output (standard and pretty formats), groups by error code, returns max 5 bullets. |
+| `lint-failures` | Deterministic | Parses ESLint stylish output, groups by rule, distinguishes errors from warnings, detects fixable hints. |
+| `audit-critical` | Deterministic | Extracts high/critical vulnerabilities from `npm audit` or similar. |
+| `infra-risk` | Deterministic | Detects destructive signals in `terraform plan` output. Returns pass/fail verdict. |
+| `build-failure` | Deterministic-first | Extracts the first concrete build error for recognized webpack, esbuild/Vite, Cargo, Go, GCC/Clang, and `tsc --build` output; falls back to the provider for unsupported formats. |
+| `diff-summary` | Provider | Summarizes changes and risks in diff output. |
+| `log-errors` | Provider | Extracts top error signals from log output. |

-
+Presets marked **Deterministic** bypass the provider entirely for recognized output formats. Presets marked **Deterministic-first** try a local heuristic first and fall back to the provider only when the captured output is unsupported or ambiguous. Presets marked **Provider** always call the LLM but benefit from input sanitization and truncation.

 ```bash
-sift
-sift exec --
-sift exec --
+sift exec --preset typecheck-summary -- npx tsc --noEmit
+sift exec --preset lint-failures -- npx eslint src/
+sift exec --preset build-failure -- npm run build
+sift exec --preset audit-critical -- npm audit
+sift exec --preset infra-risk -- terraform plan
 ```

-
-- later cycles = what changed, what resolved, what stayed, and the next best action
-- for `test-status`, resolved tests drop out and remaining failures stay in focus
+On an interactive terminal, `sift` also shows a small stderr footer so humans can see whether the provider was skipped:

-
+```text
+[sift: heuristic • LLM skipped • summary 47ms]
+[sift: provider • LLM used • 380 tokens • summary 1.2s]
+```

-
+Suppress the footer with `--quiet`:

 ```bash
-sift exec --preset
-sift rerun --goal diagnose --format json
+sift exec --preset typecheck-summary --quiet -- npx tsc --noEmit
 ```

-
+## Strongest today

-
+`sift` is strongest when output is:
+- long
+- repetitive
+- triage-heavy
+- shaped by a small number of shared root causes

-
+Best fits today:
+- large `pytest`, `vitest`, or `jest` runs
+- `tsc` type errors and `eslint` lint failures
+- build failures from webpack, esbuild/Vite, Cargo, Go, GCC/Clang
+- `npm audit` and `terraform plan`
+- repeated CI blockers
+- noisy diffs and log streams

-
-- `typecheck-summary`: group blocking type errors by root cause
-- `lint-failures`: group repeated lint violations and highlight the files or rules that matter
-- `audit-critical`: extract only high and critical vulnerabilities
-- `infra-risk`: return a safety verdict for infra changes
-- `diff-summary`: summarize code changes and risks
-- `build-failure`: explain the most likely build failure
-- `log-errors`: extract the most relevant error signals
+## Test debugging workflow

-
-sift presets list
-sift presets show test-status
-```
+This is where `sift` is strongest today.

-
+Think of it like this:
+- `standard` = map
+- `focused` = zoom
+- raw traceback = last resort

-
+Typical loop:

 ```bash
-sift
-sift
+sift exec --preset test-status -- <test command>
+sift rerun
+sift rerun --remaining --detail focused
 ```

-
+If `standard` already gives you the root cause, anchor, and fix, stop there and act.
+
+`sift rerun --remaining` narrows automatically for cached `pytest` runs.
+
+For cached `vitest` and `jest` runs, it reruns the original full command and keeps the diagnosis focused on what still fails relative to the cached baseline.
+
+For other runners, rerun a narrowed command manually with `sift exec --preset test-status -- <narrowed command>`.

 ```bash
 sift agent status
-sift agent
+sift agent show claude
 sift agent remove claude
 ```

-##
+## Where it helps less

-
+`sift` adds less value when:
+- the output is already short and obvious
+- the command is interactive or TUI-based
+- the exact raw log matters
+- the output does not expose enough evidence for reliable grouping

-
-
-
-
+When it cannot be confident, it tells you to zoom or read raw instead of pretending certainty.
+
+## Benchmark
+
+On a real 640-test Python backend (125 repeated setup errors, 3 contract failures, 510 passing tests):
+
+| Metric | Raw agent | sift-first | Reduction |
+|--------|-----------|------------|-----------|
+| Tokens | 305K | 600 | 99.8% |
+| Tool calls | 16 | 7 | 56% |
+| Diagnosis | Same | Same | — |
+
+The table above is the single-fixture reduction story: the largest real test log in the benchmark shrank from `198026` raw tokens to `129` `standard` tokens.
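The reduction percentages quoted in the table and the paragraph above are easy to recompute as a sanity check:

```typescript
// Recompute the reduction figures from the benchmark numbers quoted above.
const reduction = (before: number, after: number): number =>
  (1 - after / before) * 100;

// Table row: 305K raw-agent tokens vs 600 sift-first tokens.
const tableRow = reduction(305_000, 600).toFixed(1); // "99.8"

// Largest single fixture: 198026 raw tokens vs 129 `standard` tokens.
const fixture = reduction(198_026, 129).toFixed(1); // "99.9"

console.log(`table: ${tableRow}% · fixture: ${fixture}%`);
```

Both figures are consistent with the "99.8%" in the table; the single-fixture compression is even slightly steeper.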

-
+The end-to-end workflow benchmark is a different metric:
+- `62%` fewer total debugging tokens
+- `71%` fewer tool calls
+- `65%` faster wall-clock time
+
+Both matter. The table shows how aggressively `sift` can compress one large noisy run. The workflow numbers show how that compounds across a full debug loop.
+
+Methodology and caveats live in [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md).
+
+## Configuration
+
+Inspect and validate config with:

 ```bash
-sift config show
+sift config show
 sift config show --show-secrets
 sift config validate
 ```

-
-1. CLI flags
-2. environment variables
-3. repo-local `sift.config.yaml`
-4. machine-wide `~/.config/sift/config.yaml`
-5. built-in defaults
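The removed precedence list above (CLI flags over environment variables over repo-local config over machine-wide config over built-in defaults) is a first-defined-wins lookup. A minimal sketch, with hypothetical layer contents and a helper that is not sift's actual config loader:

```typescript
// First-defined-wins resolution across config layers, highest priority first.
// Layer ordering mirrors the precedence list in the diff; contents are made up.
type Layer = Record<string, string | undefined>;

function resolve(key: string, layers: Layer[]): string | undefined {
  for (const layer of layers) {
    if (layer[key] !== undefined) return layer[key]; // first hit wins
  }
  return undefined;
}

const cliFlags: Layer = {};
const envVars: Layer = { model: "gpt-5-nano" };
const repoConfig: Layer = { model: "repo-model", provider: "openai" };
const defaults: Layer = { provider: "openai", model: "default-model" };

const layers = [cliFlags, envVars, repoConfig, defaults];
console.log(resolve("model", layers));    // env var beats repo config
console.log(resolve("provider", layers)); // repo config supplies what env lacks
```

The machine-wide layer is omitted here for brevity; it would sit between the repo config and the defaults in the same array.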
+To switch between saved providers without editing files:

-
+```bash
+sift config use openai
+sift config use openrouter
+```

 Minimal YAML config:

@@ -262,37 +294,12 @@ runtime:
 rawFallback: true
 ```

-##
-
-- redaction is optional and regex-based
-- retriable provider failures (`429`, timeouts, `5xx`) are retried once
-- `sift exec` detects interactive prompts (`[y/N]`, `password:`) and skips reduction
-- pipe mode does not preserve upstream pipeline failures; use `set -o pipefail` if needed
-
-## Releasing
-
-This repo uses a manual GitHub Actions release workflow with npm trusted publishing.
-
-1. bump `package.json`
-2. merge to `main`
-3. run the `release` workflow manually
-
-The workflow runs typecheck, tests, coverage, build, packaging smoke checks, npm publish, tag creation, and GitHub Release creation.
-
-Release notes: if `release-notes/v<version>.md` or `release-notes/<version>.md` exists, the workflow uses it. Otherwise it falls back to GitHub generated notes.
-
-## Maintainer benchmark
-
-```bash
-npm run bench:test-status-ab
-npm run bench:test-status-live
-```
-
-Uses the `o200k_base` tokenizer and reports command-output budget as the primary benchmark, with deterministic recipe-budget comparisons and live-session scorecards as supporting evidence.
-
-## Brand assets
+## Docs

-
+- CLI reference: [docs/cli-reference.md](docs/cli-reference.md)
+- Worked examples: [docs/examples](docs/examples)
+- Benchmark methodology: [BENCHMARK_NOTES.md](BENCHMARK_NOTES.md)
+- Release notes: [release-notes](release-notes)

 ## License
